Archer Data Scientist
Job in Livermore, Alameda County, California, 94550, USA
Listed on 2026-06-02
Listing for: Archer Technologies LLC
Full Time position
Job specializations: IT/Tech; AI Engineer, Machine Learning/ML Engineer, Data Scientist, Data Engineer
Job Description
With over 20 years in the risk management industry, Archer has a customer base that represents one of the largest pure risk management communities globally, with more than 1,200 customers, including more than 50% of the Fortune 500. Learn more at
Data Scientist - LLM & Data Pipeline Engineering (Legal Tech / Reg Tech AI)
Overview:
We are seeking an experienced Data Scientist with a strong background in AI model integration, data pipeline development, and knowledge base (KB) engineering to support our next-generation Legal Tech / Reg Tech AI platform.
This role blends applied machine learning, data engineering, and software development, focusing on building scalable pipelines that connect large language models (LLMs) to structured and unstructured data through retrieval-augmented generation (RAG) and vector database architectures.
The ideal candidate is passionate about operationalizing AI, from training and fine-tuning models to deploying intelligent retrieval systems in AWS cloud environments.
Key Responsibilities
1. AI Model Integration & Development
* Design, train, and evaluate LLM-based pipelines for document understanding, obligation extraction, and regulatory reasoning.
* Implement and optimize RAG architectures, combining LLMs with vector databases for semantic retrieval.
* Develop and maintain model fine-tuning workflows, embedding generation, and knowledge distillation.
* Collaborate with MLOps teams to integrate AI models into production-ready APIs and services on AWS.
* Measure and improve model precision, recall, latency, and interpretability.
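As a rough illustration of the retrieval and evaluation work described in this section, the following is a minimal sketch (not Archer's implementation) of embedding-based retrieval scored with precision and recall. It assumes hand-written three-dimensional toy embeddings in place of a real embedding model and a plain dictionary in place of a vector database; the function names and document ids are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k):
    """Return the ids of the k indexed passages most similar to the query vector."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:k]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved id list against a gold set of relevant ids."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy embeddings standing in for a real embedding model's output.
index = {
    "obligation-1": [0.9, 0.1, 0.0],
    "obligation-2": [0.8, 0.2, 0.1],
    "definitions": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
top = retrieve(query, index, k=2)
print(top, precision_recall(top, {"obligation-1"}))
```

Raising `k` typically trades precision for recall, which is one of the tuning decisions behind the precision, recall, and latency targets listed above.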
1.5 Agentic and MCP Knowledge Integration
* Design and maintain agentic multi-component processes (MCPs) that enable context-aware reasoning across multiple data sources and agents.
* Implement AI agents capable of dynamic tool use, autonomous task decomposition, and multi-context knowledge retrieval.
* Develop pipelines that support agent memory, self-reflection, and knowledge synthesis across distributed systems and knowledge bases.
* Collaborate with engineering teams to integrate MCP-driven agents with retrieval, analytics, and workflow orchestration layers, ensuring compliance with regulatory reasoning frameworks.
2. Data Pipeline Engineering
* Build and manage end-to-end data pipelines for ingestion, transformation, embedding, and indexing of legal and compliance data.
* Orchestrate data workflows leveraging AWS services (e.g., S3, Lambda, Glue, SageMaker, Step Functions, RDS).
* Develop scalable ETL/ELT processes to feed both relational (PostgreSQL) and vector databases (e.g., Pinecone, FAISS, Weaviate, Elasticsearch vector search).
* Ensure data lineage, reproducibility, and version control across AI and analytics pipelines.
* Automate retraining and evaluation pipelines for continuous learning from user feedback.
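The ingestion, chunking, embedding, and indexing flow above, together with basic lineage tracking, can be sketched roughly as follows. This is illustrative only and rests on stated assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, a Python dictionary stands in for the vector store, and the chunk sizes are arbitrary. Each chunk records a SHA-256 hash of its text, so re-running ingestion on unchanged content re-embeds nothing.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    position: int
    text: str
    content_hash: str  # lineage: identifies exactly which text produced the vector
    vector: list


def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping windows of at most max_words words."""
    assert 0 <= overlap < max_words
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, max(len(words), 1), step):
        window = words[start:start + max_words]
        if window:
            chunks.append(" ".join(window))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def toy_embed(text, dim=16):
    """Hypothetical embedding stand-in: counts of tokens hashed into dim buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def ingest(doc_id, text, index):
    """Chunk, embed, and upsert a document; return how many chunks were (re)embedded."""
    added = 0
    for pos, piece in enumerate(chunk_text(text)):
        digest = hashlib.sha256(piece.encode()).hexdigest()
        key = (doc_id, pos)
        if key in index and index[key].content_hash == digest:
            continue  # identical content already indexed, so skip re-embedding
        index[key] = Chunk(doc_id, pos, piece, digest, toy_embed(piece))
        added += 1
    return added


index = {}
doc = "The licensee shall file quarterly reports. " * 30
print(ingest("reg-1", doc, index))  # first run embeds every chunk
print(ingest("reg-1", doc, index))  # second run: content unchanged, nothing re-embedded
```

Keying chunks by content hash keeps re-runs reproducible and cheap. Note that this sketch does not delete stale chunks when a document gets shorter; a production pipeline would also need to handle that case.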
3. Knowledge Base & Information Retrieval
* Architect and maintain intelligent Knowledge Bases (KBs) to support AI-driven search, summarization, and compliance reasoning.
* Implement advanced retrieval techniques using Elasticsearch (including Elastic vector search) and embedding-based retrieval.
* Align KB structures with business ontologies and regulatory taxonomies to support explainable AI outputs.
* Collaborate with domain experts and PMs to enrich KB metadata and enhance model context relevance.
4. AWS & Deployment
* Deploy and scale AI pipelines using AWS services such as SageMaker, Lambda, ECS/EKS, API Gateway, and CloudFormation/Terraform.
* Implement model and data monitoring solutions for drift detection, latency management, and cost optimization.
* Collaborate with DevOps to maintain secure, reliable, and compliant cloud environments.
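One common technique for the data drift detection mentioned above is the Population Stability Index (PSI), which compares the binned distribution of a feature or model score in live traffic against a reference sample. The sketch below is illustrative only and assumes equal-width bins over the combined range of both samples plus a small `eps` floor to avoid taking the log of zero; many implementations bin on reference-sample quantiles instead. The sample data is hypothetical.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index: sum over bins of (q - p) * ln(q / p)."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)  # clamp the maximum into the last bin
            counts[idx] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(sample), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical samples: a uniform reference and a live sample shifted toward higher values.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
print(psi(reference, reference), psi(reference, shifted))
```

Every term in the sum is non-negative, so identical distributions score exactly zero. A PSI above roughly 0.25 is often read as significant drift, but that cutoff is a rule of thumb rather than a standard.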
5. Cross-Functional Collaboration
* Partner with engineering, product, and compliance teams to align AI models with regulatory and data governance requirements.
* Work closely with QA and Professional Services teams to validate AI outputs and improve client-facing performance.
* Document architectures, experiment results, and data flows to ensure transparency and reproducibility.
Preferred Experience
* Experience building AI products for Legal Tech, Reg Tech, or compliance automation.
* Familiarity with agentic AI frameworks (e.g., OpenAI MCP, CrewAI, LangGraph, or AutoGen).
* Background in document intelligence systems, multi-agent orchestration, or knowledge graph integration.
* Experience with LangChain, LlamaIndex, or similar frameworks for RAG orchestration.
* Hands-on knowledge of MLOps tools and data versioning (DVC, MLflow, Weights & Biases).
* Understanding of…