Archer Data Scientist Job, Livermore area, California, USA, IT/Tech

Archer is a leading provider of integrated risk management (IRM) solutions that enable customers to improve strategic decision-making and operational resilience with a modern technology platform that supports qualitative and quantitative analysis driven by both business and IT impacts. As true pioneers in GRC software, Archer remains solely dedicated to helping customers manage risk and compliance domains, from traditional operational risk to emerging issues such as ESG.

With over 20 years in the risk management industry, the Archer customer base represents one of the largest pure risk management communities globally, with more than 1,200 customers including more than 50% of the Fortune 500. Learn more at

Data Scientist - LLM & Data Pipeline Engineering (Legal Tech / Reg Tech AI)

Overview:

We are seeking an experienced Data Scientist with a strong background in AI model integration, data pipeline development, and knowledge base (KB) engineering to support our next-generation Legal Tech / Reg Tech AI platform.

This role blends applied machine learning, data engineering, and software development, focusing on building scalable pipelines that connect large language models (LLMs) to structured and unstructured data through retrieval-augmented generation (RAG) and vector database architectures.

The ideal candidate is passionate about operationalizing AI - from training and fine-tuning models to deploying intelligent retrieval systems in AWS cloud environments.

Key Responsibilities

1. AI Model Integration & Development

* Design, train, and evaluate LLM-based pipelines for document understanding, obligation extraction, and regulatory reasoning.

* Implement and optimize RAG architectures, combining LLMs with vector databases for semantic retrieval.

* Develop and maintain model fine-tuning workflows, embedding generation, and knowledge distillation.

* Collaborate with MLOps teams to integrate AI models into production-ready APIs and services on AWS.

* Measure and improve model precision, recall, latency, and interpretability.

1.5 Agentic and MCP Knowledge Integration

* Design and maintain agentic multi-component processes (MCPs) that enable context-aware reasoning across multiple data sources and agents.

* Implement AI agents capable of dynamic tool use, autonomous task decomposition, and multi-context knowledge retrieval.

* Develop pipelines that support agent memory, self-reflection, and knowledge synthesis across distributed systems and knowledge bases.

* Collaborate with engineering teams to integrate MCP-driven agents with retrieval, analytics, and workflow orchestration layers, ensuring compliance with regulatory reasoning frameworks.

2. Data Pipeline Engineering

* Build and manage end-to-end data pipelines for ingestion, transformation, embedding, and indexing of legal and compliance data.

* Orchestrate data workflows leveraging AWS services (e.g., S3, Lambda, Glue, SageMaker, Step Functions, RDS).

* Develop scalable ETL/ELT processes to feed both relational (PostgreSQL) and vector databases (e.g., Pinecone, FAISS, Weaviate, Elastic Vector Search).

* Ensure data lineage, reproducibility, and version control across AI and analytics pipelines.

* Automate retraining and evaluation pipelines for continuous learning from user feedback.

3. Knowledge Base & Information Retrieval

* Architect and maintain intelligent Knowledge Bases (KBs) to support AI-driven search, summarization, and compliance reasoning.

* Implement advanced retrieval techniques using Elasticsearch / Elastic Vector Search and embedding-based retrieval.

* Align KB structures with business ontologies and regulatory taxonomies to support explainable AI outputs.

* Collaborate with domain experts and PMs to enrich KB metadata and enhance model context relevance.

4. AWS & Deployment

* Deploy and scale AI pipelines using AWS services such as SageMaker, Lambda, ECS/EKS, API Gateway, and CloudFormation/Terraform.

* Implement model and data monitoring solutions for drift detection, latency management, and cost optimization.

* Collaborate with DevOps to maintain secure, reliable, and compliant cloud environments.

5. Cross-Functional Collaboration

* Partner with engineering, product, and compliance teams to align AI models with regulatory and data governance requirements.

* Work closely with QA and Professional Services teams to validate AI outputs and improve client-facing performance.

* Document architectures, experiment results, and data flows to ensure transparency and reproducibility.

Preferred Experience

* Experience building AI products for Legal Tech, Reg Tech, or compliance automation.

* Familiarity with agentic AI frameworks (e.g., OpenAI MCP, CrewAI, LangGraph, or AutoGen).

* Background in document intelligence systems, multi-agent orchestration, or knowledge graph integration.

* Experience with LangChain, LlamaIndex, or similar frameworks for RAG orchestration.

* Hands-on knowledge of MLOps tools and data versioning (DVC, MLflow, Weights & Biases).

* Understanding of…