AI Tech Lead - Machine Learning Engineer
Listed on 2026-06-10
IT/Tech
AI Engineer (Applied/Software), Machine Learning/ ML Engineer, Data Scientist
AI Tech Lead - Staff Machine Learning Engineer
Location: USA
The proliferation of AI and machine log data has the potential to give organizations unprecedented real-time visibility into their infrastructure and security operations. With this opportunity come significant technical challenges around ingesting, managing, and reasoning over massive, heterogeneous, high-velocity data streams at global scale.
As a Staff Machine Learning Engineer – AI Tech Lead, you will lead the design and delivery of the next generation of Agentic AI systems for the Security Operations Center (Agentic SOC). You will evaluate, prototype, and productionize state-of-the-art agentic AI technologies and build scalable multi-agent architectures that reason over large-scale machine data to drive real-time detection, investigation, and response.
This is a highly technical leadership role with deep ownership of AI agent architecture, evaluation, LLM fine-tuning, and production AI infrastructure. You will help define the technical direction for Sumo Logic’s agentic AI platform and play a key role in bringing advanced AI capabilities to customers at global scale.
Responsibilities
- Partner with fellow leaders and their teams to lead the technical evaluation and adoption of cutting-edge agentic AI platforms, including Anthropic (Claude), LangChain/LangGraph, AWS Bedrock, and other emerging agent frameworks.
- Architect, prototype, and productionize multi-agent AI systems for Agentic SOC use cases, including detection, triage, investigation, and response workflows.
- Own the design of core agent architecture components, including planning, execution, tool orchestration, memory, context engineering, and long-running agent workflows.
- Lead AI agent evaluation systems, including offline and online evaluation pipelines, golden datasets, synthetic data generation, human- and LLM-based judging, and continuous quality monitoring.
- Drive LLM fine-tuning and alignment efforts to improve domain-specific reasoning, accuracy, and reliability for security and observability use cases.
- Design scalable LLMOps and AI agent infrastructure, including inference routing, latency optimization, cost control, and production observability for agent systems.
- Partner with product, security, and data platform leadership and teams to deliver end-to-end AI agent capabilities from prototype to customer-facing production systems.
- Provide technical direction and mentorship for AI engineers working on agentic AI and LLM systems, in partnership with other leaders.
- Define and implement best practices for AI safety, reliability, evaluation, and monitoring in production agentic systems.
- Operate as a senior technical owner in ambiguous problem spaces—setting technical direction, breaking down complex problems, and driving delivery across teams.
- B.Tech, M.Tech, or Ph.D. in Computer Science, Machine Learning, Data Science, or a related technical field.
- 5+ years of hands‑on industry experience building, operating, and leading production ML/AI systems, with demonstrated technical leadership and ownership.
- Strong foundation in machine learning, distributed systems, data pipelines, and large‑scale system design.
- Deep industry understanding of LLMs, prompt engineering, context engineering, agentic AI design patterns, and reasoning workflows.
- Strong proficiency in Python and modern ML/AI ecosystems.
- Experience designing and operating evaluation frameworks for ML/LLM systems (offline + online).
- Proven ability to lead complex technical initiatives across teams and influence architecture decisions.
- Excellent communication skills and ability to translate complex AI systems into business impact.
- Hands‑on experience building and scaling agentic AI systems or multi‑agent architectures in production.
- Experience with modern agent frameworks such as LangGraph, LangChain, CrewAI, or similar.
- Experience with major foundation model platforms such as Anthropic, OpenAI, AWS Bedrock, or Vertex AI.
- Experience with LLM fine‑tuning pipelines (SFT, RLHF/RLAIF, preference learning, domain adaptation).
- Strong background in LLMOps, including inference optimization, latency/cost management,…