AI Tech Lead - Machine Learning Engineer Job, Schenectady area, New York, USA, IT/Tech

Position: AI Tech Lead - Staff Machine Learning Engineer

Location: USA

The proliferation of AI and machine log data has the potential to give organizations unprecedented real-time visibility into their infrastructure and security operations. With this opportunity come significant technical challenges around ingesting, managing, and reasoning over massive, heterogeneous, high-velocity data streams at global scale.

As a Staff Machine Learning Engineer – AI Tech Lead, you will lead the design and delivery of the next generation of Agentic AI systems for the Security Operations Center (Agentic SOC). You will evaluate, prototype, and productionize state-of-the-art agentic AI technologies and build scalable multi-agent architectures that reason over large-scale machine data to drive real-time detection, investigation, and response.

This is a highly technical leadership role with deep ownership of AI agent architecture, evaluation, LLM fine-tuning, and production AI infrastructure. You will help define the technical direction for Sumo Logic’s agentic AI platform and play a key role in bringing advanced AI capabilities to customers at global scale.

Responsibilities

Lead, in partnership with fellow leaders and their teams, the technical evaluation and adoption of cutting-edge agentic AI platforms, including Anthropic (Claude), LangChain/LangGraph, AWS Bedrock, and other emerging agent frameworks.
Architect, prototype, and productionize multi-agent AI systems for Agentic SOC use cases, including detection, triage, investigation, and response workflows.
Own the design of core agent architecture components, including planning, execution, tool orchestration, memory, context engineering, and long-running agent workflows.
Lead AI agent evaluation systems, including offline and online evaluation pipelines, golden datasets, synthetic data generation, human- and LLM-based judging, and continuous quality monitoring.
Drive LLM fine-tuning and alignment efforts to improve domain-specific reasoning, accuracy, and reliability for security and observability use cases.
Design scalable LLMOps and AI agent infrastructure, including inference routing, latency optimization, cost control, and production observability for agent systems.
Partner with product, security, and data platform leadership and teams to deliver end-to-end AI agent capabilities from prototype to customer-facing production systems.
Lead and partner on technical direction and mentorship for AI engineers working on agentic AI and LLM systems.
Define and implement best practices for AI safety, reliability, evaluation, and monitoring in production agentic systems.
Operate as a senior technical owner in ambiguous problem spaces—setting technical direction, breaking down complex problems, and driving delivery across teams.

Required Qualifications

B.Tech, M.Tech, or Ph.D. in Computer Science, Machine Learning, Data Science, or a related technical field.
5+ years of hands‑on industry experience building, operating, and leading production ML/AI systems, with demonstrated technical leadership and ownership.
Strong foundation in machine learning, distributed systems, data pipelines, and large‑scale system design.
Deep industry understanding of LLMs, prompt engineering, context engineering, agentic AI design patterns, and reasoning workflows.
Strong proficiency in Python and modern ML/AI ecosystems.
Experience designing and operating evaluation frameworks for ML/LLM systems (offline + online).
Proven ability to lead complex technical initiatives across teams and influence architecture decisions.
Excellent communication skills and ability to translate complex AI systems into business impact.

Desired Qualifications

Hands‑on experience building and scaling agentic AI systems or multi‑agent architectures in production.
Experience with modern agent frameworks such as LangGraph, LangChain, CrewAI, or similar.
Experience with major foundation model platforms such as Anthropic, OpenAI, AWS Bedrock, or Vertex AI.
Experience with LLM fine‑tuning pipelines (SFT, RLHF/RLAIF, preference learning, domain adaptation).
Strong background in LLMOps, including inference optimization, latency/cost management,…