Machine Learning Engineer III/Senior Machine Learning Engineer - AI Platform
Listed on 2026-06-05
Software Development
AI Engineer, Machine Learning/ ML Engineer
Workday’s mission is to make hard work pay off. As a Fortune 500 company and a leading AI platform for managing people, money, and agents, we shape the future of work so teams can reach their potential and focus on what matters most. You’ll feel our culture of integrity, empathy, and shared enthusiasm the moment you join, as we tackle big challenges with bold ideas and genuine care.
About the Team
The AI Platform organization builds advanced AI solutions that power the core Workday software by modeling user behavior and providing intelligent automation. We create features and solutions used by millions of end‑users, making work easier and more balanced for Workday’s global customer base.
About the Role
As a Machine Learning Engineer, you will help design and build our Agent Platform, the core infrastructure that enables teams to develop, deploy, orchestrate, and operate AI agents in production. The focus is on building systems and tooling to host and scale agent‑based applications powered by large language models (LLMs). You will partner closely with applied AI, product, and infrastructure teams to define how agents are built and operated across the organization.
Responsibilities
- Design and build the core platform capabilities required to develop, host, and operate AI agents at scale.
- Develop infrastructure and services for agent execution, orchestration, state management, and runtime reliability.
- Build reusable abstractions, frameworks, and workflows in Python to support agent development patterns across teams.
- Design and implement systems for tool use, memory, retrieval, workflow coordination, and human‑in‑the‑loop interactions.
- Build and maintain services deployed on Kubernetes, focusing on scalability, resiliency, and operational excellence.
- Develop capabilities for evaluation, tracing, observability, debugging, and performance monitoring of agent behavior in production.
- Improve platform performance across latency, throughput, fault tolerance, and cost efficiency.
- Create internal APIs, SDKs, and developer tooling that make it easier for engineering teams to build on the platform.
- Partner with cross‑functional teams to productionize new agent use cases and establish common platform patterns and best practices.
- Contribute to technical architecture and help define the roadmap for agent infrastructure and platform evolution.
Qualifications: Machine Learning Engineer III
- 3+ years of experience as part of a data science or machine learning software development team, or a PhD/equivalent program.
- 5+ years of experience in Python and building reliable, maintainable production services.
- 3+ years of experience with distributed systems, APIs, asynchronous workflows, and service‑oriented architecture.
- 3+ years of experience designing systems with a focus on scalability, reliability, observability, and maintainability.
Qualifications: Senior Machine Learning Engineer
- 6+ years of software engineering experience, including building and operating production‑grade backend, ML, or platform systems.
- 8+ years of experience in Python and building reliable, maintainable production services.
- 5+ years of experience with distributed systems, APIs, asynchronous workflows, and service‑oriented architecture.
- 5+ years of experience designing systems with a focus on scalability, reliability, observability, and maintainability.
Other Qualifications
- Experience building or supporting agent platforms, AI infrastructure, or internal developer platforms.
- Experience building and deploying machine learning or LLM‑powered applications in production.
- Familiarity with LLM application patterns, including:
  - Tool calling
  - Retrieval‑augmented generation (RAG)
  - Memory and context management
  - Multi‑step workflows and orchestration
  - Human‑in‑the‑loop systems
- Experience designing and implementing evaluation frameworks for LLM or agent quality.
- Familiarity with vector databases, model serving, prompt/version management, and experimentation tooling.
- Solid knowledge of Data Science principles and their application in NLP.
- Experience running services in Kubernetes‑based environments.
- Ability to work across ambiguity, make strong technical tradeoffs, and drive projects from concept to production.
- Strong communication and collaboration…