Senior AI Platform Engineer, Atlas AI, Phoenix area, Arizona, USA, Software Development

What Cognite is:
Relentless to achieve

Cognite operates at the forefront of industrial digitalization, building AI and data solutions that solve the world’s hardest, highest‑impact problems. With unmatched industrial heritage and a comprehensive suite of AI capabilities, including low‑code AI agents, Cognite accelerates the digital transformation to drive operational improvements.

We thrive on challenges. We challenge assumptions. We execute with speed and ownership. If you view obstacles as signals to step forward, not backwards, you’ll feel right at home here.

Our Moonshot is bold:
Unlock $100B in customer value by 2035, and redefine how global industry works. Join us in this venture where AI and data meet ingenuity, and together, we will forge the path to a smarter, more connected industrial future.

Role Overview

We are seeking a Senior AI Platform Engineer to join the Cognite Atlas AI Product team in Phoenix, AZ, to engineer, build, and operate the production‑grade, multi‑cloud platform that enables our internal and partner teams to build, deploy, and manage industrial AI agents. Your work will directly impact industrial efficiency and sustainability.

Responsibilities

Design, build, and maintain the core Python SDKs and services for the Atlas AI platform.
Build the core agentic runtime, ensuring it is scalable, meets its SLOs, and can reliably manage the state, orchestration, and execution of industrial agents.
Develop a robust, governed, and secure framework for AI agent tool use, engineering platform components that let solution engineers safely add new tools and that manage secure execution, monitoring, and access control.
Manage the LLM serving layer, including deploying and optimizing models for low‑latency/high‑throughput inference, and build and maintain model routing logic for cost‑performance balance.
Implement evaluation and observability for all AI services, creating standardized frameworks for performance, accuracy, cost, and safety evaluation of LLMs and agentic workflows, and drive automated testing strategies.
Own the full development lifecycle for services in a production SaaS environment, including automated code‑coverage goals, rigorous code reviews, defining SLOs, participating in on‑call rotations, and ensuring a fast incident‑response process.
Work closely with the Lead Architect to translate the technical vision into implemented, production‑grade services, and partner with Solution Engineers to understand needs and abstract common patterns.
Stay up to date on the latest developments in the field and mentor junior developers.

Required Qualifications

Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent practical experience.
8+ years of professional experience in backend software engineering, platform engineering, or MLOps, with a proven track record of architecting and operating complex systems at scale.
2+ years of hands‑on experience building applications or platforms on top of AI/ML models or LLMs.
Expert‑level proficiency in Python and a strong background in software architecture, robust API design, and building maintainable, well‑documented SDKs.
Hands‑on experience with Kubernetes (K8s) and building services on managed PaaS in a multi‑cloud environment (AWS, Azure, GCP), and strong understanding of Infrastructure as Code such as Terraform.
Proven experience building and operating production‑grade SaaS software, understanding the full development life cycle, including CI/CD, monitoring, telemetry, and on‑call incident response.
Practical experience with LLM orchestration frameworks (Bedrock, Vertex, Semantic Kernel, LangChain).
Strong verbal and written communication skills, with the ability to articulate complex technical designs and decisions clearly.

Preferred Experience

Hands‑on experience deploying and managing LLMs in production using high‑performance serving frameworks.
Experience with MLOps/LLMOps tools for tracing, monitoring, and evaluating LLM applications (LangSmith, Arize, Phoenix, or equivalent).
Experience with RAG infrastructure, embedding generation pipelines, vector database integrations, and high‑performance vector similarity…