Principal AI Systems Engineer - Agentic and Productivity Systems
Listed on 2026-07-01
Software Development
AI Engineer (Applied/Software), AI Reliability/Performance Engineer, DevOps, Cloud Engineer - Software
Senior AI Systems Engineer
The Creative Cloud Engineering organization is building the next generation of AI-powered engineering infrastructure to accelerate developer productivity and operational excellence across the Creative Cloud ecosystem. As we expand into AI-driven workflows across developer productivity and platform initiatives, we are looking for a Senior AI Systems Engineer who operates at the intersection of experimentation and production systems. This role focuses on designing, orchestrating, and operationalizing agent-based systems that improve engineering workflows across CI/CD, developer tooling, and operational diagnostics.
This is not a research role and not a prompt-engineering role. This is a systems engineering role focused on building durable infrastructure. You will help build AI-native engineering capabilities that compound engineering velocity across Creative Cloud over time.
Agentic Workflow Development
· Design and prototype agent-based systems for engineering workflows such as CI diagnostics, code review automation, build failure triage, and autonomous debugging
· Develop multi-agent orchestration patterns with structured state, memory, and control boundaries
· Rapidly evaluate emerging AI frameworks, agent tooling, and developer AI platforms in real-world engineering environments
AI Systems Infrastructure
· Build reusable orchestration layers and service architectures for AI-powered engineering systems
· Develop structured evaluation pipelines including trace-based evaluation and regression testing for agent behavior
· Implement feedback loops and instrumentation that continuously improve AI system performance
Production Hardening
· Convert experimental workflows into secure, scalable, production-grade services
· Implement observability, tracing, cost controls, and model routing
· Ensure reliability, operational stability, and measurable impact of AI-powered systems
Platform Strategy & Collaboration
· Define internal standards for AI experimentation, evaluation, deployment, and monitoring
· Partner with DevEx, CI/CD, and platform teams across Creative Cloud to embed AI-native capabilities
· Build cohesive infrastructure that prevents tool sprawl and enables reusable AI productivity systems across teams
What Success Looks Like
· Production-grade AI agents integrated into engineering workflows and CI systems
· A standardized evaluation and tracing framework adopted across Creative Cloud engineering teams
· Measurable reductions in manual debugging, failure triage, and operational friction
· Reusable AI infrastructure components leveraged across multiple engineering teams
· A clear AI productivity roadmap aligned with Creative Cloud platform initiatives
Required Qualifications
· 8+ years of software engineering experience, with demonstrated depth in systems-level work
· Strong systems engineering experience (Python, Go, TypeScript, or similar)
· Experience building distributed systems, developer platforms, or infrastructure services
· Experience integrating LLMs or AI APIs into production systems
· Experience evaluating and integrating across multiple AI providers (e.g., AWS Bedrock, Anthropic, OpenAI), including cost optimization and capacity planning
· Strong understanding of observability, metrics, logging, and tracing systems
· Experience operating production services at scale
Preferred Qualifications
· Experience with agent frameworks (LangGraph, AutoGen, CrewAI, or similar)
· Experience with embeddings, vector databases, or RAG architectures
· Experience designing evaluation and benchmarking systems for AI workflows
· Experience with CI/CD platforms, developer tooling, or build systems
· Experience building internal developer productivity platforms
· Familiarity with cost-aware model orchestration and multi-model routing
Ideal Candidate Profile
· Has built and shipped an AI-powered system end-to-end, not just integrated an API
· Can show a prototype they took from experiment to production
· Comfortable making infrastructure decisions with incomplete information
· Has debugged LLM reliability issues in production (latency, cost, failure modes, concurrency limits)
· Experimental but pragmatic —…