Principal Applied Scientist; CoreAI Job, Redmond area, Washington, USA, Software Development

Position: Principal Applied Scientist (CoreAI)
**Overview**

You're joining Core AI, the team at the forefront of redefining how software is built and experienced. We create the foundational platforms, services, and developer experiences that power next-generation applications using Generative AI, enabling developers and enterprises to unlock the full potential of AI to build intelligent, adaptive, and transformative software.

You will be a technical contributor driving the applied science foundation for observability in AI agents and multi-agent systems. This role focuses on understanding how intelligent agents behave in production: their quality, safety, reliability, cost, and evolution over time. You will develop and apply scientific methods, evaluation frameworks, and measurement systems that help teams understand, benchmark, diagnose, and safely improve agent-based systems with confidence.

AI agents introduce fundamentally new observability challenges: non-deterministic execution, tool- and model-driven decision paths, emergent multi-agent behaviors, and quality signals that go far beyond traditional uptime metrics. In this role, you will operate at the intersection of agent architecture, telemetry, evaluation science, and responsible AI, shaping how Microsoft measures and improves observable AI systems.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

**Responsibilities**

**What You'll Do**

+ Develop evaluation and measurement frameworks for single-agent and multi-agent systems, spanning quality, safety, reliability, cost, and behavioral consistency.

+ Design methodologies that connect offline evals, online signals, and production telemetry to explain how prompt, tool, model, or orchestration changes affect real-world agent performance.

+ Define scientifically grounded quality signals and benchmarks for agent systems, including task success, tool-use effectiveness, plan quality, failure modes, coordination quality, and user-perceived outcomes.

+ Build models and analysis techniques that help detect regressions, identify root causes, and characterize agent behavior across diverse workflows and environments.

+ Advance observability for AI systems through new approaches to trace analysis, agent health modeling, behavioral clustering, anomaly detection, and multi-agent coordination analysis.

+ Partner with engineering teams to operationalize evaluation and observability methods in production systems, enabling safe iteration through staged rollouts, experimentation, A/B testing, and automated regression detection.

+ Contribute to instrumentation and semantic standards for agent observability, helping make agent execution more explainable, diagnosable, and comparable across systems.

+ Collaborate deeply with product and platform teams across Foundry, Azure Monitor, and agent runtimes to shape end-to-end experiences for evaluation, benchmarking, monitoring, and investigation.

+ Act as a technical leader by setting scientific direction, driving research-informed product decisions, mentoring others, and raising the technical bar across the organization.

**Technical Focus Areas**

+ Evaluation science for agent and multi-agent systems: offline, online, and continuous evals; benchmark design; synthetic data; task success measurement

+ Agent and multi-agent architectures: planners, tool use, memory, orchestration, and coordination patterns

+ Applied machine learning and statistical methods for behavioral analysis, anomaly detection, experimentation, and regression detection

+ Observability data for AI systems: traces, logs, metrics, evaluations, and cost/performance signals

+ Safety and responsible AI signals: policy compliance, risk detection, auditability, and safe logging

+ Benchmarking and experimentation for agent systems, including A/B tests, canaries, and staged rollouts

+ Explainability and diagnosis for complex agent workflows and model-driven decision paths

**Qualifications**

**Required Qualifications:**

+ Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python

+ OR equivalent experience.

**Other Requirements:**

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

+ Microsoft Cloud Background Check:
This position will be required to pass the Microsoft Cloud…