Principal, Data Scientist, Experimentation Sciences Job, Sunnyvale area, California, USA, IT/Tech

Position Summary...

As a Principal Data Scientist at Walmart, you will define and execute the data science roadmap for the experimentation platform that powers trusted decision-making across Walmart's A/B testing ecosystem. This is a hands-on technical leadership role at the intersection of experimentation science, large-scale data systems, and AI evaluation. You will own the scientific direction behind experiment reporting, dashboards, guardrails, and reusable measurement services, ensuring experiment exposure data is stitched to business and operational outcomes with rigor, scalability, and clarity.

You will partner closely with engineering, product, and business teams to modernize our statistical tooling, improve self-service experimentation, and extend our measurement framework to emerging AI use cases, including LLM evals, prompt evaluation, hybrid human/LLM judging, and offline-to-online quality measurement. We are looking for a self-starter who can move fluidly from strategy to hands-on prototyping, quickly validating ideas through lightweight automated workflows and proofs of concept.

What you'll do...

About the team

Our team owns and manages Walmart's experimentation platform, enabling A/B testing across multiple channels and regions. We build and maintain the scalable infrastructure, data foundations, and measurement systems required to support high experiment volume with reliable and accurate outcomes. One of the team's core responsibilities is generating experiment reports and dashboards that translate raw experiment data into trusted business insights. To do this, we own a broad set of ETL processes that generate, transform, and stitch experiment exposure data with business and operational metrics.

We also develop and maintain the statistical processes and guardrails that underpin sound decision-making, including sample imbalance checks, metric validation, and analysis standards. As experimentation expands into AI-powered experiences, the team is evolving the platform to support LLM evals, prompt evaluation, and new approaches to measuring quality, customer impact, and business value.

What you'll do

* Define the multi-year data science roadmap for experimentation reporting, dashboards, and measurement services, identifying the highest-leverage investments in methodology, automation, and self-service.

* Lead the design of scalable statistical frameworks for online experiments across product, business, and operational use cases, including guardrails, heterogeneity analysis, sequential decisioning, variance reduction, and quasi-experimental methods when randomized tests are not feasible.

* Partner with data engineering to design robust SQL and PySpark data models, pipelines, and observability standards that improve correctness, speed, and reusability of experimentation data assets.

* Establish and govern canonical experiment metrics, scorecards, and reporting standards across channels, regions, and surfaces.

* Define the strategy for AI-native experimentation and evaluation, including LLM eval frameworks, prompt evaluation, golden datasets, rubric design, human-in-the-loop review, LLM-as-a-judge calibration, and ongoing regression monitoring.

* Build lightweight proofs of concept and small automated workflows using tools such as Python, SQL, Airflow, and Google Cloud Platform technologies to validate ideas before broader platform investment.

* Serve as the senior technical advisor to leaders across product, engineering, and business on experimental design, causal interpretation, metric tradeoffs, and measurement risk.

What you'll bring

* Deep expertise in experimentation, causal inference, and statistical decision-making, with a track record of shaping how organizations design, analyze, and operationalize experiments at scale.

* Expert-level SQL and PySpark, strong Python skills, and hands-on experience working with high-volume,…
