Data Scientist Job, Sunnyvale area, California, USA, IT/Tech

Position: Staff, Data Scientist

Role summary

Join Merchandising Decision Sciences (MDS) as the founding Staff Data Scientist for our new External Data and Analytics Products team. You will design, build, deploy, scale, and monitor the ML systems that power Walmart's view of ROM - the Rest of Market - the slice of retail that doesn't ring at our own registers but shapes every category decision we make.

You will own three model families end-to-end: embedding-driven hierarchy classification, GMV distribution normalization and projection, and causal impact modeling to market share. You will be the only data scientist on the program at the start, so we need someone who can architect for scale on Databricks (on GCP) from day one, ship to production, set up the MLOps foundations, and hand a healthy, well-instrumented platform to the ML engineering team that grows in behind you.

This is a builder's role with a clear runway: get the first models live, prove the lift, and shape the team that scales them.

Have you ever wondered how Walmart sees the Rest of Market - the part of retail we don't ring ourselves - and decides where to grow share next? Do you get a thrill from being the first scientist on a program: the one who picks the stack, ships the first model, sets the bar, and watches the platform you built fill up with scientists behind you?

We'd love to put your end-to-end ML skills to work on one of retail's hardest measurement problems.

About the team

External Data and Analytics Products is a brand-new subteam within Merchandising Decision Sciences. We acquire, model, and productize syndicated and external data - NielsenIQ, Circana, GS1, and the rest - into analytics and ML services that merchants and systems use to make sharper, faster decisions. Our charter is to turn the noisy, fragmented view of the outside world into a calibrated signal Walmart can plan against.

We work as a full-stack team, and we hold ourselves to engineering-level rigor: every model we ship has an owner, a monitor, and a runbook.

What you'll do

* Design, build, deploy, and monitor embedding-based classification models that align external product signals to the Walmart merchandising hierarchy - from candidate generation and ANN retrieval through fine-tuned classifiers and human-in-the-loop feedback for long-tail nodes.

* Develop GMV distribution normalization and projection models that reconcile heterogeneous internal and external GMV signals across categories, time, and geography - and produce projections business partners can plan against.

* Build causal impact models that quantify market-share movement from merchandising actions (assortment, pricing, promo, distribution) using methods such as difference-in-differences, synthetic control, Bayesian structural time series, and uplift modeling - and clearly communicate assumptions, sensitivity, and confidence to non-technical leaders.

* Engineer for production from day one on Databricks (on GCP): PySpark + Delta for distributed training and inference, MLflow for tracking and registry, Unity Catalog for governance, Databricks Model Serving and Jobs for deployment, and BigQuery, Dataproc, and Vertex AI where they fit best.

* Establish the MLOps foundations the ROM platform will live on: CI/CD for models, feature management, drift and quality monitoring, retraining triggers, shadow deployments, model cards, and on-call runbooks - so the ML engineers who join behind you can scale the platform without re-platforming it.

* Own the end-to-end ML lifecycle for every model you put in production - problem framing, data contracts, training, evaluation, deployment, monitoring, retraining, and incident response.

What you'll bring

* Extensive industry experience as a hands-on data scientist who has personally taken ML systems from notebook to production at scale and stayed on them through monitoring, drift, and retraining.

* Deep, hands-on experience shipping and scaling ML on Databricks - PySpark, Delta, MLflow (tracking and registry), Unity Catalog, Databricks Jobs and Workflows, and Databricks Model Serving. You know where Databricks shines and where to reach for something else.

* Strong production fluency with GCP - BigQuery, GCS, Vertex AI, Cloud Run, Composer/Airflow - and the ability to wire Databricks and GCP services together cleanly.

* Proven expertise with vector embeddings: training, fine-tuning, and evaluating embedding models for retail/product data; pairing embeddings with classifiers; ANN retrieval and vector indexing at catalog scale; choosing the right embedding model for the right job.

* Deep expertise in supervised classification at scale, including tree ensembles (XGBoost / LightGBM), embedding-based classifiers, and transformer fine-tuning; comfort with severe class imbalance, noisy labels, hierarchy-aware loss design, and long-tail evaluation.

* Strong command of forecasting and distribution modeling - hierarchical and Bayesian methods, reconciliation across hierarchies, calibrated probabilistic…