Sr. ML Infrastructure Engineer II, Personalization Job San Mateo area, California USA, Software Development

About Slickdeals:

We believe shopping should feel like winning. That's why 10 million people come to Slickdeals to swap tips, upvote the best finds, and share the thrill of a great deal. Together, our community has saved more than $10 billion over the past 26 years.

We're profitable, passionate, and in the middle of an exciting evolution, transforming from the internet's most trusted deal forum into the go-to daily shopping destination. If you thrive in a fast-moving, creative environment where ideas quickly turn into impact, you'll fit right in.

The Purpose:

The Personalization team owns the systems that decide what each Slickdeals user sees, from homepage and feed rankings to deal recommendations across the site and in lifecycle channels. Personalization is one of our highest-leverage investments: it directly drives engagement, retention, and revenue across tens of millions of monthly users.

We're hiring a Sr. ML Engineer II who can operate end-to-end across the recommendation stack. This is a true hybrid role: roughly half modeling and half infrastructure. You will design and ship recommendation models (retrieval, ranking, and re-ranking) and build the production ML systems that train, serve, and evaluate them. You'll work closely with data scientists, product engineers, and the Search & Discovery and Shopping Graph teams.

You will be building products using technologies such as AWS SageMaker, PyTorch, TensorFlow, vector databases, Elasticsearch, HBase, SQS/Kafka, REST web services, LLMs, and more.

What You'll Do:

This role spans the full ML lifecycle for recommendations - from candidate generation through ranking, serving, and online evaluation. Concretely:

Modeling

* Design, train, and ship recommendation models including two-tower / dual-encoder retrieval, neural ranking, and re-ranking models

* Build embedding pipelines for users, deals, merchants, and content; iterate on representation learning approaches

* Improve candidate generation strategies, including ANN-based retrieval over learned embeddings

* Define and run rigorous offline evaluation (recall@k, NDCG, MAP, calibration) and partner with data science to design online A/B tests

* Partner with product and data science on personalization surfaces - homepage, feeds, deal pages, search re-ranking, and lifecycle channels

Infrastructure

* Build and own end-to-end ML pipelines for recommendations: data preparation, training, evaluation, deployment, and monitoring

* Design and operate low-latency model serving for high-QPS recommendation traffic

* Build feature pipelines and feature-store patterns that maintain online/offline parity

* Design, architect, and build reliability, observability, and utilization infrastructure for the recommendations stack

* Improve training cost, turnaround time, and reproducibility on the ML platform; collaborate with data scientists to unblock experimentation

Cross-cutting

* Encourage change, especially in support of ML engineering best practices, and maintain a high standard of excellence

* Collaborate with engineers within the team and across the company to solve complex data problems at scale

* Write high-quality, production-level code that is easy to maintain and test, following standard methodologies

What We're Looking For:

* 8+ years of relevant professional experience

* Demonstrated experience designing, training, and shipping recommendation systems in production - not just classifiers or general ML

* Hands-on experience with deep learning for recsys: two-tower / dual-encoder models, embedding-based retrieval, neural ranking, or similar

* Strong ML fundamentals: model evaluation methodology, A/B testing, debugging models at scale, handling data and label quality issues

* Proficiency with ML modeling frameworks (PyTorch and/or TensorFlow) (5+ yrs)

* Experience with model serving platforms (TorchServe, TensorFlow Serving, NVIDIA Triton, or comparable custom serving infrastructure)

* Experience with vector retrieval / ANN at scale (e.g., FAISS, ScaNN, OpenSearch k-NN, Pinecone, Weaviate, or similar)

* Experience working with cloud data processing technologies such as Apache Spark, Elasticsearch, Presto, SQL (3+ yrs)

* Proficiency in at least two of: Linux, Ansible, Docker, Kubernetes (5+ yrs)

* Experience in distributed computing (7+ yrs)

* Experience working with AWS or similar cloud infrastructure (5+ yrs)

* Experience with hardware / resource management for ML training and/or deployment

* Knowledge of the open source landscape with judgment on when to choose open source versus build in-house

* Excellent analytical and problem-solving skills

* Comfort operating across both modeling and infrastructure - this is not a pure modeling or pure platform role

Nice to have:

* Experience with feature stores (Feast, Tecton, or custom)

* Experience with real-time / streaming feature engineering

* Experience with LLM-augmented retrieval or hybrid retrieval architectures

* E-commerce, content, or marketplace recommendation domain experience

Location:

San Mateo, CA

Hybrid…
