Beatdapp is a venture-backed startup delivering the world's most advanced streaming integrity and recommendation technology.
While our roots are in fighting the multi‑billion‑dollar problem of streaming fraud, we have leveraged our "Trust & Safety Operating System" to power a new generation of discovery. We believe that true personalization starts with verified behavior. By filtering out noise and manipulated signals before they reach the model, we build recommendation engines on a foundation of clean, authentic data.
We are looking for builders who want to work with the world’s best streaming services and music labels to reshape how content is discovered.
We are seeking an ML/DevOps Engineer who is passionate about building the high‑availability systems that allow our machine learning models to thrive in production. In this role, you will operate at the intersection of cloud infrastructure and machine learning operations, taking full ownership of multi‑cluster Kubernetes environments, ensuring API workloads scale seamlessly, and automating deployment pipelines to perfection.
Responsibilities
- Manage and optimize multi‑cluster Kubernetes (K8s) environments, implementing sophisticated autoscaling policies and node management strategies for high‑availability ML workloads.
- Design and orchestrate live service deployments using strategies such as A/B testing and Canary releases, ensuring seamless rollbacks and API versioning.
- Design and maintain infrastructure using Infrastructure as Code (IaC) principles for environment consistency and rapid disaster recovery.
- Own logging, traces, and metrics components; define error budgets and maintain the health monitoring systems that keep our Rec Sys engine running 24/7.
- Collaborate with security teams to enforce patch management, secrets handling (IAM/Secret Manager), and data encryption protocols to protect sensitive streaming data.
- Automate routine operational tasks and environment provisioning; manage outages with a critical‑thinking mindset and clear communication.
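To give a concrete feel for the error‑budget ownership described above, here is a minimal Python sketch of the standard SLO arithmetic (an illustration only, not Beatdapp's actual monitoring tooling; the function name and window are assumptions):

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the allowed downtime in minutes for an availability SLO
    over a rolling window, e.g. slo=0.999 means 99.9% availability."""
    total_minutes = window_days * 24 * 60
    # The error budget is simply the fraction of time the SLO permits failure.
    return total_minutes * (1 - slo)

# A 99.9% SLO over 30 days leaves roughly 43.2 minutes of error budget.
budget = error_budget_minutes(0.999)
```

In practice this budget is what alerting thresholds and burn‑rate alarms are defined against: once the budget for the window is spent, releases pause until reliability recovers.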
Requirements
- 3+ years of professional experience in DevOps, SRE, or Infrastructure Engineering, preferably supporting data‑intensive or ML applications.
- Deep familiarity with Kubernetes and cloud networking, including compute instances, network configuration (VPCs/subnets), and scaling API workloads.
- Proficiency in writing clean, scalable, object‑oriented code for processing large‑scale data in virtualized environments.
- Proven track record of building automated CI/CD pipelines, managing image registries (Docker/Podman), and handling complex code versioning.
- Strong understanding of data stores (Relational vs. Non‑relational), caching strategies, and data transfer protocols (HTTPS/APIs).
- Experience working with sensitive data, encryption, and secure cloud networking.
- Familiarity with Google Cloud Platform (GCP) services and Terraform.
- Hands‑on work with Istio or Linkerd for Kubernetes service mesh.
- Experience with Python and deploying dedicated Rec Sys infrastructure or vector databases.
- Experience with GitHub Actions (GHA) and building highly automated, self‑healing deployment workflows.
- A strong knack for creating clear architecture diagrams, code comments, and technical design documents.