Engineering Manager, ML/Data Engineering; Content Trust
Listed on 2026-01-12
IT/Tech
Data Engineer, AI Engineer, Machine Learning/ ML Engineer, Data Science Manager
Engineering Manager, ML/Data Engineering (Content Trust)
About The Company
At Scribd Inc. (pronounced “scribbed”), our mission is to spark human curiosity. We create a world of stories and knowledge, democratize the exchange of ideas and information, and empower collective expertise through our four products: Everand, Scribd, Slideshare, and Fable.
We support a culture where our employees can be real and bold; where we debate and commit as we embrace plot twists; and where every employee is empowered to take action as we prioritize the customer.
About The Team And Role
The ML Data Engineering team is the backbone of Scribd’s commitment to a safe and trustworthy library. We build high-throughput, ML-driven data pipelines that process hundreds of millions of documents to detect, classify, and mitigate untrustworthy content.
As Manager of ML Data Engineering, you will lead a specialized team of engineers responsible for building scalable, ML-based foundations that detect and respond to harmful content. You aren’t just moving data; you are building the infrastructure that allows ML models to reason across our entire corpus in both batch and real time. Your team’s work ensures that our safety classifiers and automated policy enforcement tools are performant, scalable, and resilient.
You will sit at the intersection of Big Data, AI, MLOps, and Platform Integrity, directly impacting the safety of millions of our users.
- Lead and grow a high-performing engineering team: Manage, mentor, and recruit a world-class team of data and ML engineers. Foster a culture of technical excellence, operational rigor, and deep empathy for the user safety mission.
- Architect scalable ML data pipelines: Design and oversee the development of distributed data processing systems capable of handling hundreds of millions of documents. Ensure these pipelines support both batch and real-time inference for content moderation and risk detection.
- Build the "Trust" scores: Develop and maintain the foundational data layers – including semantic embeddings, metadata extracts, and behavioral signals – that power our Content Trust ML models.
- Partner on AI/LLM Integration: Work closely with the Search & Discovery and Applied Research teams to integrate ML/LLM-based reasoning into our trust pipelines, enabling more nuanced understanding of complex policy violations.
- Drive Operational Excellence: Establish SLAs for infrastructure, ensuring our automated enforcement systems are both fast and explainable.
- Cross-functional Leadership: Collaborate with Product Managers (Content Trust), Legal/Policy teams, and Data Science to translate evolving regulatory requirements (like the DSA) into robust technical architectures.
- Leadership Experience: 8+ years of total engineering experience, with 3+ years specifically in a people management or technical lead role within a Data or ML Engineering organization.
- Scale Expertise: Proven track record of building and operating production‑grade data pipelines at massive scale (100M+ entities) using technologies like Spark, Flink, Kafka, or Airflow.
- ML Infrastructure Fluency: Deep understanding of the ML lifecycle, including feature engineering, model deployment (MLOps), and vector databases (e.g., Pinecone, Milvus, or Weaviate).
- Trust & Safety Context: Prior experience building systems for content moderation, fraud detection, spam prevention, or digital rights management.
- Technical Breadth: Strong proficiency in Python, Scala, or Go, and experience with cloud-native infrastructure (AWS/GCP, Kubernetes, and Snowflake/BigQuery).
- Strategic Communication: Ability to explain complex architectural trade-offs to non-technical stakeholders in Legal, Policy, and Product.
- LLM Pipelines: Experience building RAG (Retrieval‑Augmented Generation) pipelines or managing the data infra for fine‑tuning Large Language Models.
- UGC Experience: Background working with large-scale User Generated Content (UGC) ecosystems and the unique challenges of unstructured document data.
- Regulatory Knowledge: Familiarity with the technical requirements of global…