Software Engineer II; Backend + Data pipelines
Listed on 2025-12-02
IT/Tech
Data Engineer, Machine Learning/ML Engineer, AI Engineer, Data Scientist
Overview
About The Company
At Scribd (pronounced “scribbed”), our mission is to spark human curiosity. Join our team as we create a world of stories and knowledge, democratize the exchange of ideas and information, and empower collective expertise through our three products: Everand, Scribd, and Slideshare.
We support a culture where our employees can be real and be bold; where we debate and commit as we embrace plot twists; and where every employee is empowered to take action as we prioritize the customer.
When it comes to workplace structure, we believe in balancing individual flexibility and community connections. It’s through our flexible work benefit, Scribd Flex, that employees – in partnership with their manager – can choose the daily work-style that best suits their individual needs. A key tenet of Scribd Flex is our prioritization of intentional in-person moments to build collaboration, culture, and connection.
For this reason, occasional in-person attendance is required for all Scribd employees, regardless of their location.
So what are we looking for in new team members? Well, we hire for “GRIT”. The textbook definition of GRIT is demonstrating the intersection of passion and perseverance toward long-term goals. At Scribd, we are inspired by the potential that this can unlock, and ask each of our employees to pursue a GRIT-ty approach to their work. In a tactical sense, GRIT is also a handy acronym that outlines the standards we hold ourselves and each other to.
Here’s what that means for you: we’re looking for someone who showcases the ability to set and achieve Goals, achieve Results within their job responsibilities, contribute Innovative ideas and solutions, and positively influence the broader Team through collaboration and attitude.
The ML Data Engineering team powers metadata extraction, enrichment, and content understanding across all Scribd brands. We process hundreds of millions of documents and billions of images, and deliver high-quality metadata that enables content discovery and trust for millions of users worldwide. Our systems operate at massive scale, supporting diverse datasets such as user-generated content (UGC), ebooks, audiobooks, and more. We work at the intersection of machine learning, data engineering, and distributed systems, collaborating closely with applied research and product teams to deploy scalable ML and LLM-powered solutions in production.
Role Overview
We’re seeking a Software Engineer II with strong backend development experience and a passion for solving complex data challenges. In this role, you’ll design, build, and optimize distributed systems that extract, enrich, and process metadata for a wide range of content. You’ll work closely with ML engineers, product managers, and cross-functional partners to integrate machine learning models and LLM-based services into production pipelines and deliver impactful, high-performance solutions.
This role offers the opportunity to work on cutting-edge generative AI and metadata enrichment problems at a truly global scale.
Our team uses a variety of technologies. These are the ones we use on a regular basis:
Python, Scala, Ruby on Rails, Airflow, Databricks, Spark, HTTP APIs, AWS (Lambda, ECS, SQS, ElastiCache, SageMaker, CloudWatch), Datadog, and Terraform.
- Design and build scalable systems to extract, enrich, and process metadata from millions of documents, images, and audio content.
- Leverage LLMs to integrate capabilities like summarization, classification, extraction, and enrichment into metadata pipelines.
- Collaborate with cross-functional teams, including ML engineers and product managers, to deliver scalable, efficient, and reliable metadata solutions.
- Optimize and refactor existing systems for performance, scalability, and reliability.
- Ensure data accuracy, integrity, and quality through automated validation and monitoring.
- Participate in code reviews, ensuring best practices are followed and maintaining high-quality standards in the codebase.
- Manage and maintain data pipelines, security, and infrastructure.
- 4+ years of…