Senior Software Engineer, Data Acquisition Product & Engineering - Remote
California, Moniteau County, Missouri, 65018, USA
Listed on 2025-12-10
Software Development
Data Engineer, Software Engineer
Location: California
Note for all engineering roles: with the rise of fake applicants and AI-enabled candidate fraud, we have built in additional measures throughout the process to identify such candidates and remove them.
About Us
People Data Labs (PDL) is the provider of people and company data. We do the heavy lifting of data collection and standardization so our customers can focus on building and scaling innovative, compliant data solutions. Our sole focus is on building the best data available by integrating thousands of compliantly sourced datasets into a single, developer-friendly source of truth. Leading companies across the world use PDL’s workforce data to enrich recruiting platforms, power AI models, create custom audiences, and more.
We are looking for individuals who can balance extreme ownership with a "one-team, one-dream" mindset. Our customers are trying to solve complex problems, and we can only help them achieve their goals as a team. Our Data Engineering & Acquisition Team ensures our customers have standardized, high-quality data to build upon.
You will be crucial in accelerating our efforts to build standalone data products that enable data teams and independent developers to create innovative solutions at massive scale. In this role, you will work with a team to continuously improve our existing datasets and pursue new ones. If you are looking to be part of a team discovering the next frontier of data-as-a-service (DaaS) with a high level of autonomy and opportunity for direct contributions, this might be the role for you.
We like our engineers to be thoughtful, quirky, and willing to fearlessly try new things. Failure is embraced at PDL as long as we continue to learn and grow from it.
- Contribute to the architecture and improvement of our data acquisition and processing platform, increasing reliability, throughput, and observability
- Use and develop web crawling technologies to capture and catalog data on the internet
- Build, operate, and evolve large-scale distributed systems that collect, process, and deliver data from across the web
- Design and develop backend services that manage distributed job orchestration, data pipelines, and large-scale asynchronous workloads
- Structure and model captured data, ensuring high quality and consistency across datasets
- Continuously improve the speed, scalability, and fault-tolerance of our ingestion systems
- Partner with data product and engineering teams to design and implement new data products powered by the data you help collect, and to enhance existing products
- Learn and apply domain-specific knowledge in web crawling and data acquisition, with mentorship from experienced teammates and access to existing systems
- 7+ years of professional experience building or operating backend or infrastructure systems at scale
- Solid programming experience in Python, Go, Rust, or similar, including experience with async/await, coroutines, or concurrency frameworks
- Strong grasp of software architecture and backend fundamentals; you can reason clearly about concurrency, scalability, and fault tolerance
- Solid understanding of the browser rendering pipeline and web application architecture (auth, cookies, HTTP request/response)
- Familiarity with network architecture and debugging (HTTP, DNS, proxies, packet capture and analysis)
- Solid understanding of distributed systems concepts: parallelism, asynchronous programming, back pressure, and message-driven design
- Experience designing or maintaining resilient data ingestion, API integration, or ETL systems
- Proficiency with Linux / Unix command-line tools and system resource management
- Familiarity with message queues, orchestration, and distributed task systems (Kafka, SQS, Airflow, etc.)
- Experience evaluating and monitoring data quality, ensuring consistency, completeness, and reliability across releases
- Work independently in a fast-paced, remote-first environment, proactively unblocking yourself and collaborating asynchronously
- Communicate clearly and thoughtfully in writing (Slack, docs, design proposals)
- Write and maintain technical design…