Data Engineer Job, San Francisco area, California, USA, IT/Tech

About the Role

This is a foundational infrastructure role at a company where the data layer isn't a back-office function — it's the nervous system of a payments platform processing every agent transaction, policy decision, and risk signal in real time. The right person thrives on ownership, has strong opinions about data quality and governance, and moves with the urgency of someone who knows that bad data costs more than bad code.

As an early data engineer, you'll define not just the pipelines but the standards, architecture, and culture of data at Sapiom.

What You Will Do

You'll own Sapiom's data infrastructure end-to-end — designing and scaling ETL pipelines, defining schemas that survive 10x growth, and building the governance and quality frameworks that make data trustworthy across the company. You'll architect standardized data models that enable self‑serve AI‑powered insights, giving Analytics, Data Science, and product teams the visibility they need to move fast without coming to you for every query.

The mandate is broad: pipelines, quality, security, observability, and the cross‑functional partnerships that keep it all running.

Responsibilities

Build, scale, and optimize production‑quality ETL pipelines — owning the full lifecycle from ingestion through availability, with clear quality and SLA standards
Design data schemas and architect for scale — anticipating 10x data growth and building models that don't require rework when it arrives
Own data quality, governance, security, and schema design across the platform — setting the standards and making sure they hold
Develop standardized, self‑serve data models that enable AI‑powered analytics — reducing friction for partner teams and eliminating one‑off data pulls
Instrument pipeline observability and surface key health metrics to Analytics, Data Science, and DevOps, catching issues before they become incidents
Partner closely with Data Science, Analytics, and DevOps, operating as a force multiplier across teams rather than a bottleneck

Requirements

Demonstrated track record (5+ years) of transforming raw data into governed, well‑documented, production‑ready datasets that business teams can trust and use
Deep hands‑on experience building and deploying production data pipelines using SQL, Python, Spark, AWS Glue, EMR, dbt, and Airflow
Strong command of MPP databases (Snowflake, Amazon Redshift, or Teradata), with 3+ years of hands‑on production use
Proven record of partnering with Engineering, Analytics, Data Science, and DevOps teams, treating cross‑functional relationships as core to the job, not peripheral to it
Architectural instincts — able to design schemas and systems that scale gracefully, not just handle today's load
Comfort operating in an on‑call rotation — including incident response outside regular working hours when the pipeline demands it
Clear communicator who can translate complex data infrastructure decisions into plain‑language insights for both technical and non‑technical stakeholders
