Senior Infrastructure Engineer; Backend/Data
Berkeley, Alameda County, California, 94709, USA
Listed on 2026-05-18
Software Development
AI Engineer, Data Engineer
About the role
We're looking for a Senior Infrastructure Engineer (Backend/Data Performance) to help us build the foundational pipelines that power Earth Species Project's mission to understand animal communication with advanced AI. You’ll design and optimize scalable systems that let our researchers experiment faster, with production-quality reliability.
Your work will focus on data infrastructure and backend performance, creating pipelines and storage layers that can handle diverse species data. You’ll collaborate closely with researchers, engineers, and external partners to make complex AI workflows simple, efficient, and reliable.
In this role you will
- Design and optimize high-performance data pipelines for distributed training and storage (using tools like Arrow, DuckDB, LanceDB, BigQuery, and vector databases).
- Focus on low-level optimizations (latency, throughput, reliability, GPU usage).
- Build monitoring and visualization tools for tracking data quality, pipeline performance, and experiments.
- Optimize distributed AI workloads for reliability, latency, and efficiency.
- Scope and supervise projects so that interns, PhD students, and post-docs can contribute and collaborate effectively.
- Support recruiting efforts and help shape the growth of the infrastructure team.
- 5+ years of backend or infrastructure engineering experience
- Strong Python programming skills (bonus points for lower-level languages)
- Experience with distributed systems and cloud platforms (AWS, GCP, Azure)
- Hands‑on experience with containerization (Docker, Kubernetes) and infrastructure as code (Terraform)
- Experience building or supporting ML/AI infrastructure in production
- Experience with high-performance data tools (DuckDB, Apache Spark, Delta Lake)
- GPU orchestration and large-scale model training experience
- Familiarity with ML platforms (SageMaker, Vertex AI) and frameworks (PyTorch, JAX)
- Experience mentoring junior engineers, interns, or researchers and breaking down complex projects into manageable tasks
- Experience participating in technical hiring processes and evaluating candidates
- Deep knowledge of training architectures, CUDA programming, or TPU optimization
- Full-stack development experience with frameworks like React for building web applications
- Experience managing HPC infrastructure with tools like Slurm or Kubernetes clusters
- Background in monitoring stacks (Prometheus, Grafana) for ML pipeline observability
225,500 - 235,500 USD a year
Benefits
- Medical insurance, dental insurance, and vision insurance – ESP covers 100% of the premium
- 401k plan with match (if based in the United States)
- 2,000 USD home office stipend
- Unlimited paid time off, with a recommended minimum of three weeks per year
- Flexible working hours
- Regular team retreats around the world
ESP is committed to equal employment opportunities regardless of race, color, religion, gender, gender identity or expression, pregnancy, sexual orientation, marital status, ancestry, national origin, genetics, disability, age, veteran status, and criminal history, consistent with legal requirements. We encourage folks of all backgrounds and perspectives to apply.
If you require any accommodations, please email us at jobs, and we’ll work with you to meet your accessibility needs.