Site Reliability Engineer Job, San Francisco area, California, USA, IT/Tech

Mission

Altimate AI, founded in 2022 in San Francisco, is revolutionizing enterprise data operations through the power of AI. Our mission is to alleviate the burden on overworked and understaffed enterprise data teams by providing innovative AI‑driven solutions that automate and accelerate a wide range of data tasks.

Our flagship product, Data Pilot, offers advanced data automation capabilities, while our new Data Mates technology brings agentic AI to data operations, acting as virtual teammates for data professionals. Our solutions fit into existing tools like VS Code, Git, and Slack, performing tasks ranging from data documentation to performance optimization.

Who are we?

By leveraging a proprietary framework that combines multiple language models and a custom‑built knowledge graph, we enable contextually aware AI agents that integrate seamlessly into existing workflows. Our solutions, including ambient AI for continuous monitoring and optimization, are designed to meet the growing demands of data operations, business intelligence, and analytics in an era of ever‑increasing data volumes.

Used by thousands of users worldwide and backed by prominent investors, we’re positioned at the forefront of the AI‑powered data engineering revolution.

You can read more about us in a recently published VentureBeat article.

Team

As a team, we are Silicon Valley veterans who previously created category‑defining data and AI products loved by thousands of companies worldwide. We have experienced the journey from small startup to IPO firsthand. We have now set out on a similar journey again, backed by prominent advisors and VC firms with multi‑billion dollar portfolios.

Role Overview

We are looking for a Site Reliability Engineer to join our founding team at Altimate AI. In this role, you will architect and own our infrastructure strategy from the ground up, ensuring our AI products are delivered reliably. You will collaborate closely with our software, AI, and data engineering teams to build a robust, secure, and high‑performance infrastructure.

As the first DevOps specialist, you’ll have the opportunity to set the standard for DevOps practices, shape our infrastructure roadmap, and influence the technical direction of the company.

Key Responsibilities

Infrastructure Architecture:
Design, build, and maintain a scalable, secure, and highly available cloud infrastructure to support Altimate AI’s platforms (Data Pilot & Data Mates).
CI/CD & Automation:
Develop and manage Continuous Integration/Continuous Deployment pipelines for smooth and fast software releases. Automate build, test, and deployment processes to minimize errors and downtime.
Containerization & Orchestration:
Lead our containerization efforts and manage container orchestration using Kubernetes (K8s). Ensure efficient deployment, scaling, and management of microservices and AI components in our environment.
Infrastructure as Code (IaC):
Implement and enforce Infrastructure as Code practices using tools like Terraform, Ansible, or Pulumi for consistent and repeatable infrastructure provisioning and configuration.
Monitoring & Observability:
Set up and maintain robust observability systems (monitoring, logging, and alerting) using tools such as Prometheus, Grafana, ELK stack, or Datadog to proactively track system health and performance. Ensure high availability and quick incident response through effective alerting and troubleshooting.
Security & Compliance:
Implement security best practices at every layer of the infrastructure (network, OS, containers, applications) and ensure compliance with industry standards. Conduct regular security audits and vulnerability assessments, and maintain proper access controls and backup/recovery plans.
Collaboration & Reliability:
Work closely with software engineers and data scientists to streamline deployment processes and improve reliability. Advocate for DevOps/SRE best practices (e.g., automation, blameless post‑mortems, capacity planning) to increase system resiliency and performance.
DevOps Leadership:
As a founding team member, champion a DevOps culture of automation, quality,…

