Senior Site Reliability Engineer, Supply

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Mithril
Full Time position
Listed on 2026-01-03
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 170,000 - 230,000 USD yearly
Job Description & How to Apply Below

As a key member of the Supply Engineering team, you will enable the sustainable, reliable growth of Mithril’s compute supply, overseeing technical operations and managing compute partner relationships.

Responsibilities
  • Design, deploy, and manage scalable, secure, and highly available Kubernetes clusters in cloud and on‑premises environments.
  • Execute and develop Ansible playbooks for routine maintenance, load testing, and system burn‑in across the Mithril fleet.
  • Deploy and oversee monitoring systems such as Grafana to proactively detect issues and anomalies.
  • Establish and uphold service level objectives (SLOs) and service level indicators (SLIs) to gauge system reliability.
  • Lead or participate in incident response and root‑cause analysis.
  • Provide regular updates on machine operability and notify partners of disruptions to maintain availability and confidence.
  • Act as the primary liaison with suppliers, maintaining regular meetings to communicate requirements and address inquiries.
  • Coordinate cross‑functional supply‑related initiatives, ensuring stakeholders are aligned for upcoming changes or maintenance events.
Requirements
  • Proven experience deploying, scaling, and maintaining production‑grade Kubernetes clusters across cloud or on‑prem environments.
  • Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent experience.
  • Experience with Linux system administration and command‑line interfaces.
  • Ability to create technical documentation and specifications.
  • Proficiency in scripting and automation (Python, Bash, or similar).
  • Understanding of key infrastructure metrics (CPU, memory, network utilization, error rates).
  • Knowledge of data center operations: disaster recovery, maintenance schedules, capacity planning.
  • Strong written and verbal communication skills, able to translate technical concepts.
  • Project management experience and ability to handle multiple priorities.
  • Demonstrated problem‑solving and analytical thinking skills.
  • Experience leading or participating in incident response and root‑cause analysis.
Nice to Have
  • Familiarity with GPU/CPU cluster management and optimization.
  • Proficiency with Git or similar version control.
  • Experience with Prometheus or Grafana monitoring and observability tools.
  • Experience in technical training or presenting content.
  • Prior SRE experience in the AI/ML domain.
  • Experience with infrastructure at scale and hardware lifecycle management (RMA).
  • Experience in vendor‑facing roles.
Benefits
  • Health, dental, and vision coverage for you and dependents.
  • 401k plan with 4% company match.
  • 21 days PTO and 14 company holidays, including 2 floating holidays.
Salary Range Information

Remuneration bracket: $170,000‑$230,000, with possible adjustments for outstanding qualifications.

In‑Office Requirement

Primary work location is Palo Alto or San Francisco, with weekly on‑site collaboration. Flexible arrangements are possible for extenuating circumstances.

Equal Opportunity Employer

Mithril maintains a strict commitment to equal opportunity employment practices. All applicants are evaluated without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation, disability, veteran status, citizenship, or any other protected class.

Position Requirements
10+ years of work experience