Lead Site Reliability Engineer, Factory Software
Job in Fremont, Alameda County, California, 94537, USA
Listed on 2026-06-06
Listing for: Tesla
Full Time position
Job specializations:
- IT/Tech
- Systems Engineer, IT Support
Job Description
The Factory Software team at Tesla is building critical applications to enable manufacturing and warehouse management with a strong emphasis on reliability, availability, scalability, speed, and security. We are a diverse, cross-functional team of Controls Engineers, Software Engineers, SREs, and other disciplines working on automated manufacturing and warehouse processes.
This is a technical leadership role. As the Lead Site Reliability Engineer, you will be the primary technical owner and leader for the Factory Software team's reliability, observability, and infrastructure strategy. You will combine deep hands-on engineering with technical leadership to set direction, raise the bar on engineering practices, and ensure that the full stack, from Kubernetes clusters and databases to factory-facing applications, is highly reliable, observable, and performant.
What You'll Do
* Provide technical leadership and set the vision for observability, reliability, and platform standardization across the Factory Software team
* Design and implement end-to-end observability and telemetry solutions (OTEL, Prometheus, Grafana, Tempo, etc.) while mentoring the team on best practices
* Own the reliability of the full stack: Kubernetes infrastructure, virtual machines, databases, and the middleware applications connecting PLCs, MES systems, and other factory services
* Define and drive SLIs, SLOs, error budgets, and golden signals across services
* Lead major initiatives to eliminate performance bottlenecks, database contention, and infrastructure issues through proactive monitoring and automation
* Write production-grade code and build tools to reduce toil and improve deployment, monitoring, and operational workflows
* Participate hands-on in on-call rotations, live troubleshooting during outages (NOC bridges), and blameless post-mortems
* Collaborate closely with Platform Engineering, Infrastructure, Controls Engineering, and Software Engineering teams to embed reliability and observability into architecture and development practices
* Mentor and coach engineers on technical excellence, observability, Kubernetes, Linux, networking, and reliable system design
* Drive continuous improvement in incident response, system performance, and engineering standards across the team
What You'll Bring
* 7+ years of experience in Site Reliability Engineering, Platform Engineering, or related systems roles, with significant hands-on experience at scale
* Strong technical expertise in Kubernetes, Docker, Linux administration, and networking (routing, VLANs, firewalls, load balancers)
* Deep experience with observability tools and concepts (Prometheus, Grafana, Tempo, OTEL, Splunk, etc.)
* Proven track record of designing and implementing reliable, observable distributed systems
* Proficiency in at least one high-level language (Go, Python, or Java) with experience writing production-grade code
* Demonstrated ability to lead technical initiatives and raise the engineering bar without formal people management authority
* Experience with on-call rotations, incident command, and driving reliability improvements through blameless post-mortems
* Strong bias for action, excellent communication skills, and a desire to mentor and uplift other engineers
* Experience in manufacturing, industrial automation, or complex operational environments is a strong plus
Compensation and Benefits
Benefits
Along with competitive pay, as a full-time Tesla employee you are eligible for the following benefits starting on day 1 of employment:
* Medical plan options with $0 payroll deduction
* Family-building, fertility, adoption and surrogacy benefits
* Dental (including orthodontic coverage) and vision plans, both with options at $0 paycheck contribution
* Company-paid Health Savings Account (HSA) contribution when enrolled in the High-Deductible medical plan with HSA
* Healthcare and Dependent Care Flexible Spending Accounts (FSA)
* 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
* Company-paid Basic Life and AD&D insurance
* Short-term and long-term disability insurance (90 day waiting period)
* Employee Assistance Program
* Sick and Vacation time…