
Senior Site Reliability Engineer

Job in Cambridge, Middlesex County, Massachusetts, 02140, USA
Listing for: Blitzy Inc.
Full Time position
Listed on 2026-06-05
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 160000 - 180000 USD Yearly
Job Description & How to Apply Below

About Blitzy

Blitzy is a Cambridge, MA-based AI software development platform on a mission to revolutionize the software development life cycle by autonomously building custom software to unlock the next industrial revolution. We're transforming how enterprises build software, turning enterprise requirements into production-ready code with an agentic software development platform that can autonomously execute 80% of software development work.

We're backed by multiple tier 1 investors, and our founders have proven success at previous start-ups.

Location: Cambridge, MA (In-Office)

Compensation: $160,000 - $180,000 + equity eligibility based on performance

The Role

As a Senior Site Reliability Engineer at Blitzy's Pune headquarters, you will be the backbone of our platform's reliability, scalability, and operational excellence. You'll work at the intersection of software engineering and infrastructure, ensuring our AI-powered development platform remains highly available and performant as we scale rapidly. This is a high-impact, hands‑on role for an engineer who thrives in a fast‑moving environment and takes deep ownership of the systems they build.

What Success Looks Like
  • In 30 days:
    You have a deep understanding of Blitzy's infrastructure architecture, have identified key reliability risks, and are actively contributing to on‑call rotations.
  • In 90 days:
    You have shipped meaningful improvements to observability, incident response workflows, and deployment pipelines that measurably reduce MTTR and increase system uptime.
  • In 6 months:
    You have driven at least one major reliability initiative from inception to production, established SLO/SLA frameworks for critical services, and are a trusted technical voice shaping our infrastructure roadmap.
Areas of Ownership
  • Design, build, and operate scalable, fault‑tolerant infrastructure across cloud environments (AWS, GCP, or Azure).
  • Define and enforce SLOs, SLAs, and error budgets; lead blameless postmortems and drive systemic improvements.
  • Build and maintain robust CI/CD pipelines, release automation, and deployment infrastructure.
  • Own observability: design and maintain logging, metrics, tracing, and alerting stacks (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).
  • Partner closely with software engineering teams to embed reliability practices into the development lifecycle.
  • Drive capacity planning, performance benchmarking, and cost optimization across our infrastructure.
  • Champion security best practices within the infrastructure and deployment layers.
Required Experience
  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
  • Strong proficiency in at least one major cloud platform (AWS preferred); experience with Kubernetes and container orchestration at scale.
  • Hands‑on experience with infrastructure‑as‑code tools (Terraform, Pulumi, or equivalent).
  • Proven track record designing and maintaining high‑availability, distributed systems.
  • Deep expertise in observability tooling, incident management, and on‑call practices.
  • Strong scripting and automation skills (Python, Go, Bash, or similar).
  • Excellent communication skills with the ability to collaborate across engineering teams and present technical findings to leadership.
What Makes You Stand Out
  • Experience supporting AI/ML workloads or GPU‑accelerated infrastructure.
  • Prior experience in a high‑growth startup environment where you wore multiple hats.
  • Familiarity with eBPF, service mesh technologies (Istio, Linkerd), or advanced networking.
  • Contributions to open‑source SRE/DevOps tooling or communities.
  • Experience building global, multi‑region infrastructure with strict latency and availability requirements.
What Makes This Role Different

You won't be maintaining legacy systems or fighting fires in a sprawling monolith. At Blitzy, you're building reliability into a greenfield AI platform that is redefining how the world creates software. You'll have direct influence over architectural decisions, work side‑by‑side with world‑class engineers, and see the tangible impact of your work as we scale to serve Fortune 500 customers. As a founding member of the Pune SRE team,…

Position Requirements
10+ Years work experience