Site Reliability Engineer Job San Francisco California USA,IT/Tech

We're building the creative layer for modern communication. Every month, over a billion people make presentations — but the tools they use to make them haven't evolved in decades. We're changing that, using AI to disrupt a massive market.

Millions of people rely on Gamma to create, teach, and persuade, creating more than 1 million gammas every day.

We see Gamma as the next great workplace tool, combining viral B2C love with a massive B2B opportunity. We believe AI can be a true creative partner: one that understands context, clarity, and taste.

We’ve reached a $2.1B valuation, crossed $100M in annual recurring revenue, and have been profitable since 2023.

We're an imaginative, passionate team who takes our work seriously, but not ourselves. Our culture is warm, a little quirky, and fueled by curiosity.

About the role

Gamma's infrastructure needs to be rock-solid for millions of daily users while enabling our engineering teams to ship fast. You'll own the operational health of our full backend platform, building automation and tooling that improves reliability and partnering with engineering to design systems that are observable, resilient, and easy to operate. Your work directly impacts every Gamma user's experience.

This is a high-impact role where you'll balance reliability with velocity, knowing when to move fast and when to prioritize stability. You'll lead incident response, drive systemic improvements, and help shape how Gamma scales to serve its next 100 million users.

Our team has a strong in-office culture and works in person 4–5 days per week in San Francisco. We love working together to stay creative and connected, with flexibility to work from home when focus matters most.

What you'll do

• Own reliability, availability, and performance of Gamma's production systems across primarily AWS infrastructure

• Build observability infrastructure with metrics, logging, tracing, and alerting that provide deep visibility into system health

• Design automation to reduce toil, improve deployment safety, and accelerate incident resolution

• Lead incident response, conduct blameless post-mortems, and drive systemic improvements to prevent recurring issues

• Partner with engineering teams on architecture reviews, SLOs/SLIs, and reliability best practices

• Manage and optimize our infrastructure including compute, networking, databases, and managed services

What you'll bring

• 5+ years in Site Reliability Engineering, DevOps, or systems engineering roles with deep AWS expertise

• Strong programming skills (Python, Go, or TypeScript/Node.js) for building tools and automation

• Experience with infrastructure-as-code (Terraform, CloudFormation) and comprehensive observability solutions

• Track record of improving system reliability through automation, monitoring, and architectural improvements

• Solid understanding of networking, distributed systems, containerization (Docker, Kubernetes), and database performance

• Strong incident management and debugging skills for complex production issues

• (Nice to have) Experience scaling SaaS applications to millions of users

• (Nice to have) Background with real-time collaborative systems, Kafka, chaos engineering, or service mesh technologies

• (Nice to have) AWS certifications or experience with security/compliance requirements (SOC 2, ISO

Compensation range

Final offer amounts are determined by multiple factors, including but not limited to experience and expertise in the requirements listed above.

If you're interested in this role but you don't meet every requirement, we encourage you to apply anyway! We're always excited about meeting great people.

We're building on a full TypeScript stack centered around some of the most modern and popular technologies.

We use our own custom, open-source AI prompting framework, AIJSX. We've built many custom tools in-house, and we also use newer ones like the Vercel AI SDK.

Our tiny team operates at massive scale:

1M+ gammas created daily

70M users around the world

6M+ AI images generated daily

1 trillion LLM tokens processed per month

Life at Gamma

You get energy from small teams doing big things.

You love when design, code, and storytelling overlap.

You default to action, even when…

