Site Reliability Engineer
Remote / Online: candidates ideally in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-01-01
Listing for: gamma.app
Remote/Work from Home position
Job specializations:
IT/Tech
Systems Engineer, SRE/Site Reliability
Job Description
Millions of people rely on Gamma to create, teach, and persuade, producing more than 1 million gammas every day.
We see Gamma as the next great workplace tool, combining viral B2C love with a massive B2B opportunity. We believe AI can be a true creative partner: one that understands context, clarity, and taste.
We’ve reached a $2.1B valuation, crossed $100M in annual recurring revenue, and have been profitable since 2023.
We're an imaginative, passionate team who takes our work seriously, but not ourselves. Our culture is warm, a little quirky, and fueled by curiosity.
About the role
Gamma's infrastructure needs to be rock-solid for millions of daily users while enabling our engineering teams to ship fast. You'll own the operational health of our full backend platform, building automation and tooling that improves reliability and partnering with engineering to design systems that are observable, resilient, and easy to operate. Your work directly impacts every Gamma user's experience.
This is a high-impact role where you'll balance reliability with velocity, knowing when to move fast and when to prioritize stability. You'll lead incident response, drive systemic improvements, and help shape how Gamma scales to serve its next 100 million users.
Our team has a strong in-office culture and works in person 4–5 days per week in San Francisco. We love working together to stay creative and connected, with flexibility to work from home when focus matters most.
What you'll do
• Own the reliability, availability, and performance of Gamma's production systems, which run primarily on AWS
• Build observability infrastructure with metrics, logging, tracing, and alerting that provide deep visibility into system health
• Design automation to reduce toil, improve deployment safety, and accelerate incident resolution
• Lead incident response, conduct blameless post-mortems, and drive systemic improvements to prevent recurring issues
• Partner with engineering teams on architecture reviews, SLOs/SLIs, and reliability best practices
• Manage and optimize our infrastructure including compute, networking, databases, and managed services
What you'll bring
• 5+ years in Site Reliability Engineering, DevOps, or systems engineering roles with deep AWS expertise
• Strong programming skills (Python, Go, or TypeScript/Node.js) for building tools and automation
• Experience with infrastructure-as-code (Terraform, CloudFormation) and comprehensive observability solutions
• Track record improving system reliability through automation, monitoring, and architectural improvements
• Solid understanding of networking, distributed systems, containerization (Docker, Kubernetes), and database performance
• Strong incident management and debugging skills for complex production issues
• (Nice to have) Experience scaling SaaS applications to millions of users
• (Nice to have) Background with real-time collaborative systems, Kafka, chaos engineering, or service mesh technologies
• (Nice to have) AWS certifications or experience with security/compliance requirements (SOC 2, ISO
Compensation range
Final offer amounts are determined by multiple factors, including but not limited to experience and expertise in the requirements listed above.
If you're interested in this role but you don't meet every requirement, we encourage you to apply anyway! We're always excited about meeting great people.
We're building on a full TypeScript stack centered around some of the most modern and popular technologies.
We use our own custom, open-source AI prompting framework, AIJSX. We've built many custom tools in-house, and we also adopt newer ones like the Vercel AI SDK.
Our tiny team operates at massive scale:
1M+
70M users around the world
6M+ AI images generated daily
1 trillion LLM tokens processed per month
Life at Gamma
You get energy from small teams doing big things.
You love when design, code, and storytelling overlap.
You default to action, even when…