Infra Lead

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Dormont Manufacturing Co
Full Time position
Listed on 2026-07-03
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing: Infrastructure & Operations
Salary/Wage Range or Industry Benchmark: 180,000 - 260,000 USD yearly
Job Description & How to Apply Below

About us

Our mission is to reinvent the way people learn, starting with language. We begin by teaching the next billion people English, Spanish, and French.

English is the global language of business, culture, and communication, and over 1.5 billion people around the world are actively trying to learn right now. Others dream of communicating with the half-billion native Spanish speakers across the globe. The problem is that it’s nearly impossible to learn to speak a language without constant access to a speaking partner. Grammar and vocab apps don’t really help – you need to actually converse with someone.

Speak is on a journey to fix this. We’re creating an AI-powered experience that replicates the flow of a conversation, without needing a human on the other end. The goal is to make conversing in a foreign language radically more accessible and eventually help hundreds of millions of people gain fluency who otherwise wouldn’t be able to.

We started on this journey over five years ago and we’ve still got a long way to go. We’re thoughtfully adding new team members only when we think they can truly play a big role in our mission.

Speak launched first in South Korea, where we have quickly grown to become the top-grossing education app in the country. We have now delivered this winning product to more than 40 countries globally and are continuing to expand to more markets in the coming months. The company is well funded, and as of December 2024, we’ve reached a $1B valuation with our Series C round, through key partners like Accel, OpenAI, Founders Fund, Y Combinator, Khosla Ventures, Lachy Groom, Josh Buckley, and more.

We’re a team of more than 90 based throughout San Francisco, Seoul, Tokyo, Taipei, and Ljubljana.

About this role

As an SRE Engineer, Lead at Speak, you’ll be the driving force behind the reliability and resilience of the systems that power our global language learning experience. You’ll lead efforts to scale our infrastructure, harden our platform, and ensure that our services are fast, available, and reliable for millions of users around the world.

You’ll work across our stack, from Kubernetes on GCP to our Node.js APIs, Postgres, and Redis, building robust infrastructure and operational tooling. You’ll own incident response, observability, and SLOs while embedding a culture of reliability throughout the engineering org.

Speak is growing rapidly, and we’re pushing our systems harder every day. This is a unique opportunity to shape the future of our platform as we scale to the next 10x of users.

What you’ll be doing

  • Own the reliability of Speak’s infrastructure across GCP, Kubernetes, and our Node.js/Postgres stack
  • Lead response for P0/P1 incidents, drive postmortems, and ensure we’re learning from every outage
  • Improve observability, alerting, and on‑call processes so we catch issues before users do
  • Define and drive adoption of SLOs/SLAs for core systems and services
  • Build tools and frameworks to make reliability easier for product engineers—think safer deploys and infrastructure automation
  • Collaborate cross‑functionally with Product, Engineering, and ML teams to ensure reliability is baked into everything we build
  • Set short-term and long-term roadmaps to ensure stability for our growing user base
  • Be a thought leader and coach around SRE principles—blameless culture, operational maturity, and continuous improvement

What we’re looking for

  • 7+ years of experience in SRE, DevOps, or infrastructure-focused engineering roles, ideally with experience leading or mentoring others
  • Strong experience with GCP, Kubernetes, Terraform, Node.js, Python, PostgreSQL, Redis, and observability tooling like Prometheus and Sentry
  • Proven track record of improving reliability, scaling systems, and reducing incident frequency and severity in high-traffic systems
  • Strong incident management and root cause analysis skills—you know how to lead under pressure
  • Experience building and maintaining CI/CD pipelines and deployment safety tooling
  • Strong systems thinking, with the ability to identify failure points and proactively harden services
  • Deep sense of ownership and a desire to make…