Senior Site Reliability Engineer
Job in Austin, Travis County, Texas, 78716, USA
Listed on 2026-06-02
Listing for: Zello Inc
Full Time position
Job specializations:
- IT/Tech
- Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Job Description
Please be aware that scammers may try to impersonate Zello by contacting candidates about job opportunities. We will never ask you for bank account information, checks, or other sensitive information as part of our hiring process. All correspondence will come from the email domain. If you're unsure, please email with questions.
About Zello
Zello is a voice-first communication platform, powered by our industry-leading push-to-talk technology, that improves collaboration and productivity for desk-less workers. With more than 175 million users, we're the #1-rated push-to-talk app in the world, delivering 9 billion (yes, with a B) messages a month.
At Zello, our company values are at the heart of what we do every day. We're proud to serve the frontline, we're privileged to connect people in times of crisis across the globe, and we're honored to support first responders.
And this is where you come in.
We're seeking a Senior Site Reliability Engineer who can keep our data tier highly available while also pulling weight across the broader platform. As Zello scales, the line between "database problem" and "platform problem" keeps blurring, and we want someone who can work on either side of it. This hire owns our data-tier reliability (MySQL, MongoDB, ScyllaDB, Elasticsearch, Redis) and contributes to monitoring, on-call, and our ongoing cloud modernization efforts.
Zello is the leading push-to-talk communication platform, enabling instant voice communication for frontline workers across hospitality, logistics, transportation, construction, and public safety. When a hotel manager radios housekeeping or a trucker calls dispatch, they're on Zello, and they need it to work every time. The Platform team builds and operates the infrastructure that makes that possible. Databases sit at the center of that promise: every channel, every message, and every login depends on them.
The Role
You'll join the Platform team and report to the Director of Platform Engineering. You'll own the reliability of our MySQL and MongoDB footprint across Google Cloud, work alongside application engineers on performance and schema decisions, and contribute to the broader platform: observability with Prometheus, Loki, and Tempo, on-call, and incident response. This role suits someone who likes operating real production systems, doesn't get stage fright during incidents, and writes the runbook for the next person who hits the same problem.
We're investing in AI to shorten incident response, build agents and tooling that speed up root-cause analysis, and improve developer productivity across engineering. We want someone curious about what that looks like for an SRE and excited to help shape it.
After a Successful First Year, You Will Have:
- Operated Zello's MySQL and MongoDB clusters to documented availability targets, with automated backups, regularly tested restores, and failover the on-call team trusts under real incident pressure.
- Cut latency or capacity cost on at least one critical database workload through measurable performance work: index strategy, query tuning, schema changes, or sharding.
- Extended our observability coverage so incidents are diagnosed in minutes rather than hours, with dashboards and alerts the team actually uses.
- Owned a slice of the Platform on-call rotation and led postmortems that turned recurring incidents into permanent fixes.
What You'll Do:
- Design, deploy, and operate highly available MySQL and MongoDB clusters across our cloud environments, covering replication, sharding, backups, point-in-time recovery, upgrades, and disaster recovery.
- Tune query performance, schema, and index strategy in partnership with application engineers, and push fixes upstream into the application when that's the right answer.
- Extend our observability stack (Prometheus, Loki, and Tempo) so the data tier is as well instrumented as the application tier and traces actually lead to the root cause.
- Participate in the Platform on-call rotation, lead incident response for data-tier issues, and write postmortems that drive durable change.
- Improve disaster recovery, security posture, and compliance for our database footprint: encryption, access control, audit…
Position Requirements
10+ years of work experience