Site Reliability Engineer, Compute - USDS
Listed on 2026-05-17
IT/Tech
Systems Engineer, Cloud Computing
Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you’ll have the opportunity to manage the complex challenges of scale, while using expertise in coding, algorithms, complexity analysis, and large-scale system design. We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We encourage close collaboration while promoting self-direction.
Responsibilities:
- Develop and maintain automation procedures to maximize system efficiency and minimize human intervention.
- Work closely with software engineering teams to design, deploy and operate elements to ensure that systems are functionally robust.
- Ensure system scalability to handle growth in web traffic and data.
- Implement monitoring tools and set up metrics to keep track of system health and performance.
- Participate in on-call rotations, assist with incident management, and diagnose, resolve, and prevent production issues.
- Conduct performance tests to find and address system bottlenecks.
- Collaborate with teams across the organization to define Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
- Practice sustainable user support, incident response, and blameless postmortems.
Minimum Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field, with 3+ years of experience.
- Proven work experience as a Site Reliability Engineer, Systems Engineer, or similar software engineering role.
- Passion for operational excellence through methodical automation and engineering processes, using programming languages such as Go, Python, or others.
- Experience in network architecture, database modeling, cloud systems and large-scale distributed systems.
- Strong understanding of Linux operating systems and open-source technologies.
- Excellent problem-solving skills, strategic thinking, and a strong ability to debug complex systems.
- Exceptional communication skills and the ability to effectively collaborate with cross-functional teams.
- Knowledge of monitoring tools and methodologies (such as Prometheus, Grafana).
- Experience with containers and container orchestration platforms such as Docker, Kubernetes or equivalent.
TikTok USDS Joint Venture LLC is dedicated to the safety and security of millions of Americans who create, discover, and connect with what they love on the apps we operate. The Joint Venture has been established in compliance with the Executive Order signed by President Trump on September 25, 2025. Our foundation is a comprehensive data privacy and cybersecurity program. We operate under defined safeguards to protect national security and secure U.S. user data, apps and the algorithm.
We safeguard the U.S. content ecosystem, holding decision-making authority for trust and safety policies and moderation. USDS Joint Venture helps ensure Americans can continue to express their creativity, discover new hobbies and interests, and build thriving communities and businesses on a global scale.
On-site presence across teams allows the company to operate with greater speed, alignment, and agility — especially in areas like real-time decision-making, team development, and integrated execution. As such, the company is shifting from a hybrid work model to a fully in-person schedule up to 5 days a week.
Why Join Us
Inspiring creativity is at the core of TikTok's mission. Our innovative product is built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and bring joy - a mission we work towards every day.
We strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. Every challenge is an opportunity to learn and innovate as one team. We're resilient and embrace challenges as they come. By constantly iterating and fostering an "Always Day 1" mindset, we achieve…