
Tech Lead Cloud Site Reliability Engineer - DCS Cloud

Job in San Jose, Santa Clara County, California, 95199, USA
Listing for: ByteDance
Full Time position
Listed on 2026-03-08
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below

Our Infrastructure Engineering team supports the company's fast growth by building and operating hyper-scale datacenters, managing the life cycle of the server fleet, providing cloud solutions, and developing infrastructure services that are scalable and reliable.

Responsibilities - What You'll Do
  • Design, build, scale, and operate ByteDance’s global infrastructure, including large-scale systems spanning public and private clouds.
  • Develop tools, automation frameworks, visualizations, and monitoring systems to streamline operations and drive optimization of global infrastructure.
  • Create, manage, and standardize cloud AMIs/images for use across multiple environments, ensuring strict alignment with the company's global compliance standards.
  • Thrive in a fast-paced environment, engaging in technical operations and on-call rotations to address incidents related to cloud, OS, network, performance, and reliability.
  • Drive improvements across the entire infrastructure lifecycle, from ideation and design through development, deployment, user support, and continuous refinement.
Qualifications

Minimum Qualifications
  • Bachelor’s degree or above in Computer Science, Software Engineering, Information Security, or a related field.
  • 5+ years of experience in Linux operations, SRE, or DevOps; experience operating large-scale production environments is a strong plus.
  • Proficient in at least one programming language such as Go, Python, or C++, with solid engineering capabilities in platform development, system tooling, and automation.
  • Strong computer science fundamentals, with deep understanding of Linux OS principles, computer networks, storage systems, GPU systems, and databases, along with systematic troubleshooting and root-cause analysis skills.
  • Familiar with core reliability practices, including monitoring and alerting, capacity management, change management, canary/gray releases, incident response, and postmortem processes.
  • Strong communication and collaboration skills, with the ability to proactively identify problems, drive cross-team execution, and demonstrate strong ownership and a results-oriented mindset.
Preferred Qualifications
  • Hands‑on experience operating public cloud platforms, or deep familiarity with major cloud providers such as OCI, AWS, Azure, GCP, etc., including understanding of their underlying mechanisms.
  • Experience with large‑scale cloud host delivery, image/AMI systems, resource scheduling, network adaptation, and virtualization technologies such as KVM/QEMU.
  • Familiar with containers and cloud‑native ecosystems, including Docker, Kubernetes, and containerd, with a solid understanding of isolation mechanisms like cgroups and namespaces.
  • Experience maintaining GPU clusters, including drivers, CUDA, MIG, topology awareness, troubleshooting, stress testing, and GPU delivery pipelines.
  • Proven experience in reliability‑focused initiatives such as failure drill systems, capacity governance, change governance, observability platforms, and resource cost optimization.
  • Open‑source contributions, technical blogs, patents, or technical sharing experience are highly preferred.
About Us

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join ByteDance

Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and enrich life – a mission we work towards every day.

As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and…
