Site Reliability Engineer - AI Application Technology - Backend Regular
Listed on 2026-05-25
IT/Tech
Cloud Computing: Infrastructure & Operations, Systems Engineer, SRE/Site Reliability
Site Reliability Engineer - AI Application
Location:
Team:
Employment Type: Regular
Job Code: A249701A
Responsibilities
- Ensure the reliability and normal operation of multiple core systems behind the Viking Team's big data and online services, focusing on system capacity planning and stability assurance.
- Enhance system visibility by monitoring the availability and performance metrics of system components, helping development teams locate faults quickly, with particular attention to the stability of critical paths such as AI search and vector databases.
- Improve the reliability, scalability, and performance of services to ensure core system SLAs are met.
- Participate in the design and implementation of the automation platform, enabling rapid iteration and efficient operation and maintenance of large-scale online Viking clusters and AI search-related clusters.
- Optimize service governance practices in depth for AI Search/Viking usage scenarios, including but not limited to analyzing performance bottlenecks in key AI Search/Viking links, locating and troubleshooting business problems, and driving upgrades to the system's high-availability architecture. Familiarity with Viking-related technologies is preferred for core optimization work.
Qualifications
- Bachelor's degree or above in a computer-related field, with more than five years of relevant work experience.
- Solid foundation in computer software fundamentals, with an understanding of the principles of Linux OS, storage, network I/O, etc.
- Familiar with at least one programming or automation language (Python, Go, Java, Shell, Ansible) with moderate development capabilities, and an emphasis on operations and maintenance practice and problem-solving ability.
- Able to solve problems systematically, with good communication skills, a sense of ownership, and the ability to handle cross-team collaboration.
- Understanding of at least one cloud infrastructure such as AWS, Volcano Engine, Aliyun, or GCP; experience in computing/distributed systems preferred (e.g., Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink).
- Priority given to candidates with algorithmic thinking, good data structure and system design capabilities, and some understanding of AI Cloud, large model-related search suggestion, and recommender systems.
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
Why Join ByteDance
Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover and connect, and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and enrich life, a mission we work towards every day.
As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.
Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach.
We are passionate about this and hope you are too.