Senior Production System Engineer - Ashburn

Job in Ashburn, Loudoun County, Virginia, 22011, USA
Listing for: ByteDance
Full Time position
Listed on 2026-02-09
Job specializations:
  • IT/Tech
    Systems Engineer, Data Engineer, Cloud Computing, Systems Administrator
Job Description & How to Apply Below

Overview

The Data Systems Infrastructure (DSI) team stands as the unseen architects behind the scenes. In a thrilling dance of technology and innovation, we propel the company's meteoric rise by constructing and orchestrating colossal data fortresses, taming the life cycle of server fleets, conjuring Cloud solutions, and crafting a symphony of infrastructure services. Our mission is to ensure scalability and unwavering reliability, making sure ByteDance's digital footprint leaves an indelible mark on the world.

Embark on an exciting expedition to explore the rapidly expanding ByteDance domain in the United States, Europe, and Asia. Here, the Data Systems Infrastructure (DSI) team is crafting monumental data citadels that encircle the planet, sheltering hundreds of thousands of servers. As the maestro of our production systems, you will embark on a captivating odyssey, taming the life cycles of these servers.

Your adventure will begin with the orchestration of their initial deployment, navigating the intricate terrain of OS installation, summoning services like a digital magician, and maintaining vigilant watch over our inventory. You will also troubleshoot and restore servers in challenging moments, and you will guide them into retirement and recycling, contributing to ByteDance's technological evolution.

Responsibilities
  • Operation:
    As a Production Systems Engineer, contribute to enhancing the stability, efficiency, effectiveness, and scalability of our data center and server operations, platform, and service on a worldwide scale.
  • Lifecycle Enhancement:
    Participate in and enhance the entire lifecycle of the server fleet—from design/introduction consultation to launch reviews, deployment, operation, and retirement.
  • Automation:
    Develop and deploy tools and solutions to enhance automation, reliability, scalability, and operability of servers in the datacenter.
  • Monitoring:
    Develop and deploy tools and solutions to improve the availability, latency, and overall health of datacenter infrastructure, servers, and networks.
  • Disaster Recovery:
    Troubleshoot and resolve complex technical issues in a high-pressure, fast-paced environment. Conduct high-level root-cause analysis for service interruptions and establish preventive measures. Practice sustainable incident response and postmortems.
  • Cross-team Collaboration:
    Collaborate with infrastructure architects, project managers, data center operations engineers, platform developers, supply chain teams, and internal customers to comprehend overarching business objectives. Design and implement innovative solutions for Core IDCs and CDN/Edge.
  • On-call:
    Engage in on-call support across regions and incident response teams to address critical issues in the production environment.
Qualifications

Minimum Qualifications:

  • Education:
    Bachelor's degree in Computer Science, Electronic Engineering, relevant technical field, or equivalent practical experience.
  • Experience in at least one of the areas below:
  • Server Operations:
    Proficiency in Linux system administration tasks, understanding of Linux kernels, drivers, and modules. Scripting in Bash and Python to automate routine system operations, including system configuration, performance tuning, and security management within Linux. Understanding of server hardware and ability to troubleshoot or diagnose. Experience participating in the planning, delivery, and operation of large-scale data centers in different countries.
  • Tooling Adaptation, Deployment, and Maintenance:
    Proficiency in customizing operation and maintenance tools for new server hardware. Manage the entire software tool lifecycle from deployment to continuous maintenance, including monitoring, provisioning, fault management, and repairs to ensure smooth operation of new server hardware. Experience developing and maintaining hardware, network, or service monitoring software for more than 10,000 servers.
  • Communication:
    Experience in managing and coordinating teams in a global context.

Preferred Qualifications:

  • 5 years of work experience in a related field.
  • Data Center:
    Proficiency in OS installations, break-fix operations, and projects spanning planning and…
Position Requirements
10+ Years work experience