
LLM AIOps Development Engineer - Data Center Networking

Job in Seattle, King County, Washington, 98127, USA
Listing for: ByteDance
Full Time position
Listed on 2026-02-18
Job specializations:
  • Software Development
    Data Engineer, Software Engineer, Machine Learning/ML Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below

Location: Seattle
Team: Technology
Employment Type: Regular
Job Code: A232132


Responsibilities

About the team

Networking brings together innovative ideas and technologies from network architecture, software-defined networking (SDN), network virtualization, switch software and hardware co-design, and high-speed networking to create hyperscale data center networking solutions that power some of the world's most popular apps, such as Douyin and TikTok, which serve hundreds of millions of users around the globe.

The Network Observation team is committed to building a world-leading hyperscale data center network infrastructure that supports real-time access for hundreds of millions of users and the explosive growth of massive data volumes. We believe that the next generation of network operations will be fundamentally powered by artificial intelligence technologies, particularly Large Language Models (LLMs). We are seeking a passionate development engineer who combines deep networking expertise with innovative AIOps capabilities to join us in defining and building "autonomous" data center networks.

Together, we will transform network operations from a reactive "firefighting" mode into a proactive, data-driven intelligent ecosystem with predictive and self-healing capabilities.

  • Build a Panoramic Network Observability Platform:
    Develop a streaming telemetry data pipeline for both physical and virtual networks, integrating multi-source data from gNMI, NETCONF, IPFIX/NetFlow, and SNMP to provide a high-quality, real-time data foundation for AIOps.
  • Develop an Intelligent Diagnostics and Root Cause Analysis System:
    Apply machine learning and deep learning algorithms to perform anomaly detection, correlation analysis, and intelligent noise reduction on massive volumes of network metrics, logs, and events. Swiftly pinpoint root causes of failures across the entire stack, from optical transceivers and switch hardware to protocol adjacencies and application traffic.
  • Explore Innovative Applications of LLMs and Agents:
      • Intelligent Operations Assistant:
        Build a conversational chatbot powered by Retrieval-Augmented Generation (RAG) that understands natural language queries, automatically queries knowledge bases and monitoring data, and provides precise troubleshooting guidance and network status reports.
      • Automated Remediation and Smart Runbooks:
        Train operational Agents to safely and controllably invoke network change tools and APIs. Empower them to autonomously generate, recommend, or even execute remediation plans and emergency runbooks based on their understanding of failure scenarios.
  • Establish Capacity and Risk Prediction Capabilities:
    Forecast network capacity bottlenecks, high-risk links, and "sub-healthy" devices based on historical data and business growth models, enabling proactive scaling and preventative maintenance.
  • Forge a Rock-Solid Engineering System:
    Adhere to engineering best practices to design and develop a highly available and scalable AIOps platform. Guarantee the stability and performance of the entire pipeline, from data collection and model training to online inference and automated closed-loop actions.
Qualifications

Minimum Qualifications:

  • Solid Fundamentals in Computer Science and Networking: A deep understanding of data center network architectures (e.g., Spine-Leaf Fabric), and proficiency in key protocols such as EVPN/VXLAN and BGP/OSPF. In-depth knowledge of the Linux network stack is essential.
  • Excellent Software Engineering Skills:
    Mastery of Golang or Python with outstanding coding and system design abilities. Familiarity with modern software development workflows, including microservices, containerization (Docker/Kubernetes), and CI/CD.
  • Rich Platform Development Experience:
    Practical experience in one or more of the following areas is highly desirable:
      • Big Data Processing:
        Familiarity with Kafka, Flink, ClickHouse/TSDB, and experience building real-time data pipelines and analytics systems.
      • Observability Technologies:
        Experience with Prometheus/OpenTelemetry, graph databases (e.g., Neo4j), and developing alert and event platforms.
  • A Passion for AIOps/ML/LLM Practices: A keen…