SR.SRE; Linux & Windows
Listed on 2026-01-12
IT/Tech
Systems Engineer, Cloud Computing
Overview
The Systems Reliability Engineering (SRE) team helps elevate SRE practices at TWDC, promoting and onboarding new technologies, solving complex problems and integrating with next-generation digital platforms.
Systems Reliability Engineers use a software engineering approach to architect, design, automate, monitor, and build applications. This includes operating and engineering software with close business segment alignment to deliver platforms through efficient, effective and resilient architectures. SREs are talented engineers who focus on improving quality through a data-driven approach: instrumentation, automation, and functional/unit testing.
The Senior SRE will help create, build and deliver amazing experiences for our guests, fans and businesses. Primary responsibilities include helping existing, new and emerging business teams onboard technologies or platforms to accelerate their businesses. This will include consultation, designing, building, and supporting development pipelines, automating infrastructure and operations, creating telemetry for monitoring, engineering high reliability and reinforcing best practices to secure our company and guest data.
The Senior SRE is expected to have systems administration skills on Linux and Windows platforms, and must have experience with software development (e.g. Python, Go, Java, Node), CI pipeline tools (e.g. Jenkins), Git source management, cloud hosting (AWS, GCP & Azure), container computing (e.g. Docker, OCI), web technologies and the DevOps team culture. This position will also bring a working knowledge of systems, networking, operational excellence and application stability, security, performance, and capacity management, as well as documentation.
The Senior SRE must be prepared to work with engineering, creative and production teams in an extremely collaborative and high-energy environment to brainstorm, architect, gather requirements, troubleshoot, and provide stellar customer support. The ideal Senior SRE is passionate about constantly learning, applying technology to solve complex problems, and is a highly motivated, optimistic, proactive, creative thought leader and project manager.
The Senior SRE will:
- Translate ideas into tangible products that shape experiences by focusing on a systematic approach to automation, resiliency, efficiency, stability, security, performance, and capacity management, as well as documentation, and serve as a subject matter expert through internal and external tech talks and conferences.
- Support initial discovery, architecture, design, automation, implementation and operationalization, including:
- Business Engagement and Requirements Gathering
- Architectural Review, Proof of Concept Work, and Onboarding
- Project: Build and Operationalize New Systems/Sites/Services/Products
- Systematic Load Testing, Troubleshooting, Optimization and Tuning
- Create System and Application Monitors, Trending Metrics and Reports
- Development: Tools and Automation Frameworks
- Hosting Platforms and Infrastructure Design and Support
- Documentation: Creation of Application Infrastructure Design documents, Operational Runbooks, and Knowledge Base Articles
- Fluent in multiple scripting languages, with advanced skills in programming languages (e.g. Go, Python, Ruby, Dart, Node, Java, and similar) and the ability to build test coverage for all software being developed.
- Systems administration skills on Linux and Windows platforms
- Networking skills and protocols (e.g. HTTP, TLS, SSH, DNS)
- Software Development Continuous Integration (CI) Pipeline knowledge (e.g. Jenkins, Gitlab CI)
- Experience with Distributed Systems and Container Platforms (e.g. Kubernetes/GKE, ECS, Mesos, Fargate, Nomad)
- Experience with Source Control Management systems (e.g. Git)
- Expertise in public and private cloud hosting services (AWS, Google Cloud, Azure)
- Recognized as an expert on at least one OS and proficient in multiple operating systems, including OS performance monitoring, setup, configuration, tuning, and troubleshooting.
- Proficient in web server technologies (e.g. Apache, Node.js, Nginx, Tomcat, IIS, Caddy Server) including setup, configuration, performance monitoring, tuning, clustering, and debugging (e.g. JConsole).
- Proficient with data technologies (e.g. NoSQL, MySQL, MongoDB, Redis, Elastic) including being able to perform basic setup, configuration, and troubleshooting.
- Able to implement existing base standards for new systems and/or applications for all of the following:
- Site/Systems monitoring and instrumentation
- Application monitoring and instrumentation
- System monitoring and instrumentation
- Resilience, performance, and telemetry data
- Able to diagnose simple to complex system and process problems.
- Demonstrate exceptional troubleshooting methodology, including the ability to author and instruct new methodologies to the SRE team.
- Independently resolve moderately to highly complex system and application incidents.
- Able to identify and propose system and application…