Senior Site Reliability Engineer

Job in Frisco, Collin County, Texas, 75034, USA
Listing for: McAfee
Full Time position
Listed on 2026-01-27
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support

Role Overview

As an SRE, you will be accountable for maintaining appropriate service levels (availability, latency, and reliability) to serve our customers' needs and for reducing the friction of managing change. Your responsibilities include engaging with DevOps, Engineering, and other teams to understand and support business needs and initiatives. Every SRE is responsible for the availability, scalability, security, performance, cost, and compliance requirements of our services.

You will ensure applications onboarded to SRE are instrumented for full-stack observability and continuous testing, introduce continuous improvement, integrate into IT Service Operations, and share support responsibilities for critical customer journeys, business flows, and applications.

This is a hybrid position located in Frisco, TX. You will be required to be onsite on an as-needed basis, typically 1 to 6 times a month. We are only considering candidates within a commutable distance to one of the two locations and are not offering relocation assistance at this time.

About the Role
  • Proactively monitor the mission-critical production environment and respond quickly to trend breaches or issues.
  • Troubleshoot, debug, and escalate issues with proper analysis to concerned teams to ensure maximum availability.
  • Troubleshoot problems in real time, interacting with DevOps/Engineering and internal support representatives to deliver maximum customer satisfaction.
  • Detect and triage all operational incidents and requests.
  • Work extensively to help reduce the Mean Time to Restore (MTTR) and the Mean Time to Detect (MTTD).
  • Work across Engineering and Support teams to ensure we meet our goals for service reliability, availability, and efficiency.
  • Ensure security events and alerts are addressed in a timely manner.
  • Own the availability and performance of mission-critical services. Automate to prevent problem recurrence, and respond to all non-exceptional service conditions.
  • Help maintain and improve service operations by following established processes and procedures and by periodically updating SOPs and documents in Confluence.
  • Create and manage day-to-day processes including Change Management, Incident Management, and Problem Management.
  • Support automation initiatives to reduce MTTR and MTTD.
  • Help track KPIs to support operational performance and service reliability.
  • Participate in incident retrospectives and assist in managing the incident lifecycle.
  • Plan and deploy patches and product enhancements to our environments.
  • Engage in readiness reviews before changes or deployments into production environments.
  • Support product engineering teams on SRE-related activities to establish optimal SLAs for all pre-defined activities and provide a high-quality customer experience.
  • Provide detailed summaries of high-priority issues to stakeholders, ensuring the quality of the data provided.
  • Participate early in the SDLC to ensure reliability is built in from the beginning and create plans for successful implementations/launches and smooth transition into the SRE team.
  • Produce accurate root cause analyses of production issues and help provide long-term solutions to fix them.
  • Continually evaluate and adopt the latest industry technologies to optimize costs and streamline processes.
  • Communicate effectively and present team progress to leadership.
  • Lead by example technically and establish credibility with quality technical execution.
  • Mentor, coach, and develop other SRE team members.
About You
  • 4 to 5+ years of software development and/or technical operations experience, and experience running large-scale applications.
  • Prior experience in SRE/DevOps, Infrastructure Engineering, and Systems Engineering required.
  • Experience defining and implementing monitoring for highly resilient and reliable applications.
  • Experience maintaining and operating production systems (> 99.95% SLA) in the cloud.
  • Able to monitor, debug, and perform root cause analysis (RCA) for any service failures.
  • Exceptional communication skills that cross both team and geographical boundaries.
  • Advanced knowledge and skills within a specific technical or professional discipline with understanding of the impact of work on…
Position Requirements
10+ Years work experience