Service Reliability Engineer

Job in Plano, Collin County, Texas, 75086, USA
Listing for: Berkshire Hathaway Homestate Company group
Full Time position
Listed on 2026-02-20
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, IT Support, Cybersecurity
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly

The company's IT/Business Applications team has an immediate opening for a Service Reliability Engineer to ensure the reliability, availability, and performance of business applications, IT services, and infrastructure. This individual will focus on optimizing software applications and infrastructure, automating manual tasks, monitoring system performance, and resolving incidents efficiently. The role will also contribute to the design and development of software and tools that enhance the reliability and stability of production environments.

ESSENTIAL RESPONSIBILITIES
SERVICE RELIABILITY & UPTIME
  • Ensure the reliability, availability, and performance of key business applications, IT services, and infrastructure by monitoring system health and identifying potential risks.
  • Implement proactive measures, such as performance tuning and capacity planning, to avoid service disruptions.
  • Maintain and improve service-level objectives (SLOs) and service-level agreements (SLAs) across systems and services.
INCIDENT MANAGEMENT & TROUBLESHOOTING
  • Monitor and respond to incidents, troubleshooting issues across the entire stack (network, systems, software, applications).
  • Conduct root cause analysis (RCA) of system failures and recommend or implement long-term solutions to prevent recurrence.
  • Participate in on-call rotations, ensuring timely resolution of incidents and minimizing downtime.
  • Collaborate with development and operations teams to improve system observability and alerting through monitoring tools.
SYSTEM ARCHITECTURE & SCALABILITY
  • Contribute to the design and implementation of scalable, highly available, and fault-tolerant architectures for distributed systems.
  • Collaborate with software engineers and architects to optimize system architecture for high reliability and performance.
  • Manage cloud infrastructure and services (Azure, AWS, or Google Cloud) to ensure efficient resource utilization and scalability.
NETWORKING & SECURITY
  • Design and manage networking components, including load balancers, firewalls, VPNs, and DNS, ensuring secure, scalable, and resilient network infrastructure.
  • Troubleshoot network performance issues, including latency, packet loss, and bandwidth bottlenecks.
  • Work with security teams to implement best practices for security hardening, encryption, and compliance in production environments.
  • Ensure secure communication between services, utilizing technologies such as TLS, SSH, VPN, and firewall configurations.
MONITORING & OBSERVABILITY
  • Implement comprehensive monitoring, logging, and alerting systems using tools such as Dynatrace, Prometheus, Grafana, Datadog, or Splunk.
  • Create and maintain dashboards that provide real-time insights into system performance, availability, and key reliability metrics.
  • Set up monitoring for key infrastructure components (e.g., servers, databases, microservices) and define actionable alerts.
  • Conduct capacity and performance testing to ensure the systems can handle increasing traffic and workloads.
  • Perform periodic (e.g., daily, monthly, or annual) manual and automated SRE operations to ensure proper performance of corporate enterprise applications and systems.
ON-PREMISES, HYBRID, AND CLOUD ENVIRONMENTS
  • Work with business applications across various environments, including on-premises, hybrid, and cloud systems.
  • Work with the infrastructure and cloud teams to ensure that application environments are stable, secure, and meet business performance expectations.
  • Support the transition of applications from on-premises environments to cloud or hybrid architectures, working closely with senior IT leadership on cloud migration strategies.
  • Ensure proper governance and performance monitoring for applications in all environments, proactively identifying areas for optimization.
RISK & COMPLIANCE
  • Develop and implement procedures for regular audits, risk assessments, and disaster recovery plans for critical applications.
  • Ensure that QA processes adhere to relevant industry standards and regulatory requirements (e.g., ISO, GDPR, HIPAA).
  • Develop and maintain test documentation, including test plans, test cases, test scripts, and test data management.
REQUIRED QUALIFICATIONS
  • EDUCATION:

    Bachelor's degree in Computer Science, Information Technology, or related field, required.
  • CERTIFICATIONS:

    Certification in cloud platforms (e.g., Microsoft Azure Administrator), preferred.
EXPERIENCE
  • A minimum of 7 years of experience as a Service Reliability Engineer, DevOps Engineer, or Systems Engineer, with hands-on exposure to networking, systems, and software development, required.
  • Strong experience with cloud platforms (e.g., Azure, AWS, or Google Cloud), including experience managing cloud-based services and infrastructure, required.
  • Experience with scripting or programming languages (e.g., PowerShell, Python, Go, Bash, Java) for automation and tooling, required.
  • Experience with monitoring and observability tools (e.g., Dynatrace, Prometheus, Grafana, Datadog, Splunk) required.
  • Hands‑on experience with CI/CD pipelines and version control (e.g.,…