
Site Reliability Engineer

Job in Cedar Rapids, Linn County, Iowa, 52404, USA
Listing for: UFG Insurance
Full Time position
Listed on 2025-12-27
Job specializations:
  • IT/Tech
    Systems Engineer, IT Support, Cloud Computing, Cybersecurity
Job Description & How to Apply Below

Location: Cedar Rapids, IA, USA (Hybrid)

Site Reliability Engineer – The senior‑most engineer on the Production Management team responsible for ensuring the reliability, performance, scalability, and efficiency of critical production systems and services. This role combines software engineering, systems engineering, solutions architecture, and deep knowledge of how technology functions in order to troubleshoot, operate, and enhance highly reliable distributed systems. The ideal candidate is proactive, automation‑driven, and passionate about implementing solutions that enhance uptime, service quality, and developer productivity.

Responsibilities
  • Implement tooling to monitor system health, capacity, and performance at all levels, from hardware through the VMs and all the way to the end‑user interface.
  • Work with the production management team to troubleshoot incidents, restore service, and identify root causes.
  • Recommend architectural and implementation changes to products delivered by development teams, based on how those products behave in test, performance, and production environments.
  • Support continuous improvement of ITIL processes through automation, data‑driven insights, and proactive problem identification.
  • Document and integrate SRE practices into the ITIL framework, including incident, change, and problem management workflows.
  • Develop automation for system provisioning, monitoring, deployment, and recovery to reduce manual effort and human error.
  • Develop and maintain comprehensive runbooks, standard operating procedures (SOPs), and knowledge base articles for recurring operational tasks and incident response actions.
  • Collaborate with development teams to design resilient architecture and implement best practices for reliability and observability.
  • Enhance observability by developing and maintaining dashboards, alerts, and performance analytics.
  • Contribute to capacity planning, performance tuning, and resilience testing to ensure system health.
  • Develop and update problem management documentation, ensuring known errors and workarounds are captured within the ITSM system.
  • Manage incident response and participate in on‑call rotations to ensure service reliability.
  • Define, document, and track key reliability metrics (SLIs, SLOs, SLAs) and implement continuous improvement initiatives.
  • Drive post‑incident reviews (PIRs) and develop actionable insights to prevent future occurrences.
  • Partner with security teams to ensure systems meet compliance, security, and governance standards.
  • Evaluate and recommend new tools, technologies, and frameworks to improve operational efficiency.
  • Monitor network systems, servers, and applications.
  • Use all necessary tools to investigate performance and reliability of systems in testing environments. Provide detailed and specific guidance on ways to eliminate bottlenecks, improve resilience, and optimize speed and reliability.
  • Provide mentorship and technical support to other members of Production Management.
Job Specifications
Education
  • Bachelor’s degree in Information Technology, Computer Science, or a related field, or equivalent experience
  • Master’s or other advanced degree preferred.
Experience
  • 10+ years of experience in progressively more demanding enterprise‑scale technology roles
  • 3+ years of experience as a Site Reliability Engineer or Senior DevOps Engineer
  • 3+ years in software development, architecture, or related engineering discipline
Knowledge, skills & abilities
  • Advanced experience with multiple enterprise monitoring and observability tools, including Dynatrace, PRTG, DTrace, SolarWinds, and similar.
  • Complete Windows fluency mandatory; similar strengths in Linux and Unisys Mainframe environments helpful
  • Excellent problem‑solving and communication skills, with the ability to collaborate across cross‑functional teams.
  • Unparalleled understanding of:
    • advanced networking concepts and complete expertise in the entire TCP/IP stack
    • VM (VMware and Hyper‑V) and physical compute performance and tuning, including networking and storage performance
  • SQL Server expertise, including troubleshooting queries, indexes, and general performance
  • Experience with unstructured database performance
  • General…