Site Reliability Engineer (SRE)

Job in Salem, Marion County, Oregon, 97308, USA
Listing for: INNOVIT USA INC
Full Time position
Listed on 2026-02-07
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below
Position: Site Reliability Engineer (SRE)

Hiring: W2 Candidates Only

Visa: Open to any visa type with valid work authorization in the USA

Summary

A Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of software systems and infrastructure. This role bridges the gap between development and operations by applying software engineering principles to IT operations, automating processes, and monitoring system health to prevent downtime and improve system efficiency.

Key Responsibilities
  • Design, implement, and maintain reliable, scalable, and highly available infrastructure and services.
  • Monitor system performance, availability, and capacity; respond proactively to incidents and outages.
  • Develop and maintain automation tools for deployment, monitoring, and infrastructure management.
  • Collaborate with software engineers to design systems with reliability and maintainability in mind.
  • Troubleshoot, debug, and resolve complex production issues across multiple systems and services.
  • Implement and maintain CI/CD pipelines, configuration management, and version control best practices.
  • Conduct post-incident reviews, identify root causes, and implement corrective actions to prevent recurrence.
  • Define and enforce service-level objectives (SLOs), service-level indicators (SLIs), and service-level agreements (SLAs).
  • Optimize system performance, cost, and resource utilization through analysis and continuous improvement.
  • Document infrastructure, operational procedures, incident reports, and monitoring configurations.
  • Mentor junior engineers and promote best practices for reliability, automation, and observability.
  • Stay current with emerging technologies and DevOps practices to improve operational excellence.
Qualifications
  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 3-6 years of experience in site reliability engineering, DevOps, or system administration.
  • Strong understanding of Linux/Unix systems, networking, and cloud platforms (AWS, Azure, Google Cloud Platform).
  • Proficiency in scripting and programming languages such as Python, Bash, Go, or Java.
  • Experience with monitoring, logging, and observability tools (Prometheus, Grafana, ELK Stack).
  • Familiarity with containerization and orchestration tools (Docker, Kubernetes).
Preferred Skills / Duties
  • Experience with Infrastructure as Code (Terraform, Ansible, CloudFormation).
  • Knowledge of CI/CD tools and pipelines (Jenkins, GitLab, CircleCI).
  • Understanding of distributed systems, microservices architecture, and high-availability systems.
  • Strong problem-solving, analytical, and communication skills.
  • Ability to implement security best practices in operational environments.
  • Experience in automating repetitive operational tasks and improving system reliability.