Site Reliability Engineering
Job in
Los Angeles, Los Angeles County, California, 90079, USA
Listed on 2026-01-13
Listing for:
SARIAN Co
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, SRE/Site Reliability, Systems Engineer, IT Project Manager
Job Description & How to Apply Below
Role: Site Reliability Engineering (SRE)
Location: Los Angeles, CA
Remote position
Full-time position
Responsibilities & Qualifications: Site Reliability Engineer
- Experience in Cloud platforms (AWS, Azure, Google Cloud) and hybrid environments.
- Proficiency in container technologies (Docker, containerd, Podman).
- Strong knowledge of Linux administration and networking concepts.
- Experience with Infrastructure as Code (IaC) tools like Terraform, Ansible, Helm, or Pulumi.
- Monitoring and logging expertise using Prometheus, Grafana, ELK, Datadog, or Splunk.
- Hands-on experience with CI/CD pipelines and DevOps tools (Jenkins, GitHub Actions, GitLab CI, ArgoCD).
- Proficiency in scripting/programming (Python, Bash, Go) for automation.
- Strong troubleshooting and incident management skills.
- Seeking a highly skilled Site Reliability Engineer (SRE) to manage, optimize, and ensure the reliability of infrastructure.
- Ideal candidate will have deep expertise in ELK, Dynatrace, PagerDuty, PowerShell, container orchestration, cloud infrastructure, and automation, along with a strong focus on reliability, scalability, and performance. LogicMonitor and Python knowledge are good to have.
- Reliability & Performance: Implement best practices to ensure high availability, scalability, and performance of containerized applications.
- Monitoring & Incident Response: Set up monitoring (Prometheus, Grafana, ELK, Dynatrace, PagerDuty, PowerShell, etc.), troubleshoot issues, and lead incident resolution.
- Automation & Infrastructure as Code (IaC): Develop and maintain Terraform, Helm charts, and Kubernetes manifests for automation.
- CI/CD & DevOps Integration: Work with DevOps teams to optimize CI/CD pipelines for Kubernetes deployments (Jenkins, ArgoCD, FluxCD, etc.).
- Security & Compliance: Implement security best practices for containerized workloads, RBAC, network policies, and vulnerability scanning.
- Capacity Planning & Optimization: Analyze resource usage and optimize infrastructure costs and performance.
- Disaster Recovery & Backup: Implement backup and disaster recovery strategies for Kubernetes workloads.
Certified Minority Business Enterprise (WMBE)