Sr. SRE/DevOps Engineer - Sunnyvale, CA; Local
Job in
Sunnyvale, Santa Clara County, California, 94087, USA
Listed on 2026-06-24
Listing for:
Donato Technologies
Full Time position
Job specializations:
- IT/Tech
SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, Systems Engineer
Job Description & How to Apply Below
Title:
Sr. SRE / DevOps Engineer
Location:
Sunnyvale, CA
Job Summary –
For this role, we are looking for a Sr. SRE / DevOps Engineer at our Sunnyvale, California location. As a Site Reliability Engineer, the individual will work closely with cross-functional teams, automate operations, optimize infrastructure, implement security, and solve issues in an exciting, fast-paced environment. The individual will play a vital role in ensuring that the systems are reliable, scalable, and high-performing.
Responsibilities
- Ensure system reliability and availability: monitor system issues, create strategies to detect and address them, design automated troubleshooting systems, and write and review post-mortems.
- Mitigate operational risks: collaborate with development teams and other stakeholders to identify potential risks, perform risk assessments, implement risk mitigation strategies, and continuously monitor and review the effectiveness of those strategies.
- Monitor system health.
- Minimize emergency response time (MTTR).
- Maintain CI/CD pipelines.
- Drive continuous improvement by collaborating with various teams.
- Automate processes.
- 8+ years of experience in DevOps and Site Reliability Engineering.
- Hands-on with containerization and orchestration: Docker, Kubernetes/EKS.
- Proficiency in infrastructure as code tools: Terraform, Ansible, or CloudFormation.
- Experience setting up and managing services running on Kubernetes.
- In-depth understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation.
- In-depth knowledge of monitoring and observability tools: Splunk.
- Knowledge of Linux operating system principles, networking fundamentals, and systems management.
- Demonstrable fluency in at least one of the following languages: Java or Python.
- Ability to identify and communicate technical and architectural problems, while working with partners and their teams to iteratively find solutions.
- Experience building and managing CI/CD pipelines: gatekeeping production deployments, developing and implementing Git branching strategies, branch protection rules, and network policies, and scaling up or down with load on AWS.
- Strong problem‑solving and analytical skills.
- Ability to solve performance and scalability issues in the system.
- DevOps and SRE
- AWS, Kubernetes/EKS, Docker
- Terraform, Ansible, or CloudFormation
- Splunk, Apache Flink
- Programming/Scripting using Java or Python
- CI/CD
- Databases: Vertica, Snowflake
- Excellent communication and collaboration skills
- Ability to propose and implement improvements in the system
- Ability to work with cross‑functional stakeholders
- Adaptability and a willingness to learn new technologies and techniques.
- Proactive approach to issues, ability to provide prompt resolution/work