Site Reliability Engineer
New York, New York County, New York, 10261, USA
Listed on 2026-06-02
IT/Tech
SRE/Site Reliability, Cloud Computing
Join Mizuho as a Site Reliability Engineer! In this role you will be central to maintaining the reliability, scalability, and overall performance of our production systems. You will collaborate closely with development, operations, and product teams to automate workflows, monitor system health, and maintain robust services. Expertise in Grafana is vital for creating insightful visualizations and analyzing performance metrics.
Key Responsibilities
- Design, implement, and manage automated deployment, monitoring, and alerting solutions.
- Build and support scalable infrastructure through Infrastructure as Code (IaC) tools.
- Use Grafana and other monitoring platforms to track system reliability and performance.
- Partner with development and operations teams to continuously improve system reliability and efficiency.
- Diagnose and resolve production issues quickly to minimize downtime.
- Create and maintain best practices and guidelines for SRE processes.
- Enhance observability by improving logging, monitoring, and alert systems.
- Participate in on‑call rotations to ensure round‑the‑clock support for critical systems.
- Lead post‑incident reviews and put preventative measures in place.
- Mentor and educate team members on SRE methodologies and technologies.
Qualifications
- Bachelor’s degree (or equivalent experience) in Computer Science, Engineering, or a related area.
- Demonstrated experience as a Site Reliability Engineer (SRE) or in a similar capacity.
- Strong background in automation tools and methodologies such as Ansible, Terraform, or Jenkins.
- Advanced skills in monitoring and visualization with Grafana.
- Experience working with cloud providers like AWS, Azure, or Google Cloud.
- In-depth knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with CI/CD pipelines and associated tools.
- Proficiency in scripting or programming with Python, Bash, or Go.
- Exceptional problem‑solving and troubleshooting capabilities.
- Excellent communication and teamwork skills.
- Comfortable working in a fast‑paced, ever‑changing environment.
- Hands‑on experience with Prometheus or comparable time‑series databases.
- Solid understanding of networking and security best practices.
- Knowledge of database administration and optimization strategies.
The expected base salary ranges from $111,000 to $160,000 per year. Salary offers are based on a wide range of factors including relevant skills, training, experience, education, certifications, and licenses, as well as market and organizational considerations.
Additional benefits include a generous employee benefits package and eligibility for a discretionary bonus.
Mizuho has in place a hybrid working program, with varying opportunities for remote work depending on the nature of the role, departmental needs, and local laws and regulatory obligations. Roles in some departments may have greater in‑office requirements, and this will be communicated during the recruitment process.
We are an EEO/AA Employer – M/F/Disability/Veteran. We participate in the E-Verify program. We maintain a drug‑free workplace and reserve the right to require pre‑ and post‑hire drug testing as permitted by applicable law.