Site Leader Job, Jamestown area, Town of Poland, New York, USA, IT/Tech

Location: Town of Poland

This role is for one of Weekday's clients.

Min Experience: 10 years

Location: Poland, Remote (Poland)

Job Type: Full-time

We are seeking a highly experienced and driven Site Leader with a strong background in Site Reliability Engineering (SRE) and Infrastructure to lead and scale our engineering operations. This role is ideal for a seasoned Engineering Manager who thrives at the intersection of leadership, system reliability, and large-scale infrastructure management. As a Site Leader, you will be responsible for building resilient systems, managing high-performing teams, and ensuring the availability, scalability, and performance of mission-critical platforms.

Key Responsibilities

Lead and manage SRE and Infrastructure teams, driving operational excellence and fostering a culture of reliability and accountability.
Define and execute the overall infrastructure and reliability strategy aligned with business goals.
Oversee the design, deployment, and maintenance of scalable, highly available, and secure systems.
Establish and monitor SLAs, SLOs, and SLIs, ensuring consistent service performance and uptime.
Drive incident management processes, including root cause analysis, postmortems, and continuous improvement initiatives.
Collaborate with product and engineering teams to embed reliability and scalability into the development lifecycle.
Champion automation, observability, and proactive monitoring to minimize downtime and improve system health.
Manage infrastructure costs, capacity planning, and resource optimization.
Mentor and develop engineering managers and senior engineers, building a strong leadership pipeline.
Ensure adherence to best practices in cloud infrastructure, DevOps, and security compliance.

Required Skills & Qualifications

10-15 years of experience in software engineering, infrastructure, or SRE, with at least 3-5 years in an Engineering Manager or leadership role.
Proven expertise in Site Reliability Engineering (SRE) principles, including reliability, scalability, and fault tolerance.
Strong experience with cloud platforms (such as AWS, GCP, or Azure) and modern infrastructure architectures.
Deep understanding of infrastructure as code (Terraform, CloudFormation), CI/CD pipelines, and containerization technologies (Docker, Kubernetes).
Demonstrated ability to lead and scale distributed engineering teams.
Strong problem-solving skills with a focus on system-level thinking and root cause analysis.
Experience with monitoring and observability tools such as Prometheus, Grafana, ELK stack, or similar.
Excellent stakeholder management and communication skills, with the ability to influence cross-functional teams.

Preferred Qualifications

Experience managing large-scale, high-traffic production systems.
Background in DevOps transformation and cloud-native architecture.
Familiarity with security best practices and compliance frameworks.
