Director, Platform Engineering
Listed on 2025-12-31
IT/Tech
Systems Engineer, Cloud Computing, Cybersecurity, IT Project Manager
Kaseya® is the leading provider of AI-powered, complete IT infrastructure and security management solutions for Managed Service Providers (MSPs) and internal IT organizations worldwide. Kaseya's best-in-breed technologies allow organizations to efficiently manage and secure IT to drive sustained business success. Kaseya has achieved sustained, strong double‑digit growth over the past several years and is backed by Insight Venture Partners, a leading global private equity firm investing in high‑growth technology and software companies that drive transformative change in the industries they serve.
Founded in 2000, Kaseya currently serves customers in over 20 countries across a wide variety of industries and manages over 15 million endpoints worldwide. To learn more about our company, our award‑winning solutions, and Kaseya's culture, visit Kaseya's website.
Kaseya is not your typical company. We are not afraid to tell you exactly who we are and what we expect. The thousands of people who succeed at Kaseya are prepared to go above and beyond for the betterment of our customers.
We are seeking a strategic and technically accomplished Director of Site Reliability Engineering (SRE) to lead our global infrastructure, network, and public cloud engineering and operations teams. The ideal candidate will have a strong background in site reliability engineering, network management, infrastructure services, and cloud technologies. This role requires a strategic thinker with excellent leadership skills who can ensure the reliability, scalability, and performance of our systems.
Responsibilities
- Architect and manage resilient infrastructure across all global office locations
- Develop and implement strategies to ensure the reliability, availability, and performance of our systems
- Oversee the design, deployment, and maintenance of network infrastructure, ensuring optimal performance and security
- Lead public cloud deployments (AWS, Azure, OCI) with a focus on scalability, cost‑efficiency, and compliance
- Collaborate with cross‑functional teams to define and implement infrastructure and network standards
- Establish observability and monitoring systems to proactively manage performance and availability
- Develop and maintain disaster recovery and business continuity plans
- Ensure compliance with industry standards and regulations
- Mentor and develop team members, fostering a culture of continuous improvement and innovation
- Maintain comprehensive infrastructure diagrams and create processes, SOPs, and other technical documentation
- Provide technical leadership and training to engineers on the team
- Establish best practices throughout the entire technology lifecycle management framework
- Build and mature relationships with business partners to identify areas of improvement that support business growth and agility

Requirements
- 12+ years of experience in site reliability engineering, network management, and infrastructure services, with 5+ years in a leadership role
- Extensive experience with network technologies such as Palo Alto and Meraki firewalls and Cisco and Meraki switches
- Excellent understanding of networking technologies such as BGP, OSPF, STP (RSTP/MSTP), AAA, and Layer 2 switching
- Proven experience with global hybrid‑cloud interconnectivity network architecture
- Expertise in applying solutions architecture principles to public cloud platforms, including Azure, AWS, and OCI
- Familiarity with network access control principles and enterprise‑scale solutions using tools such as Cisco ISE and Prisma Access
- Proven working experience with cloud service platforms such as Azure, AWS, and OCI, and knowledge of best practices and methods for resolving issues in those environments
- Working knowledge of infrastructure and network monitoring systems such as LogicMonitor, SolarWinds, and ThousandEyes
- Good knowledge of and experience managing Azure landing zone architectures, server and storage workloads, Entra, Active Directory, DNS, and DHCP services
- Knowledge of business continuity, disaster recovery, and continuity of operations plans
- Experience with automation and orchestration tools such as Ansible, Terraform, or Kubernetes
- Skill in assessing security…