Head of Platform & Reliability Engineering Job, Charlotte area, North Carolina, USA, IT/Tech

About the Opportunity

We operate a large-scale, real-time fleet technology platform supporting enterprise clients across North America. The company manages hundreds of thousands of connected mobile assets and delivers mission-critical telemetry, safety intelligence, and operational data 24x7.

As the organization continues to scale, we are seeking a senior technical leader to take full ownership of the infrastructure, cloud architecture, reliability engineering, and internal technology operations that power the platform.

This role is not purely managerial. It requires a hands-on leader who can architect systems, improve operational maturity, and build a high-performing engineering organization capable of supporting sustained growth.

Role Overview

The Head of Platform & Reliability Engineering is accountable for the performance, availability, security, and scalability of the company’s hybrid cloud and on-premises technology stack.

This individual will lead infrastructure engineering, DevOps, SRE, and corporate IT functions while establishing modern platform standards, strengthening operational discipline, and ensuring 24x7 service continuity.

You will serve as the executive owner of uptime, resilience, and infrastructure strategy.

What You Will Own

Platform Architecture & Operations

Lead design and operation of hybrid cloud environments (Azure + data center)
Ensure high availability, redundancy, and performance across production systems
Architect secure networking, identity management, storage, backup, and monitoring solutions
Drive cloud cost governance and resource optimization initiatives
Establish standards for logging, alerting, access control, and infrastructure consistency

Reliability & DevOps Modernization

Implement and scale CI/CD pipelines and Infrastructure as Code practices
Lead Kubernetes architecture and container orchestration initiatives
Introduce SRE principles including SLOs, SLIs, error budgets, and blameless postmortems
Improve deployment velocity while reducing operational risk
Strengthen change management discipline without slowing innovation

Security, Risk & Disaster Recovery

Implement access governance, vulnerability management, and monitoring controls
Establish incident response procedures and root cause analysis standards
Define RPO/RTO targets and execute disaster recovery strategy
Conduct regular recovery testing and maintain documented runbooks
Ensure encryption, secrets management, and privileged access controls meet enterprise standards

Internal Technology & IT Operations

Oversee corporate systems including endpoints, SaaS platforms, telecom, and collaboration tools
Establish IT service management practices (incident, problem, asset, request workflows)
Manage vendor relationships and licensing strategy
Improve employee experience through reliable and secure internal systems

Leadership & Organizational Development

Build and mentor a multidisciplinary team across infrastructure, DevOps/SRE, and IT support
Establish operational metrics and executive reporting cadence
Lead capacity planning to support company growth
Foster a culture of accountability, ownership, and continuous improvement

Required Background

10+ years in infrastructure, cloud engineering, or platform operations
3–5+ years leading technical teams in production environments
Deep experience with Microsoft Azure (compute, networking, identity, monitoring, cost management)
Proven experience operating high-availability, customer-facing platforms
Hands-on Kubernetes and container orchestration expertise
Strong understanding of Infrastructure as Code (Terraform, ARM templates, Bicep, etc.)
Experience implementing structured change management processes
Direct ownership of disaster recovery planning and testing
Security controls implementation experience in enterprise environments
Ability to communicate technical risk and tradeoffs to executive leadership

Preferred Experience

B2B SaaS or real-time data platforms
Telematics, IoT, fleet technology, or distributed systems
Observability tooling (Datadog, Prometheus/Grafana, Azure Monitor, etc.)
ITSM platforms such as Jira Service Management or ServiceNow
Experience scaling infrastructure to support rapid growth
Exposure to compliance frameworks (SOC 2, ISO 27001, HIPAA, etc.)
Relevant Azure or security certifications

What Success Looks Like

Measurable improvement in uptime and MTTR
Increased release velocity with lower deployment risk
Predictable cloud cost management
Tested and validated disaster recovery posture
A cohesive, high-performing platform engineering organization

