Site Reliability Engineering Lead
Are you passionate about building resilient systems and empowering teams to deliver reliable cloud solutions?
Do you thrive in designing and managing scalable platforms that keep services running smoothly?
About our team
The LexisNexis Intellectual Property (IP) division provides international patent content and a suite of online and analytic tools that meet the evolving needs of the intellectual property market. We deliver data to support LexisNexis IP search and analytics applications, empowering our customers with actionable insights and metrics for critical business decisions.
Our corporate culture thrives on excellence, innovation, and a strong dedication to our customers, employees, and communities. Working here means joining a vibrant, diverse, and collaborative team where you are free to grow and contribute actively.
About the role
We are seeking a highly skilled and motivated SRE and Platform/Cloud Engineering Lead to head a team responsible for ensuring the reliability, scalability, and resilience of mission‑critical systems for our IP business. This role is pivotal in managing a small team of senior engineers, driving operational excellence, and fostering a culture of continuous improvement.
You will collaborate closely with the central SRE organisation and with IP product, development, architecture, and security teams to implement best practices in site reliability engineering, cloud platform management, and environment support for internal development and customer systems. You will lead initiatives around incident response, disaster recovery, automation, monitoring, FinOps cost optimisation, and customer support escalations.
This is a junior management‑level position requiring strong leadership, technical depth across cloud and infrastructure technologies, and the ability to influence both technical direction and business outcomes.
Skills & Experience
- Cloud Platforms & Services: Azure and AWS (EKS, EC2, S3, RDS, Lambda, Azure VMs, Functions).
- Infrastructure as Code: Terraform, ARM/Bicep.
- Containerization & Orchestration: Docker, Kubernetes (EKS/AKS), Helm, ArgoCD.
- Monitoring & Observability: Datadog, Splunk, Coralogix, CloudWatch, Azure Monitor, along with an understanding of baseline metrics.
- Scripting & Automation: Python, Bash, PowerShell, TypeScript, JavaScript.
- Programming Knowledge: Java, .NET/C#, SQL, React (for integration with supported products).
- Systems & Networking: Linux/UNIX/Windows administration, networking, and security best practices.
- Specialized Knowledge: Databricks, FinOps cost management, disaster recovery planning.
- Core Competencies: Incident management, troubleshooting, IT service management frameworks, and GitOps/DevOps practices.
- Solid understanding of Site Reliability Engineering (SRE) principles and practices.
- Strong understanding of incident management, monitoring tools, IT service management frameworks and automation processes.
- Previous experience in customer-facing roles or managing customer support escalations.
- Excellent technical problem-solving and troubleshooting abilities.
- Strong communication and interpersonal skills, with the ability to collaborate across teams.
- Leadership skills with a track record of mentoring and guiding technical teams.
- Strong collaboration and advanced communication skills at peer and senior management level.
- Strong skills in setting, communicating, implementing, and achieving business objectives and goals through indirect leadership of and collaboration with others.
- Strong organization/project planning, time management, and change management skills across multiple functional groups and departments, and strong delegation skills involving prioritizing and reprioritizing projects and managing projects of varying size and complexity.
- Advanced problem-solving experience involving leading teams in identifying, researching, and coordinating the resources necessary to effectively troubleshoot/diagnose complex project issues; prior success extracting/translating findings into alternatives/solutions; and identifying risks/impacts and schedule adjustments to facilitate…