Site Reliability Engineer Lead
Job in Houston, Harris County, Texas, 77246, USA
Listed on 2026-02-16
Listing for: Patterson-UTI
Full Time position
Job specializations:
- IT/Tech
- SRE/Site Reliability, Cloud Computing
Job Description
Brief Description
We are seeking a Site Reliability Engineer Lead to own and evolve the reliability, scalability, and operational excellence of cloud-native data platforms running primarily on Google Cloud Platform (GCP). This role supports data systems that ingest, process, and serve large volumes of operational data from oilfield and energy environments. The ideal candidate is a cloud-first SRE with deep GCP experience, strong Python engineering skills, and a track record of leading reliability initiatives for data-intensive systems.
Detailed Description
- Lead SRE practices for GCP-based data platforms
- Design and own SLIs, SLOs, error budgets, and reliability metrics
- Build and maintain cloud-native observability (monitoring, logging, alerting)
- Lead incident response for production cloud systems and drive postmortems
- Partner with data engineering and platform teams to design reliable architectures
- Automate operational workflows using Python
- Drive improvements in CI/CD, infrastructure as code, and deployment safety
- Mentor engineers and set SRE best practices across the team
- 7+ years in SRE, Cloud Platform Engineering, or DevOps
- Strong hands-on experience with Google Cloud Platform (GCP), including:
  - GKE, Compute Engine, Cloud Storage, Pub/Sub (or equivalents)
  - Cloud Monitoring & Logging
  - BigQuery
  - Dataflow
  - Datastream
  - IAM and networking
  - Composer/Airflow
- Kubernetes: deployment, scaling, reliability patterns
- CI/CD: GitHub Actions, GitLab CI, or similar
- Observability: GCP Cloud Monitoring, Logging
- Experience supporting cloud‑native data systems (batch and streaming)
- Production experience with Python for automation, tooling, or services
- Infrastructure as Code experience (Terraform strongly preferred)
- Experience operating production systems in 24/7 environments
- Bachelor's degree in Business, Information Technology, Computer Science, or a related field.
- 5+ years of experience in Site Reliability Engineering, Cloud Platform Engineering, or DevOps
- 3+ years operating production workloads on Google Cloud Platform (GCP)
- Prior technical leadership experience (lead engineer, tech lead, or ownership of reliability initiatives)
- Ability to understand and speak English at a level of proficiency that allows the employee to issue, receive, and respond to both safety- and operations-related directions in English
- Oil and Gas Industry knowledge
- Technology/Digital Industry knowledge
Position Requirements
5+ years of work experience