Site Reliability Engineer Lead Job, Houston area, Texas, USA, IT/Tech

Brief Description

We are seeking a Site Reliability Engineer Lead to own and evolve the reliability, scalability, and operational excellence of cloud-native data platforms running primarily on Google Cloud Platform (GCP). This role supports data systems that ingest, process, and serve large volumes of operational data from oilfield and energy environments. The ideal candidate is a cloud-first SRE with deep GCP experience, strong Python engineering skills, and a track record of leading reliability initiatives for data-intensive systems.

Detailed Description

Lead SRE practices for GCP-based data platforms
Design and own SLIs, SLOs, error budgets, and reliability metrics
Build and maintain cloud-native observability (monitoring, logging, alerting)
Lead incident response for production cloud systems and drive postmortems
Partner with data engineering and platform teams to design reliable architectures
Automate operational workflows using Python
Drive improvements in CI/CD, infrastructure as code, and deployment safety
Mentor engineers and set SRE best practices across the team

Required Knowledge, Skills, and Abilities

7+ years in SRE, Cloud Platform Engineering, or DevOps
Strong hands‑on experience with Google Cloud Platform (GCP), including:
GKE, Compute Engine, Cloud Storage, Pub/Sub (or equivalents)
Cloud Monitoring & Logging
BigQuery
Dataflow
Datastream
IAM and networking
Composer/Airflow
Kubernetes: deployment, scaling, reliability patterns
CI/CD: GitHub Actions, GitLab CI, or similar
Observability: GCP Cloud Monitoring, Logging
Experience supporting cloud‑native data systems (batch and streaming)
Production experience with Python for automation, tooling, or services
Infrastructure as Code experience (Terraform strongly preferred)
Experience operating systems in 24/7 production environments

Minimum Qualifications

Bachelor's degree in Business, Information Technology, Computer Science, or a related field.
5+ years of experience in Site Reliability Engineering, Cloud Platform Engineering, or DevOps
3+ years operating production workloads on Google Cloud Platform (GCP)
Prior technical leadership experience (lead engineer, tech lead, or ownership of reliability initiatives)
Ability to understand and speak English at a level of proficiency that allows the employee to issue, receive, and respond to both safety‑related and operations‑related directions

Preferred Qualifications

Oil and Gas Industry knowledge
Technology/Digital Industry knowledge
