
Product Reliability Engineering Lead

Job in Houston, Harris County, Texas, 77019, USA
Listing for: Pyramid Consulting, Inc
Full Time position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability, Cloud Computing, IT Support
Salary/Wage Range or Industry Benchmark: 85 - 95 USD Hourly
Job Description & How to Apply Below
Immediate need for a talented Product Reliability Engineering Lead. This is a 12+ month contract opportunity with long-term potential, located in the US (remote, CST). Please review the job description below and contact me ASAP if you are interested.

Job -15460

Pay Range: $85 - $95/hour. Employee benefits include, but are not limited to, health insurance (medical, dental, vision), 401(k) plan, and paid sick leave (depending on work location).

Key Responsibilities:
  • Define and lead the reliability strategy for the Acquisition Platform, ensuring alignment with product, platform, and enterprise goals.
  • Establish SLOs, SLIs, and error budgets that tie reliability targets to business outcomes and partner expectations.
  • Shift reliability requirements into early design and development phases so resiliency, failover, and graceful degradation are architected in, not bolted on.
  • Design reliability patterns across platform services, APIs, workflows, and dependent systems, both internal and external.
  • Architect end-to-end observability across the platform, including metrics, structured logging, distributed tracing, and alerting.
  • Establish monitoring standards and dashboards that provide real-time visibility into platform health, partner-facing services, and integration dependencies.
  • Embed observability into platform services from design through deployment so teams can detect, diagnose, and resolve issues rapidly.
  • Drive adoption of synthetic monitoring and canary deployments to validate production behavior proactively.
  • Collaborate closely with the Acquisition delivery team and stakeholders to align outcomes with the reliability strategy.
  • Partner with AMS, infrastructure, and other tech teams to ensure clear ownership boundaries and smooth operational handoffs.
Key Skills and Competencies:
  • SRE principles – SLOs, SLIs, error budgets, toil reduction, blameless postmortems
  • Observability design – distributed tracing, APM telemetry, structured logging, real-time alerting, synthetic monitoring
  • Resilience and fault tolerance – circuit breakers, bulkheads, retry/backoff, graceful degradation, failover validation
  • Chaos engineering and reliability testing – fault injection, load/stress testing, failure mode analysis
  • CI/CD reliability integration – automated reliability gates, canary deployments, feature flags, progressive rollouts
  • AI-assisted reliability techniques – anomaly detection, predictive alerting, prompt-driven runbook automation, agent-based remediation
  • Responsible AI use – including consideration of security, data exposure, and operational risk
  • Cloud-native operations – containerized platforms, event-driven architectures, infrastructure as code
  • Growth-oriented mindset – ability to think beyond today's constraints and identify what is required to build the future
  • Excellent communication skills – ability to translate reliability concerns between engineering, product, and business teams
Key Requirements and Technology Experience:
  • Must-have skills: Site Reliability Engineering (SRE), AWS Cloud (EKS/ECS/Lambda), Observability & Monitoring (Prometheus/Grafana/Datadog/Splunk), Kubernetes & CI/CD Automation, Chaos Engineering & Reliability Testing, SLO/SLI/Error Budget Management
  • 5+ years of experience in site reliability engineering, platform engineering, or production operations roles
  • Experience defining and operating SLO/SLI frameworks tied to business outcomes
  • Hands-on experience designing observability for distributed, API-driven platforms
  • Experience with reliability and resiliency testing including chaos engineering and fault injection
  • Experience guiding and mentoring engineers on reliability practices
  • Enterprise-scale delivery experience with both onshore and offshore cross-functional teams
  • Direct experience applying Agile methodologies in product-centric delivery models
  • AWS operational experience – CloudWatch, X-Ray, Fault Injection Simulator, ECS/EKS, Lambda, EventBridge
  • Experience integrating reliability practices with DevSecOps and CI/CD pipelines
  • Familiarity with AI/ML-driven operations tools and incident management platforms
Our client is a leading company in the insurance industry, and we are currently interviewing to fill this and other similar contract positions. If you…