SRE - Sr. Reliability Engineering
Listed on 2026-03-04
IT/Tech
Systems Engineer, Cloud Computing
Location: Remote - Irvine, CA
Top Skills:
• Cloud Infrastructure & Automation
  o Design and manage scalable systems on GCP.
  o Use Infrastructure as Code (IaC) tools such as Terraform.
• Performance & Reliability Engineering
  o Experience in capacity planning, performance tuning, and predictive analytics.
  o Knowledge of distributed systems and high-availability architectures.
• Monitoring & Observability
  o Proficiency with APM tools like Dynatrace, New Relic, or AppDynamics.
  o Proactive incident detection.
• Programming & Scripting
  o Strong coding skills in Python, Go, or Java for automation and reliability improvements.
Experience Required:
Minimum of 4 years of experience in the specific skill set (SRE)
Overall IT experience of 6-8+ years
Job Description
As we expand our customer deployments to build software that improves our customer's experience, we are seeking an experienced SRE to bring fresh ideas and an informed, distinctive viewpoint to our business. The ideal candidate enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
As an SRE, you will work with leading-edge technologies both on-premises and in the cloud. Your mindset will center on automation and on superior software quality, performance, and resiliency. You will be an expert resource in high-performance software and operational design patterns, supporting development, architecture, and operational teams from start to finish to create scalable and resilient solutions.
Responsibilities
• Support development, architecture, and operational teams on performance/capacity issues associated with complex multi-tier distributed platforms during the SDLC and post-production.
• Support/coordinate new Build/Run initiatives prior to production and assure product readiness, including infrastructure recommendations, software/script development, load/chaos testing, optimization, SLO definition, capacity planning, and observability/alerting.
• Review services and applications to identify bottlenecks and opportunities to improve performance and scale.
• Perform POCs for newer technologies and architectural patterns to help teams make informed decisions.
• Define new SLOs for services and applications to meet non-functional SLA requirements defined by the business.
• Work to reduce ongoing runtime costs through efficient throttling/queuing/pooling/autoscaling across application and infrastructure tiers.
• Proactively identify anomalies and opportunities in production platforms to achieve greater performance/scale, and recommend improvements to affected teams for future planning.
• Define performance quality gates and support canary deployment CI/CD scenarios around performance for teams.
Required Skills and Qualifications
• Experience supporting/troubleshooting large-scale multi-tier distributed on-premises and cloud applications.
• Experience architecting, developing, and setting up new infrastructure solutions for GCP (leveraging Terraform) and on-premises applications.
• Experience in capacity planning or performance engineering, leveraging predictive analytics to determine needed scaling patterns for platforms.
• Experience programming in languages such as Java, Node.js, Go, Python, and JavaScript.
• Experience in web development and/or web service creation.
• Demonstrable cross-functional knowledge of systems, storage, networking, security, and databases.
• Experience using APM tools such as Dynatrace, New Relic, or AppDynamics.
Preferred Qualifications
• Experienced architect in GCP, Kubernetes, and serverless.
• Experience collaborating with development teams to define infrastructure requirements and implement scalable and resilient cloud architecture using Terraform.
• Experience in migrating legacy applications to cloud-native architecture.
• Strong understanding of Spring Framework.
• Experienced in performance tracing/profiling using Google Developer Tools.
• Experience with SQL and database scaling/replication schemes.
• Familiar with tools used for front-end analysis such as Lighthouse, PageSpeed Metrics, WebPageTest, GTmetrix, and browser developer tools.
• Experience using MongoDB/Atlas, Oracle OCI, Postgres, GCP Cloud SQL.
• Experience with AngularJS, React, and Vue.
• Experience tuning/optimizing runtime environments for Java (JVMs), Node.js, and Python for the best performance.
• Experience with DevOps/quality gating concepts, canary deployments, and automation associated with CI/CD deployments.
• Experience in Enterprise Architecture integration patterns and domain-model-driven design addressing proper separation of concerns for applications/microservices and core web services.
• Experience using observability tools like Dynatrace or any APM tool is a must.
• Experience using cloud profiling tools and JVM tools like JProfiler/Java Flight Recorder.
• Experience in testing methodologies and metrics using tools like JMeter, NeoLoad, LoadRunner, or others.
• …