Senior SRE Engineer
Listed on 2025-12-23
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Recognized as the No. 1 site trusted by real estate professionals, ® has been at the forefront of online real estate for over 25 years, connecting buyers, sellers, and renters with trusted insights and expert guidance to find their perfect home. Through its robust suite of tools, ® not only makes a significant impact on the real estate industry at large but also helps consumers navigate the biggest purchase of their lives by providing a user experience that is easy to use, easy to understand, and, most of all, easy to make decisions with.
Join us on our mission to empower more people to find their way home by breaking barriers to entry, making the right connections, and building confidence through expert guidance.
About the Role
We are seeking a Senior Site Reliability Engineer to join our newly formed Operations Excellence organization, reporting to the Director, Operations Excellence. This role will contribute to the reliability, observability, and operational excellence of our platform infrastructure serving millions of users. As a Senior SRE, you will be a strong technical contributor who implements best practices, solves complex problems, and enables our 600+ engineers to deliver exceptional customer experiences.
You will work on critical platform systems including EKS infrastructure, Skyway (CI/CD), Frontdoor (Tyk API Gateway), Pantheon (Apollo GraphQL Federation), and our observability stack, while contributing to chaos engineering practices and cost optimization initiatives with measurable ROI.
- Implement and maintain highly available AWS infrastructure including EKS clusters, Fargate (ECS), and multi-region architectures
- Support reliability of critical services: Skyway (CI/CD), Frontdoor (Tyk), Pantheon (Apollo GraphQL), and supporting infrastructure
- Monitor SLIs, SLOs, and error budgets for Tier 1/2/3 systems; participate in architectural reviews for reliability and cost-efficiency
- Implement reliability patterns including circuit breakers, graceful degradation, and automated failover
- Implement observability solutions using New Relic for APM, distributed tracing, metrics, and logging for rapid troubleshooting
- Build dashboards and alerts that reduce MTTD and MTTR; contribute to observability standards across teams
- Identify infrastructure cost optimization opportunities and implement FinOps practices including rightsizing and resource lifecycle management
- Support cost-conscious architecture decisions and CI/CD spend optimization (CircleCI, Argo CD)
- Execute chaos engineering experiments to identify system weaknesses; contribute to frameworks for safe production testing
- Participate in game day exercises and disaster recovery simulations; create runbooks and automation for resilience
- Participate in on-call rotation for critical systems; conduct post-incident reviews and implement improvements
- Support incident response processes and contribute to the System Health Scorecard
- Contribute as a strong technical individual contributor to the Operations Excellence team
- Collaborate with Platform Engineering, Quality Engineering, and product teams on reliability initiatives
- Support security initiatives including AWS Secrets Manager migration and compliance requirements (SOC 2, PCI, GDPR)
- Contribute to Developer Experience metrics and platform adoption goals
- May provide technical guidance to junior team members
- 5+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering with demonstrated success improving system reliability
- Bachelor’s degree or equivalent experience
- 3+ years hands-on experience with AWS (EKS, EC2, RDS, S3, CloudWatch, IAM) and Kubernetes including cluster management
- Proficient programming skills (Python, Go, or Java) with infrastructure automation and Infrastructure as Code experience (Terraform, CloudFormation)
- Production experience with observability tools (New Relic, Datadog, Prometheus, Grafana, Splunk) and distributed systems
- Experience with CI/CD platforms and GitOps workflows (CircleCI, Argo CD, Jenkins); on-call rotation and…