Senior Engineer, SRE
Listed on 2026-02-15
IT/Tech
SRE/Site Reliability, Cloud Computing, IT Support
Job: 282458
Location Name: FSC REMOTE SF/NY/DC
Address: FSC, Remote, CA 94105, United States (US)
Job Type: Full Time
Position Type: Regular
Job Function: Information Technology
Work Location: Remote
Belong to Something Beautiful
At Sephora, beauty is about feeling seen, valued, and empowered, individually and collectively. It is connecting deeply with others, celebrating diversity and inclusivity, unlocking your potential, and making a difference every day. Together, we belong to something beautiful.
Your Role at Sephora
Ready for a career glow-up? As Senior Engineer, Site Reliability Engineering - Digital, you'll be ensuring hyper-stable online experiences for millions of Sephora customers. The work you do will impact beauty, as you monitor, optimize, and safeguard the reliability of Sephora's Dotcom platform and OMNI services. You'll be part of a team that's united in beauty, supported by those who are equally passionate about delivering resilient, high-performance digital experiences that connect customers to the products they love.
You’ll Do
- Ensure Platform Stability. Operate and support the Dotcom and OMNI platform (including BOPIS and Same-Day Delivery), ensuring high availability, resilience, and hyper-stable customer experiences during normal operations and peak traffic events.
- Lead Incident Response. Triage, diagnose, and resolve L2/L3 production incidents; lead post-incident reviews and partner with engineering teams on permanent corrective actions to eliminate root causes.
- Drive Intelligent Automation. Build automation solutions, reduce operational toil, and create AI-driven reliability tools and agentic workflows to improve mean time to resolution, productivity, and overall stability.
- Enhance Observability. Develop and optimize observability through logs, metrics, traces, dashboards, and anomaly detection; refine alerting and telemetry pipelines to proactively identify and resolve issues.
- Validate Release Readiness. Ensure world-class readiness for releases, seasonal events, feature launches, and traffic spikes through resiliency checks, performance validation, and comprehensive change reviews.
- Maintain Reliability Standards. Maintain and optimize SLO/SLI frameworks; monitor error budgets and partner with application teams on continuous reliability improvements.
You’ll Bring
- Deep SRE Expertise. 6+ years of hands‑on SRE, DevOps, or Production Engineering experience in high‑scale digital applications, with a strong understanding of reliability principles and operational excellence.
- Cloud‑Native Technical Skills. Strong exposure to Azure AKS, Kubernetes, Docker, Service Mesh, and API‑driven architectures, with operational support experience for React front‑end and Spring Boot microservices in production environments.
- Observability and Automation Mastery. Hands‑on experience with observability tools (Dynatrace, Splunk, Grafana, Prometheus) and strong scripting and configuration skills (Python, Bash, PowerShell, YAML) to build automation that reduces toil and improves incident response.
- Incident Management Excellence. Proven experience in incident management, root cause analysis, and implementing permanent corrective actions that drive long‑term reliability improvements.
- CI/CD and Platform Knowledge. Experience with SRE principles, CI/CD pipelines (Jenkins, GitHub Actions), and cloud platforms (Azure required; AWS/GCP/OCI a plus).
- Analytical Problem‑Solver. Strong analytical and problem‑solving abilities with clear communication skills under pressure, a collaborative mindset, and passion for reducing toil while improving developer and operator experiences.
The annual base salary range for this position is $ - $. The actual base salary offered depends on a variety of factors, which may include, as applicable, the applicant’s qualifications for the position; years of relevant experience; specific and unique skills; level of education attained; certifications or other professional licenses held; other legitimate, non‑discriminatory business factors specific to the position; and the geographic location in which the applicant lives and/or from which they will perform the job. Individuals…