
Senior Site Reliability Engineer II

Job in Plano, Collin County, Texas, 75086, USA
Listing for: Jcpportraits
Full Time position
Listed on 2025-12-31
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Job Description & How to Apply Below

At Shutterfly, we make life’s experiences unforgettable. We believe there is extraordinary power in self-expression. That’s why our family of brands helps customers create products and capture moments that reflect who they uniquely are.

Shutterfly is looking for a Senior Site Reliability Engineer to join our team. Shutterfly is undergoing a comprehensive consumer website re-platforming effort, with the Site Reliability Engineering (SRE) team playing a pivotal role in building shared infrastructure and ensuring future efficiency and supportability. The Senior Site Reliability Engineer II role is responsible for ensuring the reliability, availability, and performance of Shutterfly’s consumer systems.

This position requires deep technical expertise in performance troubleshooting, system optimization, and automation to help maintain resilient, scalable, and cost-efficient platforms. As a senior member of the SRE team, you will collaborate closely with development and operations teams, contribute to automation and observability solutions, and serve as a subject matter expert during incidents.

What You’ll Do Here:
  • Perform advanced performance analysis and troubleshooting across distributed systems to ensure optimal availability, scalability, and cost efficiency.
  • Implement and maintain monitoring, alerting, and observability solutions to provide proactive visibility into application and infrastructure health.
  • Partner with development teams to influence service design and architecture so that new features meet high standards for reliability and scalability.
  • Participate in incident response, including root cause analysis and long-term reliability improvements.
  • Contribute to capacity planning, cost optimization, and performance tuning of large-scale systems.
  • Build and maintain automation and tooling that reduces manual effort, accelerates delivery, and minimizes human error.
  • Explore and apply AI/ML technologies (e.g., anomaly detection, predictive scaling, automated alerting) to enhance SRE practices.
  • Share expertise with peers by documenting best practices, solutions, and troubleshooting methodologies.
  • Collaborate across infrastructure, development, and business teams to align on standards and reliability goals.
  • Provide technical depth and decisive action during critical incidents.
The Skills You’ll Bring:
  • 5–7+ years of experience in software engineering, SRE, or DevOps roles supporting large-scale, highly available systems.
  • Strong skills in performance troubleshooting, root cause analysis, and distributed system optimization.
  • Proficiency in at least one programming language (Python, Go, Java, or similar) with the ability to write production-quality code.
  • Hands‑on experience with observability platforms (e.g., Splunk, Datadog, SignalFx, Prometheus, OpenTelemetry).
  • Strong knowledge of AWS services, cloud deployment models, and cost optimization strategies.
  • Experience with Infrastructure as Code (Terraform, CloudFormation) and configuration management (Ansible, Chef, Puppet).
  • Solid understanding of distributed systems concepts (scalability, high availability, fault tolerance).
  • Experience in incident management and driving operational improvements.
  • Exposure to AI/ML or AIOps tools for anomaly detection, predictive analytics, or automated incident response (preferred but not required).
  • Effective communication skills with the ability to work across engineering and business teams.
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

Supporting a diverse and inclusive workforce is important to Shutterfly not only because it directly reflects our value of Embracing our Differences, but also because it’s the right thing to do for our business and for our people. We welcome all applicants and evaluate them based on their qualifications, without regard to age, race, creed, color, national origin, ancestry, marital status, affectional or sexual orientation, gender identity or expression, disability, nationality, sex, or other characteristic covered by law.

Learn more about our commitment to Diversity, Equity, and Inclusion on our Career Site.

This position will accept applications on an ongoing basis until…

Position Requirements
10+ Years work experience