Principal Platform Reliability Engineer
Listed on 2026-06-02
IT/Tech
Systems Engineer, Cloud Computing
Eli Lilly and Company seeks a Platform Site Reliability Engineer to join the Software Product Engineering (SPE) Customer Operations team. You will design, operate, and continuously improve highly available, scalable, and fault‑tolerant systems across cloud environments. You will play a critical role in establishing reliability standards, driving operational excellence, and enabling engineering teams to build and deploy with confidence.
What You’ll Do:
- Define and implement SLOs, SLIs, and reliability standards that establish a consistent foundation for platform health, and drive resilience through capacity planning, failover design, and disaster recovery strategies
- Lead response for P1/P2 incidents, owning rapid mitigation and recovery while conducting thorough root cause analysis and implementing corrective actions that prevent recurrence
- Develop and maintain runbooks, playbooks, and operational standards that enable the broader engineering organization to respond effectively and consistently
- Implement and optimize observability frameworks spanning monitoring, logging, tracing, and alerting — improving system visibility and reducing alert noise through actionable, signal‑driven insights
- Leverage platforms such as Splunk, Prometheus, CloudWatch, or equivalent tooling to ensure teams have the telemetry they need to detect, diagnose, and resolve issues proactively
- Build and maintain CI/CD pipelines and deployment automation; drive adoption of Infrastructure as Code and GitOps practices across engineering teams
- Support engineering teams in integrating SRE principles throughout the software lifecycle
- Implement secure‑by‑design practices across infrastructure and platforms, support vulnerability remediation and secure configurations, and ensure alignment with enterprise security and compliance standards
- Partner with engineering teams to improve reliability, performance, and deployment practices
- Provide technical guidance and mentorship to engineers, and communicate system health and incident impact clearly to stakeholders at all levels
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related technical field
- 7+ years of hands‑on experience with AWS
- Extensive experience with Kubernetes and containerization technologies (Docker, EKS, etc.)
- Experience operating production‑grade distributed systems
- Experience in incident management and on‑call support models
- Experience defining and managing SLOs, SLIs, and error budgets
- Hands‑on experience with observability tools such as Splunk and the LGTM stack
- Experience building and maintaining CI/CD pipelines
- Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, etc.)
- Experience with scripting in Python, Bash, or PowerShell
- Experience with networking and cloud architecture fundamentals
- Experience implementing security best practices in cloud environments
- Experience troubleshooting complex system and performance issues
- Experience with tools such as ArgoCD, GitHub Actions, or GitOps workflows
- Familiarity with large‑scale enterprise platforms and environments
- Experience in regulated industries such as healthcare or pharma
- Exposure to global support models and follow‑the‑sun operations
- Strong written communication skills, including crafting incident updates, postmortems, and status summaries for mixed audiences
This role is hybrid, in office 3 days a week, and does not require travel.
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities. If you require accommodation, you may request one via the following form:
Lilly is proud to be an EEO Employer and does not discriminate on the basis of age, race, color, religion, gender identity, sex, gender expression, sexual orientation, genetic information, ancestry, national origin, protected veteran status, disability, or any other legally protected status.
Actual compensation will depend on a candidate’s education, experience, skills, and geographic location. The anticipated wage for this position is $126,000 - $224,400.
Full‑time equivalent employees also will be eligible for a…