Distinguished Software Engineer, Reliability Infra
Job in
Mountain View, Santa Clara County, California, 94039, USA
Listed on 2026-01-01
Listing for:
LinkedIn
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, Cloud Computing
Job Description & How to Apply Below
• Full-time
• Workplace Type: Hybrid
LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We're also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that's built on trust, care, inclusion, and fun, where everyone can succeed.
At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.
This role will be based in Sunnyvale, CA or San Francisco, CA.
Responsibilities
• Serve as a senior technical leader driving the long-term reliability and observability strategy across LinkedIn's infrastructure
• Re-architect LinkedIn's backend systems to enable granular failure domains and reduce the blast radius of incidents
• Design and implement next-generation failure mitigation strategies that avoid full-region or full-datacenter failovers
• Partner closely with many different types of engineers to raise the bar for operational excellence and incident response
• Define and build frameworks to improve monitoring, alerting, and observability across hundreds of services and systems
• Define and own the roadmap for bringing observability to critical user journeys across LinkedIn's products, to help capture and improve the experience of LinkedIn's members and customers
• Spearhead a multi-year initiative to transition LinkedIn's infrastructure to a regionalized model with localized failover, enhancing both scalability and availability
• Lead technical discussions on the future of Engineering at LinkedIn and what the function should evolve into over the next 3-5 years
• Deliver key insights and executive-level reporting across cross-functional engineering teams to enable the right business decisions about improving the quality and reliability of our services and products
• Act as a force multiplier by mentoring engineers, influencing technical direction across orgs, and contributing deeply to culture, hiring, and technical excellence
• Lead incident response and post-incident reviews to identify root causes and implement preventive measures
• Develop and maintain incident management processes and procedures to ensure timely resolution of issues and minimize impact on customers
Basic Qualifications
• 15+ years of software engineering experience
• 8+ years focused on infrastructure, reliability-focused engineering, or distributed systems
Preferred Qualifications
• Hands-on experience with large-scale incident response, root cause analysis, and resiliency engineering
• Strong communication and cross-functional collaboration skills, with experience influencing across multiple orgs and leadership levels
• Proven success designing and leading architectural transformations at internet-scale companies
• Deep knowledge of systems reliability, observability frameworks, and fault-tolerant architecture design
• Experience with multi-region architecture, capacity planning, and failover strategies in large-scale cloud or hybrid environments
• Background in CI/CD, platform reliability, and automation of ops-heavy systems
• Familiarity with modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana) and service mesh architecture
• Track record of setting long-term technical strategy and driving systemic improvements in availability and performance
• Previous experience in a Distinguished Engineer or equivalent role at a high-growth or web-scale technology company
Suggested Skills
• Site Reliability Engineering (SRE)
• Leadership
LinkedIn is committed to fair and equitable compensation practices. The pay range for this role is $238,000 to $390,000. Actual compensation packages are based on several factors that are unique…