
Lead Software Engineer - Resiliency

Job in Columbus, Franklin County, Ohio, 43224, USA
Listing for: J.P. Morgan
Full Time position
Listed on 2026-01-02
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Job Description

Be an integral part of an agile team that's constantly pushing the envelope to enhance, build, and deliver top-notch technology products.

As a Lead Site Reliability Engineer at JPMorgan Chase within the Employee Compute Branch Team, you will play a pivotal role in designing, implementing, and overseeing automation for observability and notification across a diverse set of systems in a global Microsoft Windows environment. You will lead by example, bringing hands‑on expertise in PowerShell and C# and instilling best practices in a team of highly experienced system engineers.

Your work will directly impact the reliability, scalability, and efficiency of our platforms, with a strong focus on cloud (Azure and AWS) integration.

Job Responsibilities
  • Champion site reliability engineering culture and practices, exerting technical influence across the team.
  • Lead the design and hands‑on implementation of automated observability and notification solutions using PowerShell and C#.
  • Drive initiatives to improve reliability and stability of applications and platforms through data‑driven analytics and automation.
  • Collaborate with team members to define and implement service level indicators, objectives, and error budgets.
  • Architect and implement monitoring, alerting, and telemetry solutions using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
  • Act as the primary technical lead during major incidents, quickly identifying and resolving issues to minimize impact.
  • Mentor and upskill system engineers, fostering a programming mindset and best practices in automation and reliability.
  • Facilitate cross‑team and cross‑region collaboration, ensuring alignment and knowledge sharing.
  • Document and share technical solutions and best practices within internal forums and communities of practice.
  • Engage with stakeholders to understand business needs and translate them into technical solutions, with increasing responsibility over time.
  • Break down complex problems into actionable work for the team, ensuring clear direction and accountability.
Required qualifications, capabilities, and skills
  • Formal training or certification in Site Reliability Engineering concepts and 5+ years of applied experience.
  • Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, and toil reduction, with proven ability to implement these practices.
  • Expert‑level fluency in PowerShell and C# in a Microsoft Windows environment.
  • Hands‑on experience with cloud platforms, specifically Azure and AWS.
  • Demonstrated experience in automated software testing (unit, integration, end‑to‑end).
  • Deep knowledge of software applications and technical processes, with emerging depth in one or more technical disciplines.
  • Proficiency and experience in observability, including white and black box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
  • Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform).
  • Experience with containerization and container orchestration (e.g., Docker, Kubernetes, ECS).
  • Ability to mentor and teach programming concepts to system engineers with non‑programming backgrounds, fostering a programming mindset and best practices.
  • Excellent communication and strategic thinking skills, with the ability to collaborate across teams, regions, and stakeholder groups.
Preferred qualifications, capabilities, and skills
  • Experience leading teams or projects in a site reliability or automation‑focused role.
  • Experience in financial services or other highly regulated, secure enterprise environments.
  • Experience with containerization and orchestration (e.g., Docker, Kubernetes, ECS).
  • Familiarity with complex data structures and algorithms.
  • Drive to self‑educate and evaluate new technologies.
  • Ability to expand and collaborate across different levels and stakeholder groups.
  • Experience architecting self‑healing or remediation automation (a plus, but not required at this stage).