Lead Software Engineer - Resiliency Job, Columbus area, Ohio, USA, IT/Tech

Be an integral part of an agile team that's constantly pushing the envelope to enhance, build, and deliver top-notch technology products.

As a Lead Site Reliability Engineer at JPMorgan Chase within the Employee Compute Branch Team, you will play a pivotal role in designing, implementing, and overseeing automation for observability and notification across a diverse set of systems in a global Microsoft Windows environment. You will lead by example, bringing hands‑on expertise in PowerShell and C# and instilling best practices in a team of highly experienced system engineers.

Your work will directly impact the reliability, scalability, and efficiency of our platforms, with a strong focus on cloud (Azure and AWS) integration.

Job Responsibilities

Champion site reliability engineering culture and practices, exerting technical influence across the team.
Lead the design and hands‑on implementation of automated observability and notification solutions using PowerShell and C#.
Drive initiatives to improve reliability and stability of applications and platforms through data‑driven analytics and automation.
Collaborate with team members to define and implement service level indicators, objectives, and error budgets.
Architect and implement monitoring, alerting, and telemetry solutions using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
Act as the primary technical lead during major incidents, quickly identifying and resolving issues to minimize impact.
Mentor and upskill system engineers, fostering a programming mindset and best practices in automation and reliability.
Facilitate cross‑team and cross‑region collaboration, ensuring alignment and knowledge sharing.
Document and share technical solutions and best practices within internal forums and communities of practice.
Engage with stakeholders to understand business needs and translate them into technical solutions, with increasing responsibility over time.
Break down complex problems into actionable work for the team, ensuring clear direction and accountability.

Required qualifications, capabilities, and skills

Formal training or certification in Site Reliability Engineering concepts and 5+ years of applied experience.
Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, and toil reduction, with proven ability to implement these practices.
Expert‑level fluency in PowerShell and C# in a Microsoft Windows environment.
Hands‑on experience with cloud platforms, specifically Azure and AWS.
Demonstrated experience in automated software testing (unit, integration, end‑to‑end).
Deep knowledge of software applications and technical processes, with emerging depth in one or more technical disciplines.
Proficiency and experience in observability, including white and black box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform).
Experience with containerization and container orchestration (e.g., Docker, Kubernetes, ECS).
Ability to mentor and teach programming concepts to system engineers with non‑programming backgrounds, fostering a programming mindset and best practices.
Excellent communication and strategic thinking skills, with the ability to collaborate across teams, regions, and stakeholder groups.

Preferred qualifications, capabilities, and skills

Experience leading teams or projects in a site reliability or automation‑focused role.
Experience in financial services or other highly regulated, secure enterprise environments.
Experience with containerization and orchestration (e.g., Docker, Kubernetes, ECS).
Familiarity with complex data structures and algorithms.
Drive to self‑educate and evaluate new technologies.
Ability to expand and collaborate across different levels and stakeholder groups.
Experience architecting self‑healing or remediation automation (a plus, but not required at this stage).
