Senior Lead Software Engineer
Listed on 2026-06-29
Software Development
Cloud Engineer - Software, DevOps, Backend Developer
If you're a Senior Lead Software Engineer who takes ownership of outcomes in production — not just implementation — and thrives on turning ambiguous requirements into stable, well-modeled service designs, this role was built for you. You will have meaningful latitude to influence architecture, engineering standards, and reliability posture across services, with expectations and recognition aligned to senior-level impact.
As a Senior Lead Software Engineer at JPMorgan Chase within the Corporate AI/ML Data Platforms – Machine Learning Center of Excellence, you will design, build, and optimize high-performance, low-latency distributed systems that serve as the backbone of our machine learning and data infrastructure. You will collaborate across engineering, data science, and platform teams to deliver resilient, cloud-native solutions that enable the firm to operate at the forefront of AI-driven innovation.
Your work will directly shape the reliability, scalability, and performance of systems that process critical data across the enterprise, and your voice will carry weight in the architectural and engineering decisions that define how the platform evolves.
- Architect and implement low-latency, high-throughput Java Spring Boot based distributed services using object-oriented principles, meeting the performance demands of production-grade services with well-defined APIs.
- Design and build resilient, cloud-native service architectures with high-availability requirements (three to five nines, i.e. 99.9% to 99.999% uptime), leveraging AWS compute, messaging, streaming, database, and storage services such as MSK (Kafka), SQS, S3, ECS, EKS, Lambda, KVS/KDS, RDS, DynamoDB, and Redshift.
- Develop and maintain infrastructure-as-code solutions using Terraform and/or CloudFormation to support scalable, repeatable, and auditable cloud deployments.
- Implement and continuously improve observability solutions — including alerting, monitoring, and reporting — using Datadog, Dynatrace, and Splunk to deliver actionable production intelligence across microservices platforms.
- Translate ambiguous or evolving requirements into stable, well-modeled service designs, clearly articulating engineering tradeoffs to both technical and non-technical stakeholders.
- Lead technical design reviews, establish engineering best practices, and drive adoption of standards that improve platform operability, reliability, and maintainability.
- Own production outcomes end-to-end — identifying and resolving performance bottlenecks, reliability gaps, and scalability constraints through automation and runbook-driven operations.
- Partner with machine learning engineers and data scientists to understand platform requirements and deliver robust, production-ready engineering solutions.
- Mentor and provide technical guidance to engineers across the team, fostering a culture of ownership, continuous learning, and engineering excellence.
- Drive adoption and governance of approved AI-assisted engineering practices across teams to improve code quality, delivery speed, and operational outcomes (e.g., AI-assisted code review/refactoring, test acceleration, release readiness, incident/root-cause analysis), establishing measurable validation standards and promoting reuse of proven patterns within the SDLC/TLM toolchain.
- Apply knowledge of tools within the Software Development Life Cycle toolchain, including approved AI-assisted development and automation capabilities, to improve the value realized by automation at scale.
- Formal training or certification on software engineering concepts and 5+ years of applied experience; very strong Java development skills using object-oriented principles, with strong experience using Spring Boot.
- Demonstrated experience designing and tuning for low-latency processing in production distributed systems.
- Hands-on experience leveraging AWS compute, messaging, streaming, database, and storage services such as MSK (Kafka), SQS, S3, ECS, EKS, Lambda, KVS/KDS, RDS, DynamoDB, and Redshift in large-scale, resilient service architectures.
- Practical experience implementing alerting, monitoring, and reporting solutions using Datadog,…