Lead Software Engineer-AI Platform Engineer
Listed on 2026-05-18
Jersey City, NJ, United States
Job Identification:
Job Category: Software Engineering
Business Unit: Corporate Sector
Posting Date: 15/09/2025, 22:16
Location: 575 Washington Blvd, Jersey City, NJ, 07310, US
Job Schedule: Full time
Base Pay/Salary: Jersey City, NJ $-$
Job Description
We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible. As a Lead Software Engineer at JPMorgan Chase within the Corporate Sector, specifically as a part of the Infrastructure Platforms team, you will play a crucial role in an agile team committed to enhancing, creating, and delivering high-quality technology products in a secure, stable, and scalable manner.
Your role as a vital technical contributor will involve developing critical technology solutions across numerous technical domains within various business functions, all aimed at supporting the firm's business goals.
Job responsibilities
- Execute creative software solutions, including design, development, and technical troubleshooting, with the ability to think beyond conventional approaches to build solutions or resolve technical problems.
- Develop secure, high-quality production code, and review and debug code written by others.
- Identify opportunities to eliminate or automate the remediation of recurring issues to enhance the overall operational stability of software applications and systems.
- Lead evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented assessments of architectural designs, technical credentials, and their applicability within existing systems and information architecture.
- Lead communities of practice across Software Engineering to promote awareness and adoption of new and leading-edge technologies.
- Contribute to a team culture of diversity, equity, inclusion, and respect.
- Develop and deploy cloud infrastructure platforms that are secure, scalable, and optimized for AI and machine learning workloads.
- Collaborate with AI teams to understand computational needs and translate these into infrastructure requirements.
- Monitor, manage, and optimize cloud resources to maximize performance and minimize costs.
- Design and implement continuous integration and delivery pipelines for machine learning workloads.
- Develop automation scripts and infrastructure as code to streamline deployment and management tasks.
Qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts with 5+ years of applied experience.
- Hands‑on practical experience in delivering system design, application development, testing, and ensuring operational stability.
- Advanced proficiency in one or more programming languages such as Python and/or Golang.
- Proficiency in automation and continuous delivery methods.
- Proficient in all aspects of the Software Development Life Cycle.
- Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile, etc.).
- Proficiency in Linux environments, including scripting and administration.
- Foundational understanding of machine learning concepts, including transformer architecture, ML training, and inference.
- Experience in solutions design and engineering, containerization (Docker, Kubernetes), and cloud service providers (AWS, Azure, GCP).
- Experience with Infrastructure as Code (Terraform, CloudFormation) and automation tools (Ansible, Chef, Puppet).
- Deep understanding of cloud component architecture: microservices, containers, IaaS, storage, security, and routing/switching technologies.
- Foundational understanding of NVIDIA GPU infrastructure software (e.g., NVIDIA DCGM, BCM, Triton Inference Server).
- Hands‑on experience with ML frameworks and tooling such as PyTorch and TensorBoard.
- Experience with observability tools such as Prometheus and Grafana.
- Experience in MLOps and associated tooling such as MLflow.
- Experience with High Performance Computing and Machine Learning frameworks and tools such as vLLM, Ray, and Slurm.
- Strong background in network architecture, database…