
Principal Software Engineer, AI Training Platform

Remote / Online - Candidates ideally in Mountain View, Santa Clara County, California, 94039, USA
Listing for: LinkedIn
Full Time, Apprenticeship/Internship, Remote/Work from Home position
Listed on 2026-01-01
Job specializations:
  • Software Development
    AI Engineer, Machine Learning/ ML Engineer
Job Description & How to Apply Below
Principal Staff Software Engineer, AI Training Platform

• Full-time

• Workplace Type: Hybrid

LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We’re also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that's built on trust, care, inclusion, and fun, where everyone can succeed.

This role will be based in Mountain View, CA.

At LinkedIn, we trust each other to do our best work where it works best for us and our teams.

This role offers hybrid work options, meaning you can work from home and commute to a LinkedIn office, depending on what’s best for you and when your team needs to be together.

As part of LinkedIn's AI Platform group, the AI Training team develops and maintains highly available, scalable deep learning training solutions to power our rapidly growing AI use cases. The team is responsible for scaling LinkedIn's training of AI models with hundreds of billions of parameters across all AI use cases, from recommendation models and large language models (generative AI) to computer vision models.

We optimize training performance across algorithms, AI frameworks, infrastructure software, and hardware to harness the power of our GPU fleet, which has thousands of the latest GPU cards. The team also works closely with the open source community and includes many open source committers (TensorFlow, Horovod, Ray, Hadoop, etc.). Additionally, the team focuses on technologies like LLMs, GNNs, incremental learning, online learning, and advanced LLM agent work for training infrastructure.

As a Principal Staff Software Engineer on the AI Training Infra team, you will play a crucial role in leading and building the next‑gen training infrastructure to power AI use cases. You will design and implement high‑performance AI training pipelines and data I/O, work with open source teams to identify and resolve issues in popular libraries like Hugging Face, Horovod, and PyTorch, debug and optimize deep learning training, and provide advanced support for internal AI teams in areas like model parallelism, data parallelism, ZeRO, automatic mixed precision, and kernel fusion.

Finally, you will assist in and guide the development of containerized pipeline orchestration infrastructure, including developing and distributing stable base container images, providing advanced profiling and observability, and updating internally maintained versions of deep learning frameworks and their companion libraries like TensorFlow, PyTorch, DeepSpeed, GNNs, FlashAttention, and more.

Responsibilities

• Owning the technical strategy for broad or complex requirements with insightful and forward‑looking approaches that go beyond the direct team and solve large open‑ended problems.

• Designing, implementing, and optimizing the performance of large‑scale distributed training for personalized recommendation as well as large language models.

• Improving the observability and understandability of various systems with a focus on improving developer productivity and system sustenance.

• Mentoring other engineers, defining our challenging technical culture, and helping to build a fast‑growing team.

• Working closely with the open‑source community to participate in and influence cutting‑edge open‑source projects (e.g., PyTorch, GNNs, DeepSpeed, Hugging Face).

• Functioning as the tech‑lead for several concurrent key initiatives for the Training Infrastructure and defining the future of AI training platforms.

Basic Qualifications

• BS/BA in Computer Science or related technical field or equivalent technical experience

• 7+ years of industry experience in software design, development, and algorithm‑related solutions

• 7+ years of experience programming in object‑oriented languages such as Python, C++, Java, Go, Rust, Scala

• 5+ years of experience as an architect or in a technical leadership position

• 5+ years of experience in the…