Software Engineer, AI Platform
Listed on 2026-01-06
Software Development
AI Engineer, Machine Learning / ML Engineer
LinkedIn is the world’s largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We’re also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that’s built on trust, care, inclusion, and fun where everyone can succeed.
Job Description
Team Name: Feature Serving
LinkedIn is the world’s largest professional network, built to help members of all backgrounds and experiences achieve more in their careers. Our vision is to create economic opportunity for every member of the global workforce. Every day our members use our products to make connections, discover opportunities, build skills and gain insights. We believe amazing things happen when we work together in an environment where everyone feels a true sense of belonging, and that what matters most in a candidate is having the skills needed to succeed.
It inspires us to invest in our talent and support career growth. Join us to challenge yourself with work that matters.
At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.
This role can be based in Mountain View, CA, San Francisco, CA, or Bellevue, WA.
Join us to push the boundaries of scaling large models together. The team is responsible for scaling LinkedIn’s AI model training, feature engineering, and serving, covering models with hundreds of billions of parameters and large-scale feature engineering infrastructure for all AI use cases, from recommendation models and large language models to computer vision models. We optimize performance across algorithms, AI frameworks, data infra, compute software, and hardware to harness the power of our GPU fleet with thousands of the latest GPU cards.
The team also works closely with the open source community and has many open source committers (TensorFlow, Horovod, Ray, vLLM, Hugging Face, DeepSpeed, etc.) on the team. Additionally, this team focuses on technologies like LLMs, GNNs, Incremental Learning, Online Learning, and serving performance optimizations across billions of user queries.
Model Training Infrastructure: As an engineer on the AI Training Infra team, you will play a crucial role in building the next-gen training infrastructure to power AI use cases. You will design and implement high-performance data I/O, work with open source teams to identify and resolve issues in popular libraries like Hugging Face, Horovod, and PyTorch, enable distributed training of models with hundreds of billions of parameters, debug and optimize deep learning training, and provide advanced support for internal AI teams in areas like model parallelism, tensor parallelism, ZeRO++, etc.
Finally, you will assist in and guide the development of containerized pipeline orchestration infrastructure, including developing and distributing stable base container images, providing advanced profiling and observability, and updating internally maintained versions of deep learning frameworks and their companion libraries like TensorFlow, PyTorch, DeepSpeed, GNNs, FlashAttention, PyTorch Lightning, and more.
Feature Engineering: This team shapes the future of AI with the state-of-the-art Feature Platform, which empowers AI users to effortlessly create, compute, store, consume, monitor, and govern features within online, offline, and nearline environments, optimizing the process for model training and serving. As an engineer on the team, you will explore and innovate within the online, offline, and nearline spaces at scale (millions of QPS, multiple terabytes of data, etc.), developing and refining the infrastructure necessary to transform raw data into valuable feature insights.
Utilizing leading open-source technologies like Spark, Beam, and Flink…