Machine Learning Engineer - Inference

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Gravity Engineering Services Pvt Ltd.
Full Time position
Listed on 2026-06-17
Job specializations:
  • Software Development
    AI Engineer (Applied/Software), Machine Learning / ML Engineer
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description

About the Role

Together AI is seeking a Machine Learning Engineer to join our Inference Engine team, focusing on optimizing and enhancing the performance of our AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively. If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want to hear from you. This position offers the chance to collaborate closely with AI researchers and engineers to create cutting-edge AI solutions.

Join us in shaping the future at Together AI!

Responsibilities
  • Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
  • Develop and optimize runtime inference services for large-scale AI applications.
  • Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
  • Conduct design and code reviews to ensure high standards of quality.
  • Create services, tools, and developer documentation to support the inference engine.
  • Implement robust and fault-tolerant systems for data ingestion and processing.
Requirements
  • 3+ years of experience writing high-performance, well-tested, production-quality code.
  • Proficiency with Python and PyTorch.
  • Demonstrated experience in building high-performance libraries and tooling.
  • Excellent understanding of low-level operating systems concepts, including multi-threading, memory management, networking, storage, performance, and scale.
  • Preferred: Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, and Optimum.
  • Preferred: Knowledge of AI inference techniques such as speculative decoding.
  • Preferred: Knowledge of CUDA/Triton programming.
  • Nice to have: Knowledge of Rust, Cython, and compilers.