Inference Architecture Interns

Job in San Jose, Santa Clara County, California, 95199, USA
Listing for: Etched
Apprenticeship/Internship position
Listed on 2026-06-17
Job specializations:
  • Software Development
    Python, AI Engineer (Applied/Software), Computer Software / Middleware, Backend Developer
Salary/Wage Range or Industry Benchmark: 60,000 - 80,000 USD per year
Job Description & How to Apply Below

Inference Intern | Etched

The Tone:

This is an internship at Etched, located in San Jose, CA. Etched is building the world’s first AI inference system purpose‑built for transformers, aiming to deliver significantly higher performance and lower costs compared to existing solutions. This role is crucial for developing and optimizing compute architectures that achieve exceptional performance and efficiency for transformer workloads. Interns will contribute to the design of next‑generation AI accelerators, working on cutting‑edge architectural problems and performance modeling.

The TL;DR
  • Role:
    Internship
  • Type:
    Temporary
  • Location:
    In‑person, San Jose, CA
  • Mission:
    Develop and optimize compute architectures that deliver exceptional performance and efficiency for transformer workloads.
  • Tech Stack:
    Python, C++, Linux internals, accelerator architectures (GPUs, TPUs), compilers, high‑speed interconnects (NVLink, InfiniBand), vLLM, SGLang, Rust, PyTorch, JAX
What You’ll Actually Do
  • Model Porting:
    Support porting state‑of‑the‑art models to the architecture and help build programming abstractions and high‑performance software components for rapid iteration.
  • Runtime Development:
    Assist in building, enhancing, and scaling the runtime for Sohu, Etched’s transformer accelerator, including multi‑node inference, intra‑node execution, state management, and robust error handling.
  • Communication Optimization:
    Contribute to optimizing routing and communication layers using Sohu’s collectives.
  • Performance Analysis:
    Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.
  • Architecture Co‑design:
    Develop a deep understanding of Sohu to co‑design both hardware instructions and model architecture operations to maximize model performance.
The Must-Haves
  • Background:
    Student progressing towards a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, applied mathematics, or a related field.
  • Experience:
    Understanding of performance‑sensitive or complex distributed software systems, such as Linux internals, accelerator architectures (e.g., GPUs, TPUs), compilers, or high‑speed interconnects (e.g., NVLink, InfiniBand), coupled with experience porting applications to non‑standard accelerator hardware or platforms. Deep knowledge of transformer model architectures and/or inference serving stacks like vLLM or SGLang is also required.
  • Skills:
    Proficiency in Python and C++.
  • Bonus:
    Proficiency in Rust, experience with low‑latency and high‑performance applications using kernel‑level and user‑space networking stacks, a deep understanding of distributed systems concepts, solid grasp of Transformer architectures (especially Mixture‑of‑Experts), experience building applications with extensive SIMD optimizations, familiarity with PyTorch or JAX, or participation in math competitions.