Research Engineer - LLM/VLM Inference Optimization; Seed Infra Job, Seattle area, Washington, USA, IT/Tech

Position: Research Engineer - LLM/VLM Inference Optimization (Seed Infra)
About the Team:

The Seed Infrastructures team oversees the distributed training, reinforcement learning framework, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models.

Responsibilities:

1. Design, develop, and optimize high-performance inference systems for large-scale LLMs and VLMs, covering inference engines, serving frameworks, and end-to-end deployment pipelines.
2. Build state-of-the-art model inference engines through advanced performance optimization techniques such as compiler-level optimizations, parallel computing, graph fusion, efficient CUDA kernel development, low-precision computation, streaming inference, speculative decoding, and high-concurrency request optimization.
3. Collaborate closely with other research teams to identify performance bottlenecks, conduct in-depth performance analysis, and optimize large models; contribute to the development of model tool chains and the broader technical ecosystem.

Minimum Qualifications:

1. Bachelor's degree or above in Computer Science, Electrical Engineering, Software Engineering, or a related field.
2. Strong proficiency in C/C++ and Python; solid foundations in algorithms, data structures, and systems programming; familiarity with containerization and server-side debugging.
3. Hands-on experience with at least one mainstream machine learning framework (e.g., PyTorch, TensorFlow).
4. Experience deploying or optimizing LLM/VLM inference at production scale, with demonstrated impact on latency, throughput, or serving cost.
5. Familiarity with GPU architecture and experience optimizing compute-intensive operators (e.g., FlashAttention, GEMM, GEMV, Conv2D).

Preferred Qualifications:

1. Experience with large-scale LLM serving infrastructure or equivalent production LLM deployment experience.
2. Experience in GPU programming (CUDA/OpenCL) and familiarity with frameworks such as TensorRT, Triton, or CUTLASS.
3. Experience in performance modeling, profiling, and optimization, or strong knowledge of CPU/GPU architectures.
4. Familiarity with model/data parallelism frameworks for distributed inference.
