Senior Machine Learning Engineer Job, San Jose area, California, USA, Software Development

Responsibilities

Develop, optimize, and deploy lightweight machine learning models for edge AI applications, particularly for audio processing.
Implement and optimize ML models on embedded platforms, including FPGA and custom ASIC solutions.
Work closely with hardware and software teams to integrate ML models into production systems.
Research and implement state‑of‑the‑art ML techniques to improve model efficiency and reduce latency and power consumption for embedded AI applications.
Improve inference efficiency and model compression techniques, including quantization, pruning, and knowledge distillation.
Collaborate with cross‑functional teams to drive innovation and contribute to the overall system architecture.
Provide technical leadership and mentorship to junior engineers.
Publish research findings, present at conferences, and contribute to open‑source projects when applicable.

Requirements

5+ years of relevant industry experience (or a PhD) in Computer Science, Electrical Engineering, Machine Learning, or related fields.
Must have prior experience managing a team, serving in a Team Lead role, or demonstrating strong technical leadership and cross‑functional coordination capabilities.
Strong hands‑on experience in machine learning, with a focus on edge AI, on‑device inference, and deploying lightweight models on resource‑constrained devices.
Expertise in modern ML frameworks such as PyTorch, TensorFlow (including TensorFlow Lite), and JAX.
Proficiency in Python and C/C++, with practical experience in ML model optimization and production deployment.
Deep experience with model quantization (PTQ/QAT), pruning, knowledge distillation, sparsity, and other compression techniques for efficient edge inference.
Hands‑on experience developing for or integrating with AI chip SDKs, neural accelerators (NPUs/DSPs), or hardware‑specific toolchains (e.g., NVIDIA TensorRT, Qualcomm Neural Processing SDK, Arm Ethos, or similar).
Familiarity with edge inference runtimes (ONNX Runtime, ExecuTorch, TVM) and optimizing models for hardware constraints (latency, memory footprint, power consumption).

Additional Experience (Strong Plus)

Understanding of ML compiler and runtime design.
Experience working with tools such as Optimum, ONNX, TensorRT, TFLite/LiteRT, ncnn, or CoreML.
Familiarity with hardware acceleration techniques.
Experience in embedded system development.

Benefits

Salary Range: $200,000 - $280,000 / year

TetraMem celebrates diversity and is committed to creating an inclusive environment for all employees. We are proud to be an Equal Opportunity Employer and welcome applicants from all backgrounds. Qualified candidates will receive consideration for employment without regard to race, color, religion, creed, sex, gender identity or expression, sexual orientation, national origin, ancestry, age, marital status, medical condition, disability, genetic information, military or veteran status, or any other characteristic protected by applicable federal, state, or local law.

TetraMem is committed to providing reasonable accommodations to qualified applicants with disabilities throughout the recruitment process. Applicants requiring accommodation may contact Human Resources for assistance.
