We’re a fast-paced, fabless semiconductor startup redefining the boundaries of AI with a cutting-edge, scalable, AI-infused multipurpose compute architecture. Our mission is to deliver scalable, efficient, and intelligent silicon solutions for the next generation of edge AI, robotics, autonomous systems, and mobile devices. Our leadership team brings together decades of experience in semiconductor innovation, spanning chip architecture, system design, and global business operations.
The team includes pioneers behind several generations of groundbreaking compute architectures; experts in software-hardware co-design, SoC, and AI development; and leaders of multi-billion-dollar business units at top-tier technology companies. Our portfolio includes hundreds of patents.
This is a great opportunity to join a highly skilled AI/ML Software team working at the intersection of hardware and software (HW/SW co-design). In this role, you will be responsible for designing and executing end-to-end model compression pipelines, including sensitivity analysis, quantization, pruning, and hybrid optimization techniques across large-scale transformer architectures.
Key Responsibilities and Duties
Build and own the end-to-end compression pipeline
- Baseline benchmarking and instrumentation
- Sensitivity analysis
Implement layerwise sensitivity scoring frameworks
Design and apply quantization strategies
- INT8, INT4, FP8, FP4 exploration
- Per-layer and per-tensor precision assignment
- Dynamic range calibration and scaling strategies
Implement and evaluate pruning techniques
Apply hybrid compression methods
- QAT, LoRA-based recovery, distillation
- Latency / throughput
- Memory footprint
Optimize for iMachine Architecture
Qualifications and Skills
Successful candidates should possess the following qualifications and skills:
Required Qualifications (you must possess these qualifications to be considered for the position)
Bachelor of Science degree in Electrical Engineering, Computer Science, Computer Engineering, or a related field
1+ year of experience with PyTorch, JAX, or TensorFlow
Understanding of:
- Numerical precision and quantization theory
Hands-on experience with:
- TensorRT, ONNX Runtime, or similar inference stacks
Familiarity with:
- Sparse representations (CSR, COO, RLC)
- Low-rank approximation methods (SVD, factorization)
Ability to analyze:
- Numerical stability issues
MS or PhD in Electrical Engineering, Computer Engineering, Computer Science, or a related field
Experience with:
- Hardware-aware optimization
Knowledge of:
- Deliver production-ready compressed models with minimal accuracy loss
- Achieve quantifiable performance gains (latency, memory, throughput)
- Build reusable tooling and automation pipelines
- Get in early at a breakthrough deep-tech startup reshaping AI compute
- Work closely with industry innovators and experienced leaders where your work will have a direct impact on the success of the company
- Be part of a mission-driven team building foundational technology for the future
- We balance sharp execution with continuous innovation to push the boundaries
- Competitive compensation, equity, and growth opportunities
At I Machines, Inc., we offer competitive salaries and a comprehensive benefits package, including:
- Health, dental, and vision insurance
- Retirement savings plans
- Paid time off and holidays
- Flexible schedule
I Machines, Inc., is an equal opportunity employer and does not discriminate based on race, color, religion, gender, national origin, age, disability, or any other legally protected status. All qualified applicants will be considered for employment.