Compiler Optimization Engineer / Remote or on-site / Well-funded startup
A rare opportunity to join a well-funded startup building a hardware-agnostic AI compiler that lets teams deploy to any accelerator architecture from a single codebase.
We are looking for a core engineer to join the team behind our graph optimization layer. In this role, you will have a direct hand in shaping how the next generation of AI models scale across diverse hardware.
About the role:
- You'll design, implement, and maintain graph-level optimisation passes including operator fusion, layout propagation, tiling, dead code elimination, and constant folding
- You'll define and evolve the intermediate representation (IR) to support new optimisation opportunities as ML model architectures advance
- You'll analyse real performance data to identify gaps and drive measurable improvements in throughput and latency
- You'll build and contribute to testing and validation infrastructure to ensure correctness across optimisation passes
- You'll collaborate closely with frontend and code generation teams to maintain clean IR interfaces and well-structured pipelines
- You'll propose and prototype new optimisation strategies in response to advances in model design and hardware capabilities
Key Requirements:
- You'll have a degree in CS or Computer Engineering (BS, MS, or PhD)
- You'll bring strong C/C++ experience across performance-critical codebases
- You'll have a deep understanding of graph-level compiler optimisation: fusion, tiling, layout transformations, and dead code elimination (DCE)
- You'll be able to speak concretely about how your work translated into measurable performance improvements
It's a big plus if:
- You've worked with MLIR, XLA, or similar graph-level IR frameworks
- You're familiar with ML framework internals: PyTorch eager/compile mode, JAX/XLA, or TensorRT
- You've explored polyhedral models or affine analysis for loop and tensor optimisation
- You have an understanding of hardware memory hierarchies and how layout decisions affect GPU/accelerator performance
- You've worked with quantisation, sparsity, or model-level optimisation techniques
- You've contributed to open-source compiler or ML infrastructure projects