ML Systems Engineer (Compiler & Graph Optimization) Job, Kingston area, Ontario, Canada, Engineering

Position: ML Systems Engineer (Compiler & Graph Optimization)

Compiler Optimization Engineer / Remote or on-site / Well-funded startup

A rare opportunity to join a well-funded startup building a hardware-agnostic AI compiler that lets teams deploy to any accelerator architecture from a single codebase.

We are looking for a core engineer to join the team behind our graph optimization layer. In this role, you will have a direct hand in shaping how the next generation of AI models scale across diverse hardware.

About the role:

You'll design, implement, and maintain graph-level optimisation passes including operator fusion, layout propagation, tiling, dead code elimination, and constant folding
You'll define and evolve the intermediate representation (IR) to support new optimisation opportunities as ML model architectures advance
You'll analyse real performance data to identify gaps and drive measurable improvements in throughput and latency
You'll build and contribute to testing and validation infrastructure that ensures correctness across optimisation passes
You'll collaborate closely with frontend and code generation teams to maintain clean IR interfaces and well-structured pipelines
You'll propose and prototype new optimisation strategies in response to advances in model design and hardware capabilities

Key Requirements:

You'll have a degree in CS or Computer Engineering (BS, MS, or PhD)
You'll bring strong C/C++ experience across performance-critical codebases
You'll have a deep understanding of graph-level compiler optimisation: fusion, tiling, layout transformations, and dead code elimination (DCE)
You'll be able to speak concretely about how your work translated into measurable performance improvements

It's a big plus if:

You've worked with MLIR, XLA, or similar graph-level IR frameworks
You have familiarity with ML framework internals — PyTorch eager/compile mode, JAX/XLA, or TensorRT
You've explored polyhedral models or affine analysis for loop and tensor optimisation
You have an understanding of hardware memory hierarchies and how layout decisions affect GPU/accelerator performance
You've worked with quantisation, sparsity, or model-level optimisation techniques
You've contributed to open-source compiler or ML infrastructure projects