Principal AI Model Architect - Silicon-Software Co-Design

Austin, Texas

The Voice of the Model. The Architect of the Machine.

The Mission

Most AI architects optimize models for hardware that already exists. You're going to shape the hardware around the model - before a single transistor is placed.

As our Principal AI Model Architect, you occupy the most strategically critical seat in our entire silicon program. You are the living contract between what our researchers dream up and what our silicon team can physically build. You are the person in the room who looks at a state-of-the-art Transformer architecture and answers the question no one else can:

"Here's exactly how we break this apart, map it across our heterogeneous ASIC, and run it faster than anyone else on earth - and here's the proof."

This isn't model fine-tuning. This isn't prompt engineering. This is deep, architecture-level surgery - partitioning massive parameter models, defining tiling strategies, projecting cycle-accurate performance on silicon that doesn't exist yet, and ensuring the SDK team has a mathematically airtight path to make it all real.

Your decisions don't just influence software. They get etched into silicon.

What You'll Actually Be Doing

The Hardware-Software Bridge - Own the Translation Layer

You'll take bleeding-edge AI and RAN algorithms - Transformers, Grouped Query Attention, Rotary Positional Embeddings - and convert them into precise hardware specifications for the ASIC team and concrete lowering requirements for the SDK team. You're not summarizing research. You're operationalizing it, making it real at the level of memory hierarchies, dataflow patterns, and execution units.

Model Partitioning & Tiling - Shatter the Model

You'll define the strategies for decomposing massive, multi-hundred-million-parameter models and mapping them across heterogeneous compute fabrics. Tensor parallelism, pipeline stages, tiling across HBM and on-chip SRAM - you'll architect the playbook that determines how every layer lives and breathes on custom silicon.

Golden Model Ownership - Guard the Source of Truth

You'll own and maintain the canonical reference implementations in JAX and PyTorch - the undisputed "Source of Truth" that the entire program aligns to. When the MLIR-compiled output lands on silicon, it's your models that prove whether the math held. You'll work hand-in-hand with the SDK team to ensure that what the researcher intended and what the hardware executes are identical, bit for bit.

Performance Projection - See the Future in Cycles

Before a single line of RTL is written, you'll be projecting performance. Using cycle-accurate simulators, SystemC models, and your own deep intuition for how model architectures behave under hardware constraints, you'll give the silicon team the confidence to make tape-out decisions that cost millions of dollars. You are the signal in the noise.

What You Bring

Model Architecture Mastery

You have deep, battle-tested knowledge of Transformer architectures - not just how they work conceptually, but how every design choice (attention head count, KV-cache sizing, embedding strategies, GQA vs. MQA trade-offs) ripples through a hardware execution profile.

Hardware-Aware ML - The Rare Skill

You've lived in the "Hardware-in-the-Loop" world. You think about cache line behavior, memory wall bottlenecks between HBM and SRAM, and how SIMD and VLIW execution units reward or punish specific model shapes. You don't just write models - you profile them against physics.

Framework Depth - Down to the Graph

Advanced proficiency in JAX (strongly preferred), PyTorch, or TensorFlow - specifically at the export and compilation layer. You're comfortable with graph capture, XLA compilation, and StableHLO representations. You know what happens to your model after the Python interpreter is done with it.

Performance Modeling

Experience with SystemC, Transaction-Level Modeling, or custom cycle-accurate simulation frameworks. You've used these tools to validate architectural decisions before silicon is committed - and you've been right when it counted.

Preferred Expertise - The Gap-Fillers That Set You Apart

Telecommunications DNA

You've applied AI to real RAN workloads - channel estimation, beamforming, interference management at L1/L2/L3. You understand why 5G inference isn't just a data center problem in a smaller box.

Compiler Curiosity

You don't need to write MLIR transformation passes from scratch - but you instinctively understand the journey from a high-level compute graph to a linearized, scheduled, hardware-bound execution sequence. You know what gets lost in that translation and how to protect against it.

Numeric Sensitivity

You have hands-on experience with complex-valued AI models and the specific challenges they create when mapping to DSP and matrix accelerator hardware. Fixed-point quantization, dynamic range, numerical stability under precision reduction - these aren't abstract concerns to you. They're design…