Silicon Architect, Diffusion ASICs | New York, New York, USA | IT/Tech

Location: New York

Normal Computing | Incredible Opportunities

The Normal Team builds foundational software and hardware that help move technology forward, supporting the semiconductor industry, critical AI infrastructure, and the broader systems that power our world. We work as one team across New York, San Francisco, Copenhagen, Seoul, and London.

Your Role in Our Mission

Look at the AI accelerator roadmaps coming out of every major silicon company right now and you will notice something strange: they are all building the same chip.

Bigger systolic arrays. More HBM. More of the same architecture, scaled harder. The industry has placed a collective bet that the way to win the next decade of AI inference is to refine the GPU paradigm until it cannot be refined any further.

We know that bet is wrong.

Normal is building ASICs purpose‑built for image and video diffusion inference, grounded in the physics of computation rather than the assumptions everyone else has inherited. The compute substrate has to be invented, not specified, and we are looking for the person who wants to help invent it.

You will work directly alongside our lead architect and research engineers, contributing across the full architecture stack: compute core microarchitecture, memory subsystem, interconnect, and the FPGA prototyping that proves the decisions before silicon. The team is small. The scope is wide. The architecture is being shaped now, not refined, and your contributions will be visible in the chip when it tapes out.

If working on a chip that has to be invented appeals to you more than iterating on one that already exists, keep reading.

Responsibilities

Help define the architecture and microarchitecture of novel AI accelerator compute blocks: PE array design, datapath organization, and support for efficiency techniques such as sparsity exploitation and reduced‑precision computation. The compute tile is the surface where Normal's research advantages have to show up in silicon, and you are one of the people responsible for making sure they do.
Translate workload analysis and research findings into hardware specifications. Identify where architectural innovation creates the most leverage, define the structures that realize it, and produce microarchitecture documents unambiguous enough for RTL engineers to implement against. You work closely with them through implementation, not over the wall from it.
Reason across the full stack and defend PPA tradeoffs at every level. Move between algorithm‑level workload behavior, memory hierarchy, on‑chip interconnect, and physical design constraints. Make the call when the data is incomplete, and articulate why under scrutiny from the lead architect and the research team.
Partner with the compiler lead on ISA co‑design. The compute tile must be compilable and programmable, not just simulatable. The programming model and the microarchitecture are defined together, and you are accountable for both sides meeting in the middle.
Own the FPGA prototyping work. Scope what the FPGA implementation actually proves, drive the implementation through to bring‑up, and use it to de‑risk architecture decisions before tapeout. You decide which questions are worth answering in FPGA versus cycle‑accurate simulation.
Stay current with the AI accelerator research landscape and be able to articulate clearly where Normal's approach differs from existing solutions and why that matters. This is a research‑adjacent seat and you are expected to read, not just consume.

What We're Looking For

A degree in Electrical Engineering, Computer Engineering, Computer Science, or equivalent work experience. PhD welcome but not required; the bar is the work, not the credential.
Substantial experience in architecture or microarchitecture of high‑performance digital systems, such as AI accelerators, compute engines, or similarly complex logic. You have shaped the structures inside a chip, not just consumed them from the outside.
Fluency moving between algorithm‑level analysis and hardware specification. You can read a profile of a workload and translate it into datapath widths, pipeline stages, and area/power estimates without losing the thread on…
