Senior AI Researcher Job: Boston area, Massachusetts, USA (Research/Development)

Assail AI Engineering Researcher

Assail builds autonomous offensive security. Our platform, Ares, finds vulnerabilities in production systems by reasoning about them the way an experienced attacker would – chaining flaws across APIs, web applications, and mobile surfaces to uncover the exploits that scanners miss and human testers run out of time to find. We train our own models. Dagger is our 14B-parameter offensive security model, fine‑tuned for vulnerability discovery and exploit reasoning.

Javelin is our co‑evolutionary self‑training architecture, where attacker and defender models train against each other to push capability further than either could reach alone. The research surface is wide open, the domain is consequential, and the work ships into a platform that is actively used against hardened enterprise targets.

The Role

We're hiring our first dedicated AI Researcher to advance the core models powering Ares. You'll work alongside our VP of AI Engineering and a small AI engineering team, with direct collaboration with our CEO – a researcher and practitioner with 26 years of offensive security experience. This is a research role, not an applied ML role. You'll own original research on offensive security agents – how they reason, plan, use tools, and operate autonomously over long horizons.

You'll design experiments end‑to‑end, build the evaluation infrastructure the field doesn't yet have, and translate research wins into capability that ships.

What You'll Do

Drive original research on offensive security agents – reasoning, planning, tool use, and autonomous long‑horizon operation
Advance Dagger's post‑training pipeline: supervised fine‑tuning, RL from verifier signals, LoRA adaptation, and evaluation against adversarial benchmarks
Extend Javelin's co‑evolutionary self‑training architecture: curriculum design, self‑play dynamics, and reward modeling for security‑specific outcomes
Design and execute experiments end‑to‑end, from hypothesis through writeup
Build internal evaluation harnesses that rigorously measure capability where no public benchmark exists
Translate research into production handoffs to AI Engineering – model cards, deployment notes, and known failure modes
Contribute to Assail's external research voice through papers, talks, responsible disclosures, and technical writing
Collaborate with engineering teammates on research methodology and experimental design

What We're Looking For

Core experience that matters most

Original ML research output – published papers, widely cited preprints, significant open‑source releases, or shipped research that materially advanced a production system
Hands‑on post‑training experience with language models at the 7B+ parameter scale, with end‑to‑end ownership of a pipeline spanning data, training, and evaluation
Direct work with at least one of: RL from verifier or reward signals, preference optimization (DPO/IPO/KTO), or supervised fine‑tuning with synthetic data pipelines
Experience with agentic LLM systems – tool use, multi‑step reasoning, planning, or long‑horizon execution
Ability to design evaluations that measure real capability and avoid contamination or specification gaming
Strong Python and PyTorch, with experience in distributed training at multi‑GPU scale
Clear technical writing – research memos, experiment writeups, papers, or equivalent

Helpful but Learnable Here

Working knowledge of offensive security fundamentals (we'll teach you the rest if you bring strong ML depth)
Prior work on code‑generating or code‑reasoning models
Experience with sparse, delayed, or expensive reward signals in RL
Research on robustness, adversarial ML, or red‑teaming of language models
Familiarity with long‑horizon agent benchmarks (SWE‑bench, Cybench, WebArena, or similar)

Things We Deliberately Don't Require

A PhD. Track record matters more than the credential. If your work demonstrates the capability, the degree is secondary.
A security background. Strong ML researchers can develop security depth here, and we'll support you in doing it.
A specific number of years. Senior is a function of judgment and output, not a count.

What This Role Will Teach You

How to train and…