Research Engineer, Machine Learning
Listed on 2025-12-02
IT/Tech
Data Scientist, AI Engineer, Systems Engineer
Location: Remote or New York City, US
Organization: Poseidon Research
Compensation: $100,000–$150,000 annually, or higher depending on experience
Type: One-year contract
This position is funded through a charitable research grant.
Poseidon Research is an independent AI safety laboratory based in New York City. Our mission is to make advanced AI systems transparent, trustworthy, and governable through deep technical research in interpretability, control, and secure monitoring.
We investigate how models think, hide, and reason: from understanding encoded reasoning and steganography in reasoning models to building open-source monitoring tools that preserve human oversight. Our research spans mechanistic interpretability, reinforcement learning, control, information theory, and cryptography, bridging the theoretical and the practical.
You could be a cog in a big lab and gamble with humanity’s future. Or you could own your entire research platform at Poseidon Research, pioneering the infrastructure needed to accelerate AI safety and build a safe, secure, and prosperous future.
The Role
We are hiring a Research Engineer to implement and scale experiments studying encoded reasoning and steganography in modern reasoning models.
This is a hands-on, highly technical position focused on experiment design, model evaluation, and platform engineering.
You will collaborate closely with research scientists to turn conceptual ideas into reproducible systems by building pipelines, datasets, and model organisms that make opaque behaviors measurable and controllable.
Responsibilities
We’re looking for a creative, rigorous engineer who loves to build in order to understand how safety issues intersect with reality. You will:
- Implement and reproduce prior work on encoded reasoning and steganography, extending it to current open-weight reasoning models (e.g., DeepSeek-R1 and V3, GPT-OSS, QwQ).
- Develop and maintain modular experiment pipelines for evaluating steganography, encoded reasoning, and reward hacking.
- Build and test fine‑tuning workflows (SFT or RL‑based) to study emergent encoded reasoning and reward hacking behaviors.
- Collaborate with our research leads to design safety cases and control-agenda monitoring mechanisms suited to countering various types of unsafe chain-of-thought reasoning.
- Extend interpretability infrastructure, including probing, feature ablation, and sparse autoencoder (SAE) analysis pipelines using frameworks like TransformerLens.
- Engineer datasets and evaluation suites for robust paraphrasing, steganography cover tasks, and monitoring robustness metrics.
- Collaborate with scientists to identify causal directions and larger-scale mechanisms underlying encoded reasoning (via standard interpretability techniques, DAS, MELBO, targeted LAT, and related methods).
- Ensure reproducibility through clean code, experiment tracking, and open‑source releases.
- Contribute to research communication by preparing writeups, visualizations, and benchmark results for research vignettes and publications.
Requirements
- Strong Python and PyTorch experience.
- Experience with LLM experimentation using frameworks such as Hugging Face Transformers, TransformerLens, or equivalent.
- Experience building reproducible ML pipelines, including data preprocessing, logging, visualization, and evaluation.
- RL fine-tuning or training small-to-mid-scale models using frameworks like TRL, verl, OpenRLHF, or equivalents.
- Proficiency with experiment tracking tools such as Weights & Biases or MLflow, and Git.
- Active proficiency and/or intellectual curiosity working with AI‑assisted coding and research tools such as Claude Code, Codex, Cursor, Roo, Cline or equivalents.
- Familiarity with interpretability methods such as probing, activation patching, or feature attribution.
- Understanding of encoded reasoning, steganography, or information‑theoretic approaches to model communication; or some background in formal cryptography, information theory, or offensive cybersecurity.
- Experience with mechanistic interpretability such as feature visualization, direction ablation, SAEs, crosscoders, and circuit tracing.
- Background in information security, control, or formal…