Data Scientist
Listed on 2025-12-22
IT/Tech
AI Engineer, Data Scientist, Machine Learning/ ML Engineer
New York, New York; Salt Lake City, Utah
Please note:
Our offices will be closed for our annual winter break from December 22, 2025, to January 2, 2026. Our response to your application will be delayed.
The Impact You’ll Make
As a member of Recursion's AI-driven drug discovery initiatives, you will be at the forefront of reimagining how biological knowledge is generated, stored, accessed, and reasoned upon by LLMs. You will play a key role in developing the biological reasoning infrastructure, connecting large-scale data and codebases with dynamic, agent-driven AI systems. You will be responsible for defining the architecture that grounds our agents in biological truth.
This involves integrating biomedical resources to enable AI systems to reason effectively and selecting the most appropriate data retrieval strategies to support those insights. This is a highly collaborative role: you will partner with machine learning engineers, biologists, chemists, and platform teams to build the connective tissue that allows our AI agents to reason like a scientist. The ideal candidate possesses deep expertise in both core bioinformatics/cheminformatics libraries and modern GenAI frameworks (including RAG and MCP), a strong architectural vision, and the ability to translate high‑potential prototypes into scalable production workflows.
In this role, you will:
- Architect and maintain robust infrastructure to keep critical internal and external biological resources (e.g., ChEMBL, Ensembl, Reactome, proprietary assays) up‑to‑date and accessible to reasoning agents.
- Design sophisticated context retrieval strategies, choosing the most effective approach for each biological use case, whether working with structured, entity‑focused data, unstructured RAG, or graph‑based representations.
- Integrate established bioinformatics/cheminformatics libraries into a GenAI ecosystem, creating interfaces (such as via MCP) that allow agents to autonomously query and manipulate biological data.
- Pilot methods for tool use by LLMs, enabling the system to perform complex tasks like pathway analysis on the fly rather than relying solely on memorized weights.
- Develop scalable, production‑grade systems that serve as the backbone for Recursion’s automated scientific reasoning capabilities.
- Collaborate cross‑functionally with Recursion’s core biology, chemistry, data science, and engineering teams to ensure that our biological data and reasoning engines accurately reflect the complexity of disease biology and drug discovery.
- Present technical trade‑offs (e.g., graph vs. vector) to leadership and stakeholders in a clear, compelling way that aligns technical reality with product vision.
You’ll join a bold, agile team of scientists and engineers dedicated to building comprehensive biological maps by integrating Recursion’s in‑house datasets, patient data, and external knowledge layers to enable sophisticated agent‑based reasoning. Within this cross‑functional team, you will design and maintain the biological context and data structures that allow agents to reason accurately and efficiently. You’ll collaborate closely with wet‑lab biologists and core platform engineers to develop systems that are not only technically robust but also scientifically rigorous.
The ideal candidate is curious about emerging AI technologies, passionate about making biological data both machine‑readable and machine‑understandable, and brings a strong foundation in systems biology, biomedical data analysis, and agentic AI systems.
- PhD in a relevant field (Bioinformatics, Cheminformatics, Computational Biology, Computer Science, Systems Biology) with 5+ years of industry experience, or MS in a relevant field with 7+ years of experience, focusing on biological data representation and retrieval.
- Proficiency in utilizing major public biological databases (NCBI, Ensembl, STRING, GO) and using standard bioinformatics/cheminformatics toolkits (e.g., RDKit, samtools, Biopython).
- Strong skills in designing and maintaining automated data pipelines that support continuous ingestion,…