Senior Applied Scientist, NLP/GenAI
Listed on 2026-02-12
IT/Tech
Data Scientist, AI Engineer, Machine Learning/ ML Engineer
Senior Applied Scientist, NLP/GenAI – Thomson Reuters
Document understanding is a foundational intelligence layer that powers every major capability across our legal AI platform, from search and information extraction to agentic reasoning, in products such as Westlaw, Practical Law, and CoCounsel. You will build state‑of‑the‑art semantic chunking, document‑enrichment, and knowledge‑graph construction systems that serve as the foundation for multiple product teams working with authoritative legal, tax, and accounting content as well as diverse customer data.
This is a rare opportunity to solve publication‑quality research problems with immediate production impact: your innovations will directly shape how millions of legal professionals research, analyze, and reason over complex legal documents, while advancing the capabilities that enable the next generation of intelligent legal AI agents.
About the Role
- Innovate & Deliver: Design, build, test, and deploy end‑to‑end AI solutions for complex document‑understanding tasks in the legal domain. Develop advanced models for semantic chunking of lengthy, non‑uniformly structured legal documents with adjustable granularity. Build document‑enrichment systems that classify documents according to legal and customer‑defined taxonomies and extract rich metadata. Create LLM‑based knowledge‑graph construction pipelines that extract and link heterogeneous legal knowledge, including citations, entities, and legal concepts, across diverse content. Develop scalable synthetic‑data generation systems that support model training and the generation of hallucination‑free answers. Work with engineering to ensure well‑managed software delivery and reliability at scale.
- Evaluate & Optimize: Develop comprehensive data and evaluation strategies for component‑level and end‑to‑end quality, leveraging expert human annotation and synthetic data. Apply robust training and evaluation methodologies that balance model performance with latency for solutions based on small language models (SLMs). Use knowledge‑distillation techniques to compress large models for production deployment.
- Drive Technical Decisions: Independently determine appropriate architectures for challenging document‑understanding problems, including semantic chunking strategies that preserve document structure, classification approaches that generalize across taxonomies, LLM‑based extraction methods that handle citation and contextual errors, and multi‑document reasoning architectures for synthetic multi‑hop queries. Balance accuracy, efficiency, and scalability while addressing real‑world challenges.
- Align & Communicate: Partner closely with Engineering and Product teams to translate complex legal challenges into scalable, production‑ready solutions. Engage stakeholders across multiple product lines to understand use‑case requirements and shape objectives that align with business needs.
- Advance the Field: Maintain scientific and technical expertise, demonstrated through product deliverables, published research at top venues (e.g., ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD), and intellectual property.
About You
- Ph.D. in Computer Science, AI, NLP, or related field, or a Master’s with equivalent research/industry experience.
- 5+ years of hands‑on experience building and deploying document‑understanding systems, information‑extraction pipelines, or knowledge‑graph construction using deep learning, LLMs, and NLP methods.
- Proven ability to translate complex document‑understanding problems into innovative AI applications that balance accuracy and efficiency.
- Professional experience leading through others in an applied research setting.
- Strong programming skills (e.g., Python) and experience with modern deep‑learning frameworks (e.g., PyTorch, Hugging Face Transformers, DeepSpeed).
- Publications at relevant venues such as ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD.
Technical Qualifications
- Deep understanding of document‑understanding fundamentals: layout analysis, semantic chunking beyond fixed‑size or paragraph‑based methods, classification over hierarchical taxonomies, imbalanced multi‑label classification, and domain‑specific schema adaptation.
- Expertise in knowledge extraction and graph…