Senior Lead Research Scientist, Agentic AI Job, Toronto area, Ontario, Canada, Software Development

Senior Lead Research Scientist, Agentic AI

Upwork Inc.’s (Nasdaq: UPWK) family of companies connects businesses with global, AI-enabled talent across every contingent work type including freelance, fractional, and payrolled. This portfolio includes the Upwork Marketplace, which connects businesses with on-demand access to highly skilled talent across the globe, and Lifted, which provides a purpose-built solution for enterprise organizations to source, contract, manage, and pay talent across the full spectrum of contingent work.

From Fortune 100 enterprises to entrepreneurs, businesses rely on Upwork Inc. to find and hire expert talent, leverage AI-powered work solutions, and drive business transformation. With access to professionals spanning more than 10,000 skills across AI & machine learning, software development, sales & marketing, customer support, finance & accounting, and more, the Upwork family of companies enables businesses of all sizes to scale, innovate, and transform their workforces for the age of AI and beyond.

Since its founding, Upwork Inc. has facilitated more than $30 billion in total transactions and services as it fulfills its purpose to create opportunity in every era of work. Learn more about the Upwork Marketplace at

We’re seeking a Senior Lead Research Scientist (Agentic AI) to push the frontier of autonomous, tool‑using AI and ensure that innovations make it into production. You’ll split your time between novel research (benchmarks, learning algorithms, publications, and thought leadership) and building the tools, datasets, and systems required to run rigorous experiments and ship results into our agentic platform. You will partner closely with ML engineers, product, platform, and safety teams to translate research into reliable, scalable capabilities for customers and developers on Upwork.

Responsibilities

50/50 split between research and engineering/productionization.
Advance agentic benchmarking. Define and maintain a rigorous evaluation suite for agents (task success, reliability, recovery, safety, latency, and cost). Establish protocols, datasets, and reproducible metrics aligned to best practices in agentic evaluation; continuously harden benchmarks against loopholes and overfitting.
Invent and publish. Lead novel studies on agent planning, tool use, reflection/memory, safety, and multi‑agent coordination. Publish at top venues (e.g., NeurIPS/ICML/ICLR/ACL) and present learnings internally and externally.
Explore RLEF for agents. Develop Reinforcement Learning from Execution Feedback (RLEF) approaches that ground agent behavior in environment/run‑time signals (e.g., execution traces, tool results, test outcomes), and compare them with RLHF/RLAIF on agent tasks.
Continuous/online learning. Design safe, measurable loops for continual improvement (data selection, drift detection, reward model updates, policy refresh), with guardrails that protect quality and cost.
Build research tooling. Stand up agents‑at‑scale experiment infrastructure: simulators, sandboxes, and orchestration for long‑horizon tasks; evaluation harnesses; offline/online A/B testing; and dashboards for longitudinal tracking.
Train & align models. Implement high‑quality pipelines for SFT, DPO, RLHF/RLAIF/RLEF; manage data provenance, safety filters, and automated red‑teaming; integrate eval signals into CI/CD.
Ship to production. Collaborate with platform teams to graduate prototypes into reliable services (APIs/SDKs, auth, observability, rate limiting) and to integrate agents with developer protocols (e.g., MCP) and runtime services.

What it takes to catch our eye

PhD or equivalent research track record with peer‑reviewed publications in relevant venues; strong empirical methodology and scientific writing/presentation skills.
Demonstrated contributions to agentic evaluation/benchmarks or long‑horizon reasoning (e.g., designing tasks, metrics, robust protocols).
Hands‑on experience adapting LLMs for tool use and multi‑step plans; fluent in prompting, function/tool calling, and memory/critique patterns.
Practical mastery of alignment methods (SFT, DPO, RLHF, RLAIF, and RLEF) and reward‑modeling; you know when to prefer each…

