Lead ML Ops Engineer, GFT
Job in Toronto, Ontario, M5A, Canada
Listing for: 0000050007 Royal Bank of Canada
Full Time position
Listed on 2026-02-17
Job specializations:
- IT/Tech
- AI Engineer, Machine Learning/ML Engineer, Data Scientist, Data Engineer
Job Description
What is the opportunity?
Join GFT as a Lead ML Ops Engineer to build and manage CI/CD for ML and agentic AI on OpenShift using GitHub Actions and Airflow. You will partner with machine learning engineers and data engineers to deliver business value using agile practices while maintaining data governance and security. You will design scalable pipelines, support data ingestion and feature engineering, and integrate models into APIs and services.
You will also participate in architecture reviews and approvals, set operational standards, and maintain documentation. You will own monitoring and incident triage and resolution, optimizing reliability, performance, and cost while enabling teams to safely ship GenAI solutions.
What will you do?
- Build and operate CI/CD for ML and GenAI on OpenShift using GitHub Actions and Airflow.
- Deploy, scale, and manage LLMs and agentic services on OCP and cloud (AWS/Azure).
- Architect and deliver solutions/PoCs for cutting-edge GenAI (RAG, tool use, agent orchestration).
- Own model lifecycle ops: versioning, registries, feature stores, vector DBs, and GPU environments.
- Build and maintain Model Context Protocol (MCP) integrations (servers/clients and tool adapters) to let LLM agents securely invoke internal APIs, data sources, and actions on OpenShift.
- Define release/rollback procedures; ensure reliable maintenance of deployed models.
- Collaborate with data scientists, ML engineers, and data engineers to integrate models/APIs into microservices and data pipelines.
- Drive platform roadmap, standards, and documentation; enable teams via tooling, templates, and best practices.
What do you need to succeed?
Must Have
- Hands-on MLOps/LLMOps with containers/Kubernetes/OpenShift; CI/CD via GitHub Actions; orchestration with Airflow (good-to-have).
- Production experience with LLMs and agentic AI: RAG, vector search, prompt management, and serving patterns.
- Hands-on experience with agentic AI and an understanding of MCP: designing agent tool-use flows and deploying MCP servers/tools.
- Expert in Python and SQL.
- Cloud experience (AWS or Azure); designing scalable and secure architectures along with the data architecture team and ML engineers.
- 3+ years of hands-on experience.
- Proven production experience running ML/GenAI with monitoring and performance tuning.
- Excellent communication and cross-team leadership; BS in CS/Engineering or a related field (MS/PhD preferred).
- Ability to articulate technical concepts to leadership or key stakeholders.
- Foster and drive key innovations within the extended team with emerging technology.
Nice to Have
- Experience working with Snowflake and SageMaker; model registries/feature stores (MLflow, Feast).
- Guardrails/safety, evaluation methods, and cost optimization for GPU/LLM workloads.
- Understanding of MCP (capability schemas, resources/tools, session management) and telemetry for tool calls (latency/success/cost).
- LLM frameworks and evaluation/guardrails.
What’s in it for you?
We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services
- Opportunities to do challenging work
- Opportunities to take on progressively greater accountabilities
- Opportunities to build close relationships with clients
- Access to a variety of job opportunities across business and geographies
Job Skills
Big Data Management, Data Mining, Data Science, Deep Learning, Machine Learning (ML), Predictive Analytics, Programming Languages
Additional Job Details
Address:
RBC CENTRE, 155 WELLINGTON ST W, TORONTO
City:
Toronto
Country:
Canada
Work hours/week:
37.5
Employment Type:
Full time
Platform:
TECHNOLOGY AND OPERATIONS
Job Type:
Regula…