About the Opportunity
A leading financial institution is seeking a highly experienced Lead AI Engineer to join its advanced technology division. This is a high-impact, leadership-track role at the intersection of AI engineering, Site Reliability, and enterprise-grade software architecture. The successful candidate will design, build, and operationalize the next generation of agentic AI systems within a regulated banking environment — driving intelligent automation while maintaining the rigorous security, compliance, and availability standards demanded by the financial services industry.
You will architect multi-agent LLM systems, implement Model Context Protocol (MCP) servers, build production-grade RAG pipelines, and lead AI observability practices using the ELK stack. This role requires deep technical expertise combined with the leadership acumen to mentor engineers and influence cross-functional technical decisions.
Pillar 1: AI Architecture & Agentic Systems
- Design and implement sophisticated LLM-powered agentic workflows and multi-agent architectures capable of autonomous reasoning, planning, and tool execution within secure financial system boundaries.
- Architect and deploy scalable Model Context Protocol (MCP) servers to enable standardized, secure, and rich context management between AI models, internal banking APIs, and external data sources.
- Develop production-grade Retrieval-Augmented Generation (RAG) and GraphRAG pipelines that ground AI agents in accurate, real-time enterprise financial data with full auditability.
- Leverage expertise in Meta AI (Llama ecosystem), Google AI (Gemini, Vertex AI), and Microsoft Copilot to build and integrate cutting-edge AI features while adhering to financial data handling policies.
- Implement prompt versioning, model drift detection, and automated evaluation pipelines to maintain AI system quality and regulatory compliance over time.
- Lead end-to-end development of robust, scalable AI applications using Node.js (TypeScript) and Python (FastAPI/Django); both languages are required.
- Champion AI-assisted developer workflows ("Vibe Coding") using advanced tools such as Cursor and GitHub Copilot to improve team productivity and code quality.
- Design and implement secure, high-performance RESTful and GraphQL APIs to serve LLM inferences and agentic actions to frontend and downstream systems.
- Develop and maintain Bash and Python automation scripts for infrastructure management, deployment orchestration, and operational efficiency.
- Mentor junior and mid-level engineers in AI-native development practices and modern architectural patterns.
- Implement comprehensive observability stacks using the ELK Stack (Elasticsearch, Logstash, Kibana) specifically tuned for LLM performance metrics: latency, token usage, hallucination rates, and model drift indicators.
- Apply SRE best practices to AI workloads — ensuring high availability, fault tolerance, incident response playbooks, and SLO/SLA management for LLM inference services.
- Build and maintain CI/CD pipelines tailored for machine learning models, including prompt versioning, model evaluation gates, shadow deployments, and automated rollback.
- Design alerting, on-call runbooks, and escalation paths for AI system incidents within a regulated financial services environment.
- AI & Machine Learning - Deep understanding of LLM architectures, prompt engineering, fine-tuning techniques (LoRA/QLoRA), and embedding models. Proven experience building and operating production-grade LLM applications.
- Agentic Frameworks - Hands-on experience designing autonomous agents and implementing Model Context Protocol (MCP) servers for standardized tool and context management.
- RAG & Vector Databases - Strong experience building RAG and GraphRAG pipelines. Proficiency with vector databases (Pinecone, Milvus, or Weaviate) and embedding model selection strategies.
- Observability & SRE - Extensive hands-on experience with the ELK Stack (Elasticsearch, Logstash, Kibana) for distributed system logging, monitoring, and AI-specific metrics tracking.
- Cloud & Infrastructure - Proven experience with cloud-native architectures. Azure and AKS (Azure Kubernetes Service) experience strongly preferred for this engagement.
- Enterprise AI Tools - Demonstrated expertise with Microsoft Copilot (Copilot Studio extensibility, custom connectors), Meta AI open-source models, and Google AI infrastructure (Gemini/Vertex AI).
- Leadership - 8+ years of progressive software engineering experience. Minimum 3 years in a technical leadership or architectural role with a focus on AI/ML systems.
Given the regulated nature of this environment, candidates must demonstrate awareness of and experience with the following:
- Working knowledge of SOC 2 Type II compliance principles and their impact on AI system design and data handling.
- Understanding of financial data…