Senior AI Development Platform Engineer – VP
Listed on 2026-01-02
Software Development
AI Engineer, Cloud Engineer - Software
Job Description
Morgan Stanley has been at the forefront of the AI journey with applications such as Genome that use AI to provide better, personalized advice to our clients. Generative AI provides further unique opportunities to provide new capabilities to the firm’s internal users as well as our clients. This role is for a senior platform engineer who will help build a firmwide AI Development Platform and drive adoption of AI capabilities throughout the enterprise.
The ideal candidate will have strong hands‑on experience building software platforms on Kubernetes, along with API‑based development, REST frameworks, data engineering, and large‑scale API Gateway environments. Knowledge of AI/ML and hands‑on experience implementing solutions using Generative AI are also preferred. The candidate will have a strong passion for using AI to increase productivity and to help generate new ideas for product and technical improvements.
Responsibilities
- Develop tooling and self‑service capabilities for deploying AI solutions for the firm. Collaborate with other developers to enhance the developer experience when building and deploying AI applications.
- Have a platform mindset and build common, reusable solutions to scale Generative AI use cases using pre‑trained models as well as fine‑tuned models.
- Collaborate with product managers, other tech leads, junior staff and other stakeholders to analyze requirements and translate them into technical specifications and architecture documentation.
- Design scalable, robust, secure, and flexible architecture of components of the AI development platform.
- Leverage Kubernetes/OpenShift to develop modern containerized workloads.
- Leverage container registries like JFrog Artifactory, container packaging/configuration management technologies like Helm & Kustomize, and GitOps deployment methods to orchestrate, manage and deploy these workloads.
- Integrate with capabilities such as large‑scale vector stores for embeddings.
- Author best practices on the Generative AI ecosystem, including when to use which tools, available models such as GPT and Llama, model hubs such as Hugging Face, and libraries such as LangChain.
- Analyze, investigate, and implement GenAI solutions focusing on Agentic Orchestration and Agent Builder frameworks.
- Contribute to major design decisions and product selection for building Generative AI solutions, including app authentication, service communication, state externalization, container layering strategy and immutability.
- Ensure AI platforms are reliable, scalable, and operational (e.g. blueprints for upgrade/release strategies such as Blue/Green; logging/monitoring/metrics; automation of system management tasks).
- Participate in all of the team’s Agile/Scrum ceremonies.
- Strong hands‑on application development background in at least one prominent programming language, preferably Python with Flask or FastAPI.
- Broad understanding of data engineering (SQL, NoSQL, Big Data, Kafka, Redis), data governance, data privacy and security.
- Experience in development, management, and deployment of Kubernetes workloads, preferably on OpenShift.
- Experience with designing, developing, and managing RESTful services for large‑scale enterprise solutions.
- Hands‑on experience with multiprocessing, multithreading, asynchronous I/O, and performance profiling in at least one prominent programming language, preferably Python.
- Practitioner of unit testing, performance testing and BDD/acceptance testing.
- Understanding of OAuth 2.0 protocol for secure authorization.
- Proficiency with OpenTelemetry and observability tools including Grafana, Loki, Prometheus, and Cortex.
- Demonstrated experience in DevOps, with an understanding of CI/CD (Jenkins) and GitOps.
- Ability to articulate technical concepts effectively to diverse audiences.
- Strong desire and ability to influence development teams and help them adopt AI.
- Demonstrated ability to work effectively and collaboratively in a global organization, across time zones, and across organizations.
- Understanding of deep learning and of machine learning frameworks such as TensorFlow or PyTorch.
- Understanding of information security and secure coding practices.
- Experience in building cloud and container native…