ML Operations Engineer - Associate Vice President
Listed on 2026-02-16
IT/Tech
Machine Learning / ML Engineer, AI Engineer, Data Engineer, Cloud Computing
We are seeking an experienced MLOps Engineer to join our DevOps and Infrastructure Engineering team. This role is crucial for operationalizing, scaling, and maintaining our Artificial Intelligence (AI) and Machine Learning (ML) applications. Working closely with data scientists and ML engineers, the successful candidate will ensure seamless, scalable, and reliable deployment and management of AI/ML models.
This position requires strong proficiency in Python, hands-on experience with Ray Tune for hyperparameter optimization, and MLflow for experiment tracking and model lifecycle management.
- ML Pipeline Development & Automation: Design, build, and maintain robust and scalable end-to-end ML pipelines for data ingestion, preprocessing, model training, validation, and deployment.
- CI/CD for ML: Implement and manage Continuous Integration/Continuous Delivery (CI/CD) pipelines specifically tailored for machine learning workflows, ensuring automated testing, versioning, and deployment of ML artifacts.
- Experiment Tracking & Model Management: Utilize MLflow extensively for experiment tracking, reproducible runs, managing model versions, and maintaining a centralized model registry.
- Hyperparameter Optimization: Leverage Ray Tune for efficient and distributed hyperparameter optimization to enhance model performance and accelerate experimentation.
- Containerization & Orchestration: Package ML models and their dependencies using Docker and deploy/manage them effectively on Kubernetes clusters.
- Data Platform Integration: Integrate with and optimize existing data platforms, including Apache Iceberg, Apache Spark, and Apache Flink, to ensure efficient data processing and feature engineering for ML models.
- Data Storage & Streaming: Work with PostgreSQL, Oracle, and MongoDB for diverse data storage needs, and use Kafka for real-time data streaming to support various ML applications.
- Monitoring & Observability: Implement comprehensive monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana) for ML models in production, tracking model performance, data drift, and infrastructure health to ensure reliability and facilitate automated retraining or rollback.
- Scripting & Automation: Develop automation scripts and tools using Python and Bash/Go to streamline MLOps processes and integrate various systems.
- Collaboration: Act as a vital link between data scientists, ML engineers, and infrastructure teams, facilitating clear communication and ensuring that ML solutions are production-ready.
- Experience: 3-5 years of hands-on experience in an MLOps, DevOps, or Machine Learning Engineering role, with a proven track record of deploying and managing ML models in production environments.
- Programming: Expert‑level proficiency in Python for ML development, scripting, and automation.
- MLOps Tooling: Demonstrated hands-on experience with Ray Tune for hyperparameter optimization and Airflow or MLflow for experiment tracking and model management.
- Containerization & Orchestration: Strong experience with Docker and Kubernetes (including Helm).
- CI/CD: Experience implementing CI/CD practices for software and/or ML pipelines.
- Data Technologies: Familiarity or experience with Apache Spark, Apache Iceberg, Apache Flink, and Kafka.
- Databases: Experience with PostgreSQL, Oracle, and MongoDB.
- Workflow Orchestration: Experience with Apache Airflow.
- Infrastructure as Code: Experience with HashiCorp Terraform.
- Operating Systems: Proficiency in Linux/Unix environments.
Skills:
- Experience with cloud platforms (AWS, Azure, GCP) and managing cloud‑native ML infrastructure.
- Knowledge of deep learning frameworks such as TensorFlow or PyTorch.
- Experience with generative AI technologies (e.g., LLMs, prompt engineering, RAG pipelines).
- Understanding of distributed computing and big data processing techniques.
Technology
Job Family: Applications Development
Time Type: Full time
Primary Location: Irving, Texas, United States
Primary Location Full Time Salary Range:$ - $
In addition to salary, Citi’s offerings may also include, for eligible employees, discretionary and formulaic…