Job Description & How to Apply Below
About the job
Attention please!
• Only candidates with a short notice period (less than 30 days) will be considered. Please take note of this to save your effort. Thank you for understanding.
Responsibilities:
Automate deployment and management processes for machine learning platforms using tools such as Ansible and Python.
Deploy, monitor, and patch ML platform components, including Cloudera Data Science Workbench (CDSW), Docker containers, and Kubernetes clusters.
Ensure high availability and reliability of ML infrastructure through proactive maintenance and regular updates.
Develop and maintain comprehensive documentation for platform configurations, processes, and procedures.
Troubleshoot and resolve platform issues, ensuring minimal downtime and optimal performance.
Implement best practices for security, scalability, and automation within the ML platform ecosystem.
Mandatory Skills
Description:
We are seeking a skilled ML Platform Engineer for automating, deploying, patching, and maintaining our machine learning platform infrastructure.
You need hands-on experience with Cloudera Data Science Workbench (CDSW), Cloudera Data Platform (CDP), Docker, Kubernetes, Python, Ansible, GitLab, and MLOps best practices.
- Hands-on experience with CDSW (Cloudera Data Science Workbench) or a similar data science or ML/AI platform.
- Proficiency in containerization and orchestration using Docker and Kubernetes (AKS preferred).
- Solid scripting and automation skills in Python and Ansible.
- Experience with GitLab for source control and CI/CD automation.
- Understanding of MLOps principles and best practices (deployment, monitoring, lifecycle management of ML workloads).
- Familiarity with patching, updating, and maintaining platform infrastructure.
- In-depth Unix knowledge.
- Excellent problem-solving skills and a collaborative approach to team projects.
Nice-to-Have Skills
Description:
- Previous banking domain experience.
- Familiarity with Cloudera CDP ecosystem (beyond CDSW).
- Knowledge of monitoring & observability tools (Prometheus, Grafana, ELK).
- Exposure to Airflow, MLflow, or Kubeflow for workflow and ML lifecycle orchestration.
- Cloud platform experience with Azure (AKS, networking, storage, monitoring).