AI Infrastructure Engineer/MLOps Engineer
Listed on 2025-12-21
IT/Tech
Systems Engineer, AI Engineer, Cloud Computing
Location: City of Edinburgh
Join Lenovo’s AI Technology Center (LATC) – a global AI Center of Excellence – to help shape AI at a truly global scale. We’re building the next wave of AI core technologies and platforms, and we need a highly skilled AI Infrastructure Engineer / AI Operations Engineer to design, build, and maintain the infrastructure and tools necessary for efficient AI model development, deployment, and operation.
Responsibilities:
- AI Infrastructure Design and Implementation: Design, build, and maintain scalable and efficient AI infrastructure, including compute resources, storage solutions, and networking configurations.
- AI Model Deployment and Management: Develop and implement processes for deploying, monitoring, and managing AI models in production environments.
- Automation and Tooling: Create and maintain automation scripts and tools for AI model training, testing, evaluation, and deployment in a continuous integration / continuous delivery (CI/CD) pipeline.
- Collaboration and Support: Work closely with data scientists, engineers, and other stakeholders to ensure the smooth operation of AI systems and provide support as needed.
- Performance Optimization: Continuously monitor and optimize AI infrastructure and models for performance, scalability, utilization, and reliability.
- Security and Compliance: Ensure AI infrastructure and models comply with relevant security and regulatory requirements.
Qualifications:
- Bachelor’s or Master’s degree in Computer Engineering, Electrical Engineering, Computer Science, or a related field.
- 8+ years of experience in software engineering, DevOps, or a related field.
- Strong background in computer systems, distributed systems, and cloud computing.
- Proficient in Linux system administration, including package management, user/group management, file system navigation, shell scripting (bash), and system configuration (systemd, networking).
- Proficiency in programming languages such as Python, Java, or C++.
- Experience with AI-specific infrastructure and tools (e.g., NVIDIA GPUs and CUDA).
- Experience setting up multi-node distributed GPU clusters using Slurm, Kubernetes, or related software stacks.
- Experience with managing high-performance computing (HPC) clusters, including job scheduling, resource allocation, and cluster maintenance.
- Familiarity with configuring job-scheduling tools (e.g., Slurm).
- Experience with AI infrastructure, model deployment, and management.
- Excellent problem‑solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work in a fast‑paced, dynamic environment.
Bonus Points:
- Familiarity with AI and machine learning frameworks (e.g., PyTorch).
- Familiarity with cloud platforms (AWS, GCP, Azure).
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Experience with monitoring and logging tools (Prometheus, Grafana).
What we offer:
- Opportunities for career advancement and personal development.
- Access to a diverse range of training programs.
- Performance‑based rewards that celebrate your achievements.
- Flexibility with a hybrid work model (3:2) that blends home and office life.
- Electric car salary sacrifice scheme.
- Life insurance.
Location: Edinburgh, Scotland – candidates must be based there, as the role requires working from the office at least three days per week (3:2 hybrid policy).
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Information Technology
Industry: IT Services and IT Consulting