AI Sr. Data Engineer Job, Edinburgh area, City of Edinburgh, Scotland, UK, IT/Tech

Location: City of Edinburgh

The Lenovo AI Technology Center (LATC), Lenovo’s global AI Center of Excellence, is driving our transformation into an AI‑first organization. We are assembling a world‑class team of researchers, engineers, and innovators to position Lenovo and its customers at the forefront of the generational shift toward AI. Lenovo is one of the world’s leading computing companies, delivering products across the entire technology spectrum, spanning wearables, smartphones (Motorola), laptops (ThinkPad, Yoga), PCs, workstations, servers, and services/solutions.

This unmatched breadth gives us a unique canvas for AI innovation, including the ability to rapidly deploy cutting‑edge foundation models and to enable flexible, hybrid‑cloud, and agentic computing across our full product portfolio. To this end, we are building the next wave of AI core technologies and platforms that leverage and evolve with the fast‑moving AI ecosystem, including novel model and agent orchestration and collaboration across mobile, edge, and cloud resources.

Responsibilities

Data Creation & Annotation:
Design, build, and implement processes for creating task‑specific training datasets. This may include data labeling, annotation, and data augmentation techniques.
Data Pipeline Development:
Leverage tools and technologies to accelerate dataset creation and improvement. This includes scripting, automation, and potentially working with data labeling platforms.
Data Quality & Evaluation:
Perform thorough data analysis to assess data quality, identify anomalies, and ensure data integrity. Utilize machine learning tools and techniques to evaluate dataset performance and identify areas for improvement.
Big Data Technologies:
Utilize database systems (SQL and NoSQL) and big data tools (e.g., Spark, Hadoop, cloud‑based data warehouses like Snowflake/Redshift/BigQuery) to process, transform, and store large datasets.
Data Governance & Lineage:
Implement and maintain data governance best practices, including data source tracking, data lineage documentation, and license management. Ensure compliance with data privacy regulations.
Collaboration with Model Developers:
Work closely with machine learning engineers and data scientists to understand their data requirements, provide clean and well‑documented datasets, and iterate on data solutions based on model performance feedback.
Documentation:
Create and maintain clear and concise documentation for data pipelines, data quality checks, and data governance procedures.
Stay Current:
Keep up‑to‑date with the latest advancements in data engineering, machine learning, and data governance.

Qualifications

Education:
Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, Statistics, Mathematics, or a related field.
Experience:
15+ years of experience in a data engineering or data science role.
Programming Skills:
Mastery of Python and SQL. Experience with other languages (e.g., Java, Scala) is a plus.
Database Skills:
Strong experience with relational databases (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra).
Big Data Tools:
Experience with big data technologies such as Spark, Hadoop, or cloud‑based data warehousing solutions (Snowflake, Redshift, BigQuery).
Data Manipulation:
Proficiency in data manipulation and cleaning techniques using tools like Pandas, NumPy, and other data processing libraries.
ML Fundamentals:
Solid understanding of machine learning concepts and techniques, including data preprocessing, feature engineering, and model evaluation.
Data Governance:
Understanding of data governance principles and practices, including data lineage, data quality, and data security.
Communication Skills:
Excellent written and verbal communication skills, with the ability to explain complex technical concepts to both technical and non‑technical audiences.
Problem Solving:
Strong analytical and problem‑solving skills.

Bonus Points

Experience with data labeling platforms (e.g., Labelbox, Scale AI, Amazon SageMaker Ground Truth).
Experience with MLOps practices and tools (e.g., Kubeflow, MLflow).
Experience with cloud platforms (e.g., AWS, Azure, GCP).
Experience with data…

