Senior Data Engineer - Data Ingestion and Enrichment team

Location: Greater London

Meet the team!

At Preply, the Data Ingestion and Enrichment team provides a single, trusted, and scalable data foundation. The team ensures that all analytics, machine learning, and product features are built on unified, governed, production‑grade data assets in Preply’s Lake House. This includes extracting, normalizing, and generating structured data from Preply’s unstructured assets, which forms a durable data moat for AI‑driven products.

As a Senior Data Engineer in the Data Ingestion and Enrichment team, you will design and own the data layer that powers Preply’s analytics, machine learning, and product. You will work closely with ML Platform, Applied/Data Scientists, Analytics Engineering, and Product squads to ensure that features, datasets, and pipelines are production‑ready, observable, and reusable across the company. This role combines hands‑on engineering with technical leadership.

What you’ll be doing:

Build trusted ingestion & enrichment foundations (Data Lake and Data as a Product):

Design, build, and own Preply’s data lake. Ensure every dataset has clear ownership, purpose, schemas, and quality expectations from first ingestion through downstream consumption by analytics, product, and ML teams. Treat trust, correctness, and predictability as first‑class features of the platform.

Own end‑to‑end ingestion pipelines (batch & streaming):

Develop and operate scalable, reliable batch and streaming ingestion pipelines that support both real‑time and analytical use cases. Design clear raw → standardized → consumption layers with explicit responsibilities, lineage, and retention strategies. Balance performance, cost, and reliability as the platform scales.

Data quality, contracts & early validation:

Define and implement data contracts between producers and consumers, covering schema, freshness, volume, and quality guarantees. Embed validation, anomaly detection, and quality checks early in the ingestion lifecycle to catch issues before they propagate. Standardize how quality metrics are measured, monitored, and surfaced across the platform.

Enrichment, modeling & lifecycle management:

Build enrichment logic that joins, standardizes, and contextualizes data across domains using shared definitions and reusable patterns. Support historical tracking, point‑in‑time correctness, and dataset versioning so downstream users can confidently analyze changes and impacts over time.

Observability, reliability & operational excellence:

Instrument ingestion pipelines with strong observability: freshness, latency, data quality, and cost metrics. Contribute to SLOs, alerting, and incident response playbooks so data failures are visible, diagnosable, and recoverable. Help move the platform from reactive firefighting to proactive reliability management.

Governance & compliance by design:

Apply consistent access control, classification, and privacy protections at ingestion time. Ensure sensitive data is properly masked, minimized, or anonymized by default, and that all data flows are auditable and traceable. Make governance invisible to users but deeply embedded in platform workflows.

Enable self‑service & standardization:

Contribute to standardized ingestion templates, shared libraries, and platform tooling that enable teams to onboard new data sources independently within clear guardrails. Improve discoverability, documentation, and metadata so datasets are easy to find, understand, and trust without relying on tribal knowledge.

Cross‑team collaboration & ownership:

Work closely with Product, Backend, Analytics, and ML partners to align on ingestion requirements, trade‑offs, and priorities. Promote shared ownership of data quality and platform standards, and help foster a culture where teams move fast together under common data contracts and principles.

What you need to succeed:

Exposure to, and experience building, architectural patterns for large, high‑scale applications (e.g., well‑designed APIs, high‑volume data pipelines, efficient algorithms).
Solid experience working in platform or data engineering teams (or equivalent impact) with evidence of leading multi‑stakeholder deliveries.

Familiarity with cloud…