Job Description – Data Quality Engineer (Databricks) – x 4 Positions
Location: Abu Dhabi, UAE – Onsite (Open to Relocate)
Duration: 6 months (Extendable to One Year)
Experience: 5 to 7 Years
Project start date: 1st July – Immediate joiners will be preferred
Role Overview
The Data Quality Engineer will be responsible for designing, implementing, and operating ADC's enterprise data quality framework within the Databricks platform. The role will deliver automated profiling, quality rule execution, cleansing, monitoring, remediation support, and quality reporting capabilities across 170 datasets and 1,346 prioritised Critical Data Elements (CDEs). Working closely with Data Modellers, Data Catalogue Specialists, business data owners, and platform engineers, the Data Quality Engineer will establish scalable and reusable quality controls that improve trust, accuracy, completeness, consistency, timeliness, validity, and uniqueness across ADC's data estate.
Key Responsibilities
Databricks Platform Configuration and Administration
- Configure and manage the Databricks environment supporting enterprise data quality operations.
- Establish and maintain compute clusters, PySpark notebook frameworks, Delta Lake storage structures, and Unity Catalog integration.
- Optimise platform performance for large‑scale profiling and rule execution across all in-scope datasets and CDEs.
- Implement development, testing, and production deployment standards for data quality assets.
- Design and develop AI‑assisted profiling notebooks using PySpark.
- Perform baseline data quality assessments across the six quality dimensions: completeness, accuracy, consistency, validity, timeliness, and uniqueness.
- Capture and analyse null value rates, duplicate records, invalid values, format violations, outliers, and schema drift.
- Produce quality profiling outputs for all prioritised CDEs and datasets.
- Design and implement a reusable Data Quality Rule Factory.
- Build parameterised PySpark‑based rule templates capable of supporting large‑scale rule deployment.
- Enable automated generation and management of approximately 6,730 data quality rules without manual rule‑by‑rule development.
- Ensure rules are reusable, configurable, and maintainable across multiple datasets and domains.
- Deploy quality rules as reusable Databricks Jobs integrated into Delta Lake processing pipelines.
- Embed quality controls within Bronze, Silver, and Gold processing stages.
- Implement automated quality gates preventing data progression where defined thresholds are not met.
- Maintain rule traceability and execution history for audit and governance purposes.
- Develop automated remediation and cleansing pipelines using PySpark.
- Implement standardisation routines, data enrichment processes, deduplication logic, and schema harmonisation controls.
- Deploy machine learning models managed through MLflow for anomaly detection, exact duplicate detection, and fuzzy matching/duplicate identification.
- Ensure all AI and ML recommendations are explainable, auditable, and routed through human‑in‑the‑loop validation processes where required.
- Design and manage exception handling processes for failed quality records.
- Implement quarantine Delta Lake tables serving as the Failed Record Register.
- Capture and maintain failure reason, associated CDE, rule reference, processing timestamp, and resolution status.
- Develop reprocessing workflows to support correction and controlled re‑ingestion of remediated records.
- Develop Delta Lake metric aggregation structures supporting enterprise quality reporting.
- Calculate and publish Data Quality Index (DQI) scores, dimension‑level quality metrics, rule pass/fail rates, dataset compliance scores, and SLA adherence indicators.
- Provide curated outputs to support Power BI quality dashboards and executive reporting.
- Configure automated quality monitoring and alerting mechanisms.
- Implement threshold‑based notifications using…