Data Engineer - Databricks, PySpark and SQL
Listed on 2026-06-02
IT/Tech
Data Engineer
Job Summary
This role builds and maintains scalable data pipelines and lakehouse infrastructure on Microsoft Azure to support efficient extraction, transformation, and loading of data across batch and real‑time workloads. It involves implementing and managing the Medallion Architecture (Bronze, Silver, Gold) using Azure Data Factory, Databricks with PySpark, Azure SQL Database, and Databricks Unity Catalog. The role is also responsible for meeting SLA‑backed data quality standards. Success is measured by pipeline reliability, data freshness SLA compliance, and the quality of Gold‑layer datasets powering Power BI executive dashboards.
The work supports organizational decision‑making by delivering trusted, well‑governed data to business executives and analytics consumers.
- Build and optimize big data pipelines using Azure Data Factory, PySpark, and SQL across structured and semi‑structured data sets.
- Implement and maintain the Medallion Architecture (Bronze/Silver/Gold) and Delta Lake (ACID transactions, incremental loading, schema evolution, partitioning).
- Perform root cause analysis on pipeline failures and data quality issues to resolve SLA breaches and identify platform improvement opportunities.
- Adapt pipelines for batch and real‑time workloads, including streaming with Azure Event Hubs or Structured Streaming in PySpark.
- Develop robust ADF pipelines with activities such as ForEach, Lookup, Copy, and Data Flows; configure incremental loading via watermark or CDC, error handling, retry logic, and dead‑letter patterns.
- Implement SLA‑based data quality checks (freshness, completeness, row count), monitor via Azure Monitor and ADF diagnostic logs, and define data quality agreements with business stakeholders.
- Design and maintain data architecture: Medallion Architecture, dimensional modeling (Star Schema, SCD Types 1/2/3), and trade‑offs between lakehouse, data warehouse, and data lake patterns.
- Work with Git‑based workflows and ADF Git integration, with CI/CD promotion across Dev/Test/Prod using Azure DevOps or GitHub Actions.
- Collaborate with BI teams to ensure Gold‑layer data feeds Power BI dashboards (DirectQuery vs. Import mode, refresh patterns, semantic model collaboration).
- Manage multiple concurrent pipeline projects, prioritize by business impact, and communicate status to technical and non‑technical stakeholders.
- 7 years of experience with Databricks and PySpark.
- 7 years of experience with SQL, Spark, and Big Data.
- 5 years of experience in the telecom domain.
- Strong proficiency in Python and PySpark for data transformation and large‑scale distributed processing.
- Expertise in SQL, including window functions, CTEs, and query optimization across relational and lakehouse engines.
- Hands‑on experience with Azure Data Factory, Azure Blob Storage, ADLS Gen2, Azure SQL Database, Azure Key Vault, Azure Monitor, Event Hubs, Microsoft Fabric Lakehouse, and Azure Active Directory/Microsoft Entra ID.
- Experience with Delta Lake, ACID transactions, incremental loading, schema evolution, and partitioning strategies.
- Experience with Git‑based workflows, CI/CD for data pipelines, and DevOps practices within Azure or GitHub environments.
- Solid understanding of Medallion Architecture, dimensional modeling, and trade‑offs among lakehouse, data warehouse, and data lake patterns.
- Experience with Power BI and awareness of the analytics and reporting layer.
- Experience with Microsoft Fabric (Lakehouse, Notebooks, OneLake, Fabric Pipelines).
- Experience with real‑time / streaming workloads using Azure Event Hubs or Structured Streaming in PySpark.
- Experience delivering data platforms for executive‑level reporting via Power BI semantic models.
Onsite requirement: 3 days per week.