Job Details
- Job Title: Data Engineer (PySpark) | Innovations Global | Dubai, UAE
- Recruiting Company: Innovations Global
- Job Location: Dubai, United Arab Emirates (Onsite)
- Job Type: Full-Time
- Application Method: sp.ha
- Notice Period: Immediate to 30 Days
- Experience Required: 5+ Years
- Industry Requirement: Banking Domain (Mandatory)
The Data Engineer will design, build, and optimize large-scale data pipelines using PySpark and Cloudera Data Platform within a fast-paced banking environment. This role is key to enabling reliable data processing, analytics, and ETL workflows that power mission‑critical financial applications.
Job Description
As a Data Engineer, you will work with massive datasets on Cloudera Data Platform, leveraging advanced PySpark techniques to deliver efficient and scalable data transformations. You will collaborate with data architects, business analysts, and engineering teams to build robust ETL pipelines, optimize cluster and workload performance, and support key banking initiatives. This role requires strong experience with CDP components, big data frameworks, and orchestration tools, alongside excellent Linux scripting skills and a deep understanding of financial data operations.
Key Responsibilities
- Develop, optimize, and maintain PySpark-based data pipelines for large-scale processing.
- Work with Cloudera Data Platform components including Cloudera Manager, Hive, Impala, HDFS, and HBase.
- Design and implement scalable ETL workflows aligned with banking data requirements.
- Perform advanced data transformations using PySpark (RDDs, DataFrames, Spark SQL).
- Support data warehousing initiatives and develop SQL-based analytics queries.
- Integrate and work with big data tools such as Hadoop, Kafka, and distributed systems.
- Use orchestration frameworks like Oozie or Airflow for workflow scheduling.
- Develop automation scripts on Linux for deployments and workload management.
- Ensure performance tuning, data quality, and compliance with financial industry standards.
Requirements
- Bachelor’s or Master’s in Computer Science, Data Engineering, IT, or related field.
- 5+ years of experience as a Data Engineer with strong banking domain exposure.
- Advanced proficiency in PySpark (RDDs, DataFrames, performance optimization).
- Hands‑on experience with Cloudera Data Platform (CDP).
- Strong SQL skills and experience with Hive/Impala-based data warehousing.
- Solid understanding of Hadoop ecosystem tools and Kafka.
- Experience with Oozie, Airflow, or similar orchestration frameworks.
- Strong Linux scripting (Bash/Python) for automation.
- Exposure to CI/CD concepts or DevOps practices.
- Understanding of data governance in financial institutions.
- Knowledge of Spark performance profiling or tuning at cluster level.
- Experience working with cloud-based big data platforms.
Showcase end-to-end PySpark pipeline projects—especially those deployed on Cloudera within banking environments—as employers prioritize candidates who demonstrate real performance tuning, ETL optimization, and large-scale financial data processing expertise.