Job Details
- Job Title: Data Engineer (PySpark) | Innovations Global | Dubai, UAE
- Recruiting Company: Innovations Global
- Job Location: Dubai, United Arab Emirates (Onsite)
- Job Type: Full-Time
- Application Method: sp.ha
- Notice Period: Immediate to 30 Days
- Experience Required: 5+ Years
- Industry Requirement: Banking Domain (Mandatory)
The Data Engineer will design, build, and optimize large-scale data pipelines using PySpark and Cloudera Data Platform within a fast-paced banking environment. This role is key to enabling reliable data processing, analytics, and ETL workflows that power mission‑critical financial applications.
Job Description
As a Data Engineer, you will work with massive datasets on Cloudera Data Platform, leveraging advanced PySpark techniques to deliver efficient and scalable data transformations. You will collaborate with data architects, business analysts, and engineering teams to build robust ETL pipelines, optimize cluster and workload performance, and support key banking initiatives. This role requires strong experience with CDP components, big data frameworks, and orchestration tools, alongside excellent Linux scripting skills and a deep understanding of financial data operations.
Key Responsibilities
- Develop, optimize, and maintain PySpark-based data pipelines for large-scale processing.
- Work with Cloudera Data Platform components including Cloudera Manager, Hive, Impala, HDFS, and HBase.
- Design and implement scalable ETL workflows aligned with banking data requirements.
- Perform advanced data transformations using PySpark (RDDs, DataFrames, Spark SQL).
- Support data warehousing initiatives and develop SQL-based analytics queries.
- Integrate and work with big data tools such as Hadoop, Kafka, and distributed systems.
- Use orchestration frameworks like Oozie or Airflow for workflow scheduling.
- Develop automation scripts on Linux for deployments and workload management.
- Ensure performance tuning, data quality, and compliance with financial industry standards.
Requirements
- Bachelor’s or Master’s in Computer Science, Data Engineering, IT, or related field.
- 5+ years of experience as a Data Engineer with strong banking domain exposure.
- Advanced proficiency in PySpark (RDDs, DataFrames, performance optimization).
- Hands‑on experience with Cloudera Data Platform (CDP).
- Strong SQL skills and experience with Hive/Impala-based data warehousing.
- Solid understanding of Hadoop ecosystem tools and Kafka.
- Experience with Oozie, Airflow, or similar orchestration frameworks.
- Strong Linux scripting (Bash/Python) for automation.
- Exposure to CI/CD concepts or DevOps practices.
- Understanding of data governance in financial institutions.
- Knowledge of Spark performance profiling or tuning at cluster level.
- Experience working with cloud-based big data platforms.
Showcase end-to-end PySpark pipeline projects—especially those deployed on Cloudera within banking environments—as employers prioritize candidates who demonstrate real performance tuning, ETL optimization, and large-scale financial data processing expertise.