Associate Principal - Data Engineering
Job in Cincinnati, Hamilton County, Ohio, 45208, USA
Listed on 2026-05-22
Listing for: LTM
Full Time position
Job specializations:
- Software Development
- Data Engineer
Job Description
Senior Developer – PySpark / Python Data Engineering
Primary Skills: PySpark/Python Developer
Location: India, Global Delivery Center Regional Hub
Industry: Multi National FMCG
Cloud Strategy: Hyperscaler-first (Azure, GCP, AWS) with Databricks and Delta Lake
Key Responsibilities
- PySpark Development (Primary Focus)
- Design and develop production‑grade PySpark applications for large‑scale batch and streaming data processing.
- Implement advanced PySpark DataFrame API operations:
- Complex transformations – Window functions, Pivot/Unpivot and nested struct handling.
- Multi‑dataset joins – Broadcast joins, Sort‑Merge joins and skew‑handling strategies.
- Custom UDFs – User‑Defined Functions, Pandas UDFs, Vectorized UDFs for performance‑critical transformations.
- Aggregations and Group By operations optimized for large FMCG datasets.
- Implement PySpark Structured Streaming for real‑time data processing:
- Streaming sources – Kafka, Azure Event Hubs, GCP Pub/Sub.
- Watermarking and windowing strategies for late‑arriving data.
- Stateful streaming operations using mapGroupsWithState.
- Exactly‑once and at‑least‑once delivery semantics.
- Apply advanced Spark performance tuning techniques:
- Partition optimization – repartition vs coalesce strategies.
- Handling data skew using salting and custom partitioners.
- Broadcast variable management and accumulator usage.
- Catalyst optimizer hints and Adaptive Query Execution (AQE) tuning.
- Executor sizing, memory fractions and parallelism configuration.
- Develop and maintain reusable PySpark libraries for shared data processing capabilities.
- Python Engineering (Primary Focus)
- Build Python‑based data services, automation scripts and utility frameworks supporting the data platform.
- Develop REST API integrations using Python requests/httpx for consuming SAP OData, Salesforce and third‑party FMCG APIs.
- Implement data validation and reconciliation frameworks using Python (Great Expectations, Pandera).
- Build Python‑based orchestration scripts and helper utilities for Airflow DAGs and Databricks Workflows.
- Apply software engineering best practices:
- Unit testing with pytest and integration testing with Testcontainers.
- Type hints, docstrings and modular design patterns.
- Virtual environments, dependency management (Poetry/pip) and packaging.
- Implement Python‑based data quality checks for completeness, consistency and conformity validations.
- Data Lakehouse & Cloud Platform (Primary Focus)
- Build and manage Data Lakehouse architectures on hyperscaler platforms (Azure Databricks, GCP Dataproc, AWS EMR).
- Utilize Delta Lake, Apache Iceberg, Apache Hudi for ACID‑compliant data lake storage.
- Implement Medallion Architecture – Bronze/Silver/Gold for progressive data refinement.
- Use ACID transactions, schema enforcement, time travel, OPTIMIZE and Z‑Order, and Change Data Feed (CDF) for incremental data propagation.
- Manage Databricks Workflows and Job Clusters for production pipeline execution.
- Implement Databricks Auto Loader for incremental, scalable data ingestion.
- Utilize Unity Catalog for data governance, lineage and access control.
- Data Ingestion & Integration
- Build data ingestion pipelines from diverse FMCG data sources.
- Sources include:
- SAP S/4HANA – OData APIs, BAPI extracts, IDoc feeds.
- Salesforce – REST API, Bulk API, Platform Events.
- Operational databases – Oracle Cloud, SQL Azure, Cloud Spanner.
- Streaming sources – Apache Kafka, Azure Event Hubs, GCP Pub/Sub.
- File‑based sources – SFTP, Azure Blob, GCS, S3, CSV, Parquet, Avro, JSON.
- Implement Change Data Capture (CDC) patterns for real‑time database synchronization.
- Design schema evolution strategies to handle upstream data model changes gracefully.
- Publish processed data to downstream consumers – BigQuery, Azure Synapse, Snowflake, Power BI, Looker, Feature Stores (Feast, Databricks).
- SQL & Data Modeling
- Write and optimize complex SQL queries for data extraction, transformation and validation.
- Design data warehouse schemas – Star and Snowflake models for FMCG analytics domains.
- Implement Spark SQL for large‑scale analytical query processing.
- Develop data quality SQL checks and reconciliation frameworks.
- Optimize SQL performance – Query plans, partition pruning, predicate pushdown.
- Comprehensive…
Position Requirements
10+ years of work experience