Data Engineer Job, Phoenix area, Arizona, USA, IT/Tech

Seeking a Data Engineer with strong PySpark experience to work on large-scale data processing and analytics initiatives. The ideal candidate will have hands‑on experience working with large datasets, complex joins, and performance optimization, along with the ability to apply basic analytical thinking and deliver clear, stakeholder‑ready outputs.

Key Responsibilities

Design, develop, and maintain scalable data pipelines using PySpark.
Write efficient and optimized PySpark code to process and transform large‑scale datasets.
Handle joins across multiple large databases, ensuring performance, accuracy, and scalability.
Optimize Spark jobs to minimize runtime, memory usage, and compute cost.
Work with structured and semi‑structured data from multiple sources.

Data Preparation & Analysis Support

Build and curate training and analytical datasets by joining and transforming multiple data sources.
Apply basic analytical skills to understand data patterns, anomalies, and business relevance.
Perform data validation and quality checks:
- Record counts and reconciliation
- Null and outlier checks
- Schema and data‑type validation
Ensure datasets are analysis‑ready and trustworthy.

Stakeholder Interaction & Reporting

Understand business objectives and translate them into data requirements.
Ask the right questions to determine:
- Level of aggregation required
- Data freshness and accuracy expectations
- Preferred output and reporting formats
Present results and insights clearly to stakeholders.
Create reports and summaries using Excel for business users and leadership.

Expected Technical Approach (Problem‑Solving Mindset)

Candidates are expected to demonstrate the ability to:

Approach complex data projects methodically, starting with:
- Understanding business objectives
- Reviewing source data structure and volume
- Designing efficient join strategies
Choose the right join types, partitioning strategies, and caching techniques.
Validate data at every stage of the pipeline.
Balance technical accuracy with business usability when presenting results.

Core Skill Sets (Must‑Have)

Strong hands‑on experience with PySpark
Extensive experience working with large datasets
Proven expertise in joining large databases efficiently
Ability to write high‑performance, optimized code
Basic analytical skills to interpret and validate data

Good to Have Skills

Experience in model development or supporting analytics/modeling teams
SAS experience
Exposure to Cloudera or similar big data platforms
Understanding of data warehousing and analytics workflows

Strong problem‑solving and logical thinking
