Senior Data Engineer - Vice President, Irving area, Texas, USA (IT/Tech)

Responsibilities

Design, build, and maintain scalable ETL/ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks, ensuring efficient ingestion, transformation, and integration of large‑scale datasets across cloud platforms.
Implement and manage data solutions on cloud platforms (AWS, GCP, Azure), leveraging cloud‑native services for data storage, processing, and analytics.
Work extensively with big data frameworks and platforms such as Databricks, Snowflake, and open table formats like Apache Iceberg to process and analyze petabyte‑scale datasets.
Optimize Spark workloads and Databricks clusters by tuning jobs, managing partitioning strategies, caching, and autoscaling to improve performance, reduce processing time, and control infrastructure costs.
Implement and manage Lakehouse architecture using Delta Lake, enforcing data quality, schema evolution, and governance (e.g., Unity Catalog), while ensuring reliable, secure, and high‑quality data for analytics and downstream applications.
Lead the design and architecture of Starburst‑based data solutions, ensuring scalability, performance, and reliability for enterprise‑level data platforms.
Implement and manage data federation strategies using Starburst connectors to seamlessly integrate and query data across disparate systems (Data Lakes, RDBMS, NoSQL databases, Cloud Storage).
Identify and resolve performance bottlenecks in data pipelines and queries, optimizing data storage and processing for cost and efficiency.
Develop and optimize robust data pipelines with a strong focus on data governance, ensuring high data quality, comprehensive data lineage, and efficient, compliant data flow from ingestion to consumption for analytical and operational needs.
Design and implement data models that support business intelligence, analytics, and machine learning use cases, ensuring architecture is robust, scalable, and secure.
Partner with data scientists and AI specialists to support the development and deployment of AI models, contributing to projects involving Retrieval‑Augmented Generation and Agentic AI systems by providing necessary data infrastructure and support.
Operate effectively within an Agile development environment, actively participating in sprint planning, daily stand‑ups, and retrospectives to ensure iterative and timely delivery of project milestones.
Provide technical leadership to steer projects toward success, making critical decisions that align with client interests and organizational goals, while mentoring junior engineers and promoting best practices.
Serve as a key point of contact for stakeholders and clients, effectively communicating project progress, managing expectations, and translating complex business requirements into actionable technical tasks.

Core Data Technologies

Python:
Expert‑level proficiency with the Python data ecosystem (Pandas, NumPy, Dask) and production‑grade code for data processing, automation, and API development.
PySpark:
Extensive experience with the Spark framework, including deep knowledge of the DataFrame API, Spark SQL, and performance‑tuning techniques for distributed data processing.
Databricks:
Proven experience developing on the Databricks Lakehouse Platform, including Delta Lake, structured streaming, and Spark job optimization.
Ab Initio:
Practical experience with the Ab Initio suite (GDE, Co>Operating System, Conduct>It) designing enterprise‑grade ETL workflows.
Snowflake:
Hands‑on experience building and maintaining data warehouses, data modelling, RBAC security, performance tuning, and features such as Snowpipe and Time Travel.
Starburst/Trino:
Experience using federated query engines to provide unified access across disparate data sources.
Apache Iceberg:
Familiarity with open table formats for managing large analytic datasets.
Major cloud provider:
In‑depth, multi‑year experience with at least one of AWS, Google Cloud Platform, or Azure.
Cloud‑native services:
Building and managing data pipelines using services such as AWS Glue, Lambda, S3, Redshift; Azure Data Factory, Synapse Analytics; or Google Cloud Composer, Dataflow, BigQuery.
Data lifecycle for ML:
Solid understanding of the data lifecycle…