Senior Data Engineer - Vice President
Listed on 2026-06-02
IT/Tech
Data Engineer, Big Data, Data Science Manager
Citi is seeking a highly skilled and experienced Senior Data Engineer to join our dynamic and innovative technology team. The ideal candidate will have a robust background in data engineering, with deep expertise in a variety of modern data technologies and a proven track record of working on large-scale data projects. This role will be pivotal in designing, building, and optimizing our data infrastructure on cloud platforms, and will also provide exposure to cutting‑edge Artificial Intelligence projects, including Retrieval‑Augmented Generation (RAG) and Agentic AI systems.
The candidate must be proficient in Agile methodologies and possess strong leadership and client‑facing skills to guide projects to successful completion while balancing stakeholder needs and organizational goals.
- Design, build, and maintain scalable ETL/ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks, ensuring efficient ingestion, transformation, and integration of large‑scale datasets across cloud platforms.
- Implement and manage data solutions on cloud platforms (e.g., AWS, GCP, Azure). Leverage cloud‑native services for data storage, processing, and analytics.
- Work extensively with big data frameworks and platforms such as Databricks, Snowflake, and open table formats like Apache Iceberg to process and analyze petabyte‑scale datasets.
- Optimize Spark workloads and Databricks clusters by tuning jobs, managing partitioning strategies, caching, and autoscaling to improve performance, reduce processing time, and control infrastructure costs.
- Implement and manage Lakehouse architecture using Delta Lake, enforcing data quality, schema evolution, and governance (e.g., Unity Catalog), while ensuring reliable, secure, and high‑quality data for analytics and downstream applications.
- Lead the design and architecture of Starburst‑based data solutions, ensuring scalability, performance, and reliability for enterprise‑level data platforms.
- Implement and manage data federation strategies using Starburst connectors to seamlessly integrate and query data across disparate systems (e.g., Data Lakes, RDBMS, NoSQL databases, Cloud Storage).
- Identify and resolve performance bottlenecks in data pipelines and queries. Optimize data storage and processing for cost and efficiency.
- Develop and optimize robust data pipelines with a strong focus on data governance, ensuring high data quality, comprehensive data lineage, and efficient, compliant data flow from ingestion to consumption for analytical and operational needs.
- Design and implement data models that support business intelligence, analytics, and machine learning use cases. Ensure data architecture is robust, scalable, and secure.
- Partner with data scientists and AI specialists to support the development and deployment of AI models. Contribute to innovative projects involving RAG and Agentic AI by providing the necessary data infrastructure and support.
- Operate effectively within an Agile development environment, actively participating in sprint planning, daily stand‑ups, and retrospectives to ensure iterative and timely delivery of project milestones.
- Provide technical leadership to steer the project in the right direction, making critical decisions that align with both client interests and the organization’s strategic goals. Mentor junior engineers and promote best practices.
- Serve as a key point of contact for stakeholders and clients. Effectively communicate project progress, manage expectations, and translate complex business requirements into actionable technical tasks.
- Python: Expert‑level proficiency with Python and its data ecosystem (Pandas, NumPy, Dask). Experience includes writing production‑grade code for data processing, automation, and API development.
- PySpark: Extensive hands‑on experience with the Spark framework, including deep knowledge of the DataFrame API, Spark SQL, and performance tuning techniques for distributed data processing.
- Databricks: Proven experience developing on the Databricks Lakehouse Platform, including proficiency with Delta Lake, Structured Streaming, and optimizing Spark jobs within the Databricks…