Senior AI & Data Engineering Lead - Senior Vice President
Job in Jersey City, Hudson County, New Jersey, 07390, USA
Listed on 2026-06-18
Listing for: Citi
Full Time position
Job specializations:
- IT/Tech
- Data Engineering
Job Description & How to Apply Below
This job description outlines a senior-level role for a data architect or lead data engineer within a Data Services team. The position is centered on building and managing the data infrastructure required to support large-scale generative AI and machine learning initiatives.
Expanded Responsibilities
Strategic AI Enablement
This goes beyond just building databases; it’s about designing the entire data foundation for the company’s AI strategy.
- Data Ecosystem Architecture:
- Data Lake/Lakehouse Design: Implementing a central repository to store vast amounts of structured, semi-structured, and unstructured data from various sources. Technologies include AWS S3, Azure Data Lake Storage, or Google Cloud Storage.
- Federated Querying: Leveraging technologies like Starburst (commercial Trino) to create a virtual data warehouse. This allows data consumers to query data across different sources with a single SQL query, without needing to move or copy the data.
- Scalability and Performance: Ensuring the architecture can scale horizontally to handle petabytes of data and a high volume of concurrent queries, critical for pre-training large language models.
- High-Throughput Data Pipelines:
- Batch Processing: Using Apache Spark for large-scale data transformation, cleaning, and feature engineering on historical data.
- Real-time Stream Processing: Using Apache Kafka as a messaging bus to ingest real-time data. Apache Flink is used for complex event processing on these streams.
- Optimization and Reliability:
- Low Latency: Tuning jobs and infrastructure to minimize the time data takes to travel from source to destination.
- High Availability: Implementing failover mechanisms, monitoring, and alerting to keep pipelines running.
- CI/CD for Data: Implementing DevOps and AIOps best practices for data pipelines, including automated testing, deployment, and data quality checks.
- Data Governance for AI:
- Data Quality: Implement automated checks and monitoring to ensure data is accurate, complete, and consistent.
- Data Provenance & Lineage: Create systems to track where data comes from, how it has been transformed, and how it is used.
- Data Security: Work with security teams to implement access controls, data masking, and encryption to protect sensitive information.
- Team Leadership and Mentorship:
- Mentor Data Engineers: Guide junior and mid-level engineers, conduct code reviews, and establish best practices for the team.
- Foster Innovation: Stay up-to-date with technologies and encourage a culture of experimentation and continuous improvement.
- Cross-functional Collaboration: Work closely with data scientists, ML engineers, platform engineers, and business stakeholders to understand their needs and deliver effective data solutions.
- 10+ years of relevant experience
- Experience in implementing projects
- Experience in systems analysis and programming of software applications
- Demonstrated Subject Matter Expert (SME) in area(s) of Applications Development
- Demonstrated knowledge of client core business functions
- Demonstrated leadership, project management, and development skills
- Relationship and consensus building skills
- Bachelor’s degree/University degree or equivalent experience
- Master’s degree preferred
- Processing Frameworks: Expert-level knowledge of Apache Spark, strong experience with Apache Flink and Apache Kafka.
- Query Engines: Deep understanding and hands-on experience with Trino (Starburst).
- Orchestration: Experience with workflow management tools like Airflow or Prefect.
- Data Modeling: Strong understanding of data modeling concepts for analytical and operational systems.
- Platform Design: Proven experience designing and building scalable data lakes, data warehouses, and lakehouse architectures.
- Cloud Expertise: Proficiency with at least one major cloud provider (AWS, GCP, Azure) and their data services.
- Data Governance: Experience implementing data quality frameworks, data lineage solutions, and data cataloging tools.
- Security: Knowledge of data security best practices, encryption, masking, and role-based…
Position Requirements
10+ Years work experience