
Apache Iceberg Engineer

Job in Sunnyvale, Santa Clara County, California, 94087, USA
Listing for: Smart IT Frame LLC
Full Time position
Listed on 2025-12-01
Job specializations:
  • IT/Tech
    Data Engineer, Cloud Computing, Big Data, Data Science Manager
Job Description & How to Apply Below

We are looking for an experienced Apache Iceberg Engineer to design, develop, and optimize large-scale data lakehouse solutions built on Apache Iceberg. The ideal candidate will have expertise in big data processing frameworks (Apache Spark, Flink, Presto, Trino, Hive) and cloud object storage such as AWS S3, Google Cloud Storage, or Azure Data Lake Storage. You will work closely with data engineers, data scientists, and DevOps teams to ensure an efficient, scalable, and reliable data architecture.

Key Responsibilities:

  • Design, implement, and optimize Iceberg-based data lake architectures for large-scale datasets.
  • Develop data ingestion, transformation, and query optimization pipelines using Spark, Flink, or Presto/Trino.
  • Ensure ACID compliance, schema evolution, and partition evolution in Iceberg tables.
  • Implement time travel, versioning, and snapshot management for historical data analysis.
  • Optimize metadata management and query performance in Iceberg-based data lakes.
  • Integrate Apache Iceberg with cloud storage solutions (AWS S3, GCS, ADLS) and data warehouses.
  • Implement best practices for data governance, access control, and security within an Iceberg-based environment.
  • Troubleshoot performance issues, metadata inefficiencies, and schema inconsistencies in Iceberg tables.
  • Collaborate with DevOps, ML engineers, and BI teams to enable smooth data workflows.
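
For candidates less familiar with Iceberg, the table-level responsibilities above (schema evolution, partition evolution, time travel, and snapshot management) correspond to Spark SQL operations along these lines. This is an illustrative sketch only: the catalog and table names (`demo`, `db.events`) and the snapshot ID are placeholders, not part of this role's environment.

```sql
-- Schema evolution: add a column as a metadata-only change, no data rewrite
ALTER TABLE demo.db.events ADD COLUMN country STRING;

-- Partition evolution: change the partition spec for data written going forward
ALTER TABLE demo.db.events ADD PARTITION FIELD days(event_ts);

-- Time travel: query the table as of an earlier snapshot or timestamp
SELECT * FROM demo.db.events VERSION AS OF 4348237263716086055;
SELECT * FROM demo.db.events TIMESTAMP AS OF '2025-01-01 00:00:00';

-- Snapshot management: expire old snapshots to bound metadata growth
CALL demo.system.expire_snapshots(
  table => 'db.events',
  older_than => TIMESTAMP '2025-01-01 00:00:00'
);
```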

Required Qualifications:

  • Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
  • 3+ years of experience in Big Data, Data Engineering, or Cloud Data Warehousing.
  • Hands-on experience with Apache Iceberg in a production environment.
  • Strong expertise in Apache Spark, Flink, Trino, Presto, or Hive for big data processing.
  • Proficiency in SQL and distributed query engines.
  • Experience working with cloud storage solutions (AWS S3, GCS, ADLS).
  • Knowledge of data lakehouse architectures and modern data management principles.
  • Familiarity with schema evolution, ACID transactions, and partitioning techniques.
  • Experience with Python, Scala, or Java for data processing.

Preferred Qualifications:

  • Experience in real-time data processing using Flink or Kafka.
  • Understanding of data governance, access control, and compliance frameworks.
  • Knowledge of other data lake frameworks like Delta Lake (Databricks) or Apache Hudi.
  • Hands-on experience with Terraform, Kubernetes, or Airflow for data pipeline automation.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Other
Industries: Software Development, IT Services and IT Consulting
