Data Lakehouse Architect Job, Middletown area, Rhode Island, USA, IT/Tech

SEACORP

Location: Middletown, RI, United States

Req: req 1806

SEACORP is seeking a well-qualified Data Lakehouse Architect.

Primary Duties and Responsibilities:

Job Summary: SEACORP is seeking a Data Lakehouse Architect to lead the design, implementation, and evolution of a modern, tiered data platform that supports scalable ingestion, storage, processing, governance, and analytics. This position is in support of our SWFTS Data Strategy and Data Pipeline program. This role will define the target‑state architecture for a lakehouse environment built on technologies including Kafka, Apache Iceberg, Amazon S3, CEPH, and Trino, while ensuring the platform is secure, performant, reliable, and cost‑effective.

The architect will partner with engineering, platform, analytics, security, and business teams to establish architectural standards, guide implementation, and enable high‑quality data products across batch and streaming domains. The ideal candidate combines deep technical expertise in distributed data systems with strong design judgment, leadership, and the ability to translate business requirements into durable platform capabilities.

Job Responsibilities Include:

Design and document lakehouse architecture using Kafka for streaming ingestion, Iceberg for table format and data management, S3 and/or CEPH for object storage, and Trino for distributed SQL query access.
Define architecture for data partitioning, compaction, schema evolution, metadata management, table maintenance, and lifecycle policies.
Architect data ingestion frameworks for both real‑time and batch workloads, including event‑driven and CDC‑based integration patterns.
Establish scalable, resilient, and secure storage patterns across cloud and on‑premises or hybrid object storage environments.
Define governance patterns including access control, encryption, data retention, lineage, auditability, and compliance integration.
Partner with data engineers to optimize query performance, file sizing, partitioning strategy, and workload concurrency in Trino and related engines.
Lead engineering teams and review designs, code, and deployment approaches for alignment with target architecture.

Qualifications:

Education: Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related technical field.

Required Experience: Required knowledge of Atlassian Tool Suite, Git, and Linux. Preferred knowledge of C++, Java, and Python. Candidates should be able to work in a fast‑paced environment, collaborate with others while handling independent tasking, and learn new technologies quickly.

7+ years of experience in data engineering, data architecture, or platform architecture roles.
3+ years of experience designing and implementing modern data lake or lakehouse architectures in production environments.
Hands‑on experience with Apache Kafka for streaming data ingestion, event architecture, or real‑time data integration.
Hands‑on experience with Apache Iceberg or a similar open table format in large‑scale analytical environments.
Experience designing data platforms on object storage, including Amazon S3, CEPH, or equivalent S3‑compatible storage systems.
Experience with Trino or similar distributed SQL query engines for interactive analytics over large datasets.
Strong understanding of distributed systems principles, including scalability, fault tolerance, consistency tradeoffs, and performance tuning.
Experience with data modeling, schema design, partitioning strategy, and optimization for analytical workloads.
Experience with security architecture including role‑based access control, encryption, and data governance controls.
Experience creating architecture documentation, technical standards, and implementation roadmaps.
Strong knowledge of batch and streaming pipeline patterns, including CDC, event‑driven design, and ingestion orchestration.

Desired Experience:

Desired knowledge in the areas of Databases, SQL and NoSQL (Postgres, MongoDB), Apache Data Frameworks (Kafka, Spark, Iceberg, OpenMetadata, Ranger), Data Infrastructure (Ceph, S3, MinIO/Parquet, REST, Nessie, Druid), Data APIs (Trino, Metabase,…