Data Lakehouse Architect
Listed on 2026-06-05
IT/Tech
Data Engineer
SEACORP is seeking a well-qualified Data Lakehouse Architect.
Job Summary: SEACORP is seeking a Data Lakehouse Architect to lead the design, implementation, and evolution of a modern, tiered data platform that supports scalable ingestion, storage, processing, governance, and analytics. This position is in support of our SWFTS Data Strategy and Data Pipeline program. This role will define the target-state architecture for a lakehouse environment built on technologies including Kafka, Apache Iceberg, Amazon S3, CEPH, and Trino, while ensuring the platform is secure, performant, reliable, and cost-effective.
The architect will partner with engineering, platform, analytics, security, and business teams to establish architectural standards, guide implementation, and enable high‑quality data products across batch and streaming domains. The ideal candidate combines deep technical expertise in distributed data systems with strong design judgment, leadership, and the ability to translate business requirements into durable platform capabilities.
Job Responsibilities Include
- Design and document lakehouse architecture using Kafka for streaming ingestion, Iceberg for table format and data management, S3 and/or CEPH for object storage, and Trino for distributed SQL query access.
- Define architecture for data partitioning, compaction, schema evolution, metadata management, table maintenance, and lifecycle policies.
- Architect data ingestion frameworks for both real‑time and batch workloads, including event‑driven and CDC‑based integration patterns.
- Establish scalable, resilient, and secure storage patterns across cloud and on‑premises or hybrid object storage environments.
- Define governance patterns including access control, encryption, data retention, lineage, auditability, and compliance integration.
- Partner with data engineers to optimize query performance, file sizing, partitioning strategy, and workload concurrency in Trino and related engines.
- Lead engineering teams and review designs, code, and deployment approaches for alignment with target architecture.
Education
Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related technical field.
Required Experience
Required knowledge of the Atlassian Tool Suite, Git, and Linux. Preferred knowledge of C++, Java, and Python. Candidates should be able to work in a fast‑paced environment, collaborate with others while handling independent tasks, and learn new technologies quickly.
- 7+ years of experience in data engineering, data architecture, or platform architecture roles.
- 3+ years of experience designing and implementing modern data lake or lakehouse architectures in production environments.
- Hands‑on experience with Apache Kafka for streaming data ingestion, event architecture, or real‑time data integration.
- Hands‑on experience with Apache Iceberg or a similar open table format in large‑scale analytical environments.
- Experience designing data platforms on object storage, including Amazon S3, CEPH, or equivalent S3‑compatible storage systems.
- Experience with Trino or similar distributed SQL query engines for interactive analytics over large datasets.
- Strong understanding of distributed systems principles, including scalability, fault tolerance, consistency tradeoffs, and performance tuning.
- Experience with data modeling, schema design, partitioning strategy, and optimization for analytical workloads.
- Experience with security architecture including role‑based access control, encryption, and data governance controls.
- Experience creating architecture documentation, technical standards, and implementation roadmaps.
- Strong knowledge of batch and streaming pipeline patterns, including CDC, event‑driven design, and ingestion orchestration.
- Master’s degree in Computer Science, Data Engineering, Distributed Systems, or a related field.
- Experience with Team Submarine, SWFTS, US Navy program offices, TI/APB cycle.
- Experience with metadata catalogs such as Hive Metastore, AWS Glue Catalog, Nessie, or Polaris.
- Familiarity with data processing engines such as…