Lead Cloudera Consultant
Listed on 2026-02-07
IT/Tech
Data Engineer, Data Science Manager
Overview
Job Title: Lead Cloudera Consultant (Solution Architect) — Lead Cloudera Streaming Architect (CDP | NiFi | Kafka | Flink | Kudu | SSB)
Type: Contract through July 2026, potential to extend
Location: United States
Schedule: 100% remote
About the Role
We are seeking a Lead Cloudera Streaming Architect with deep, hands-on experience across the Cloudera CDP streaming stack, including NiFi, Kafka, Flink, Kudu/Impala, and SQL Stream Builder (SSB). This is a highly technical, architecture-plus-implementation role responsible for designing, delivering, and optimizing mission-critical real-time data pipelines at enterprise scale. If you have personally built end-to-end CDP/CDF streaming pipelines and can execute complex ingestion, transformation, CDC, and Kudu write-path use cases on day one — this role is for you.
Streaming Architecture & Implementation
- Architect and build real-time data pipelines using the full Cloudera Data Platform (CDP) streaming suite: NiFi → Kafka → Flink → Kudu/Impala → SSB
- Own architectural decisions, patterns, and best practices for streaming, CDC, state management, schema evolution, and exactly-once delivery
- Develop complex NiFi flows involving controller services (DBCP/JDBC), stateful processors, record processors, schema registry integrations, batch-to-stream conversions, and high-volume ingestion patterns
- Build and optimize Flink SQL or DataStream API jobs with Kafka sources/sinks, event-time windows, watermarks, state management, checkpointing/savepoints, and exactly-once guarantees (see the sketch after this list)
- Design and tune Kudu tables (PKs, partitioning, distribution, upserts, deletes, merges)
- Build and deploy streaming SQL jobs using Cloudera SQL Stream Builder (SSB)
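To gauge the expected depth, here is a minimal sketch of the kind of Flink SQL job, submitted from Java through a TableEnvironment, a candidate should be able to write unaided. The broker address, topic, and field names are hypothetical placeholders, not a project specification:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class ClickWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // fault-tolerant state; required for transactional sinks
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Kafka source with a bounded-out-of-orderness watermark on the event-time column
        tEnv.executeSql(
            "CREATE TABLE clicks (" +
            "  user_id STRING," +
            "  url     STRING," +
            "  ts      TIMESTAMP(3)," +
            "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'clicks'," +                             // hypothetical topic
            "  'properties.bootstrap.servers' = 'broker1:9092'," + // hypothetical broker
            "  'properties.group.id' = 'click-window-job'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json'" +
            ")");

        // Print sink stands in for a real Kafka/Kudu sink in this sketch
        tEnv.executeSql(
            "CREATE TABLE click_counts (" +
            "  user_id STRING, window_start TIMESTAMP(3), clicks BIGINT" +
            ") WITH ('connector' = 'print')");

        // One-minute tumbling event-time window per user
        tEnv.executeSql(
            "INSERT INTO click_counts " +
            "SELECT user_id, TUMBLE_START(ts, INTERVAL '1' MINUTE), COUNT(*) " +
            "FROM clicks GROUP BY user_id, TUMBLE(ts, INTERVAL '1' MINUTE)").await();
    }
}
```

The five-second watermark delay is a typical starting point for tolerating modest out-of-order arrival; production jobs tune it against observed lateness.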
You must be able to deliver the following four core use cases immediately:
- NiFi → Snowflake → Impala/Kudu ingestion pipeline
- Kafka → Flink streaming (real-time processing)
- Flink → Kafka sink with exactly-once semantics (see the sketch after this list)
- CDC ingestion via NiFi, Flink CDC, or SSB (incremental keys, late events, deletes)
- Tune NiFi, Kafka, and Flink clusters for performance, throughput, and stability
- Implement schema governance, error handling, back-pressure strategies, and replay mechanisms
- Work closely with platform engineers to optimize CDP components and CDF deployments
- Provide architectural guidance, documentation, and mentorship to engineering teams
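Illustrating the third use case above, a minimal DataStream sketch of an exactly-once Kafka-to-Kafka relay. Broker, topic, and transactional-prefix names are hypothetical:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExactlyOnceRelay {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Exactly-once requires checkpointing; the sink commits Kafka transactions on checkpoint
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker1:9092")            // hypothetical broker
                .setTopics("orders-raw")                        // hypothetical topic
                .setGroupId("exactly-once-relay")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("broker1:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("orders-clean")               // hypothetical topic
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("orders-relay")       // must be unique per job
                // broker transaction timeout must cover the checkpoint interval
                .setProperty("transaction.timeout.ms", "900000")
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "orders-raw source")
           .map(String::trim)                                   // stand-in transformation
           .sinkTo(sink);

        env.execute("exactly-once Kafka relay");
    }
}
```

Note that EXACTLY_ONCE ties transaction commits to checkpoints, so downstream consumers should read with isolation.level=read_committed to avoid seeing uncommitted records.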
You must have hands-on, production-grade experience with ALL of the following:
- Cloudera CDP / CDF
  - CDP Public Cloud or Private Cloud Base
  - Cloudera Flow Management (NiFi + NiFi Registry)
  - Cloudera Streams Messaging (Kafka, SMM)
  - Cloudera Stream Processing (Flink, SSB)
  - Kudu / Impala ecosystem
- Apache NiFi (Advanced)
  - Building complex flows (not just admin/ops)
  - QueryDatabaseTable / GenerateTableFetch / MergeRecord
  - Record-based processors & schema registry
  - JDBC / DBCP controller services
  - Stateful processors & incremental ingestion
  - NiFi → Snowflake integration
  - NiFi → Kudu ingestion patterns
- Apache Kafka
  - Kafka brokers, partitions, retention, replication, consumer groups
  - Schema registry (Avro/JSON)
  - Designing topics for high-throughput streaming
- Apache Flink
  - Flink SQL + DataStream API
  - Event-time processing, watermarks, windows
  - Checkpointing, savepoints, state backends
  - Kafka source/sink connectors
  - Exactly-once semantics
  - Flink CDC is a plus
- Apache Kudu (a minimal upsert/delete sketch follows this list)
  - Table design (PKs, partition strategies)
  - Upserts, deletes, merge semantics
  - Integration with Impala
- SQL Stream Builder (SSB)
  - Creating jobs, connectors, materialized views
  - Deploying and monitoring Flink SQL jobs in CDP
- CDC (Change Data Capture)
  - CDC via NiFi, Flink CDC, or SSB
  - Handling late-arriving events
  - Handling deletes, updates, and schema evolution
  - Incremental key tracking
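As a reference point for the Kudu and CDC items above, a minimal sketch using the Kudu Java client: a hash-partitioned table keyed on its primary key, with CDC insert/update events applied as upserts and CDC deletes as keyed deletes. The master address, table, and column names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.Delete;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.Upsert;

public class KuduCdcApply {
    public static void main(String[] args) throws Exception {
        try (KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()) {
            Schema schema = new Schema(Arrays.asList(
                new ColumnSchema.ColumnSchemaBuilder("id", Type.INT64).key(true).build(),
                new ColumnSchema.ColumnSchemaBuilder("status", Type.STRING).build()));

            if (!client.tableExists("orders")) {
                // Hash-partition on the primary key to spread writes across tablets
                client.createTable("orders", schema,
                        new CreateTableOptions().addHashPartitions(List.of("id"), 8));
            }

            KuduTable table = client.openTable("orders");
            KuduSession session = client.newSession();

            // CDC insert and update events both map to an upsert on the primary key
            Upsert upsert = table.newUpsert();
            upsert.getRow().addLong("id", 42L);
            upsert.getRow().addString("status", "SHIPPED");
            session.apply(upsert);

            // CDC delete events map to a delete keyed on the primary key
            Delete delete = table.newDelete();
            delete.getRow().addLong("id", 7L);
            session.apply(delete);

            session.close(); // flushes pending operations
        }
    }
}
```

At high volume the session flush mode matters; switching the session to background flushing is the usual choice for throughput-sensitive ingestion.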
- 8+ years in data engineering / streaming
- 3–5+ years specifically with CDP/CDF streaming
- Strong SQL and distributed system fundamentals
- Experience in financial services, healthcare, telecom, or other high-volume industries preferred
- Kubernetes experience running NiFi/Kafka/Flink operators
- Snowflake ingestion patterns (staging, COPY INTO); a minimal sketch follows this list
- Experience with Debezium
- CI/CD for data pipelines
- Security (Kerberos, Ranger, Atlas)
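For the Snowflake item above, a minimal JDBC sketch of a staged COPY INTO load. The account URL, warehouse, database, stage, and table names are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class SnowflakeCopyInto {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", System.getenv("SNOWFLAKE_USER"));
        props.put("password", System.getenv("SNOWFLAKE_PASSWORD"));
        props.put("warehouse", "LOAD_WH");   // hypothetical warehouse
        props.put("db", "ANALYTICS");        // hypothetical database
        props.put("schema", "RAW");

        try (Connection conn = DriverManager.getConnection(
                     "jdbc:snowflake://myaccount.snowflakecomputing.com/", props);
             Statement stmt = conn.createStatement()) {

            // Bulk-load files already landed in a named internal stage
            stmt.execute(
                "COPY INTO RAW.ORDERS " +
                "FROM @RAW.ORDERS_STAGE " +
                "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1) " +
                "ON_ERROR = 'ABORT_STATEMENT'");
        }
    }
}
```

Staging first and then issuing COPY INTO keeps the load transactional and lets Snowflake parallelize file ingestion, which is why NiFi → Snowflake flows typically push files to a stage rather than row-by-row inserts.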