Snr Kafka Platform Engineer
Job in
Chicago, Cook County, Illinois, 60290, USA
Listed on 2026-05-08
Listing for:
Benton Partners
Full Time position
Job specializations:
- IT/Tech
  Cloud Computing, SRE/Site Reliability, Systems Engineer, Data Engineer
Job Description & How to Apply Below
Senior Kafka Platform Engineer (Automation & Kubernetes)
Chicago / New York
Posted 54 Days Ago
REQ 7777
We’re seeking a seasoned Kafka engineer to design, operate, and scale our event streaming platform.
You’ll own the Kafka core (brokers, storage, security, observability) and the automation that powers it: infrastructure-as-code, operators/Helm charts, and CI/CD that enable safe, self-service provisioning.
You’ll run Kafka on Kubernetes and/or cloud-managed offerings, ensure reliability and performance, and partner with application teams on best practices.
- Architect, deploy, and operate production-grade Kafka clusters (self-managed and/or Confluent/MSK), including upgrades, capacity planning, multi-AZ/region DR, and performance tuning.
- Operate Kafka on Kubernetes using Operators, Helm, and GitOps, and build IaC-driven automation with guardrails for repeatable, compliant, zero-downtime provisioning and deployments.
- Implement and manage Kafka Connect, Schema Registry, and MirrorMaker 2/Cluster Linking; standardize connectors (e.g., Debezium) and build self-service patterns.
- Drive reliability: define SLOs/error budgets, on-call rotations, incident response, postmortems, runbooks, and automated remediation.
- Implement observability: metrics, logs, traces, lag monitoring, and capacity dashboards (e.g., Prometheus/Grafana, Burrow, Cruise Control, OpenTelemetry).
- Secure the platform: TLS/mTLS, SASL (OAuth/SCRAM), RBAC/ACLs, secrets management, network policies, audit, and compliance automation.
- Guide event-streaming best practices: topic design, partitioning, compaction/retention, idempotency, ordering, schema evolution/compatibility, DLQs, EOS semantics.
- Partner with app, data, and SRE teams; provide enablement, documentation, and internal tooling for a great developer experience.
- Lead/mentor engineers and contribute to roadmap, standards, and platform strategy.
- Excellent communication and partnership skills with platform and application teams.
- Deep hands-on experience operating Kafka in production at scale (brokers, controllers, partitions, ISR, tiered storage/retention, rebalancing, replication, recovery).
- Strong Kubernetes expertise running stateful systems.
- Automation first: Infrastructure as Code (Terraform), Helm, Operators, GitOps (Argo CD/Flux), and CI/CD (e.g., GitHub Actions/Jenkins) for platform lifecycle.
- Proficiency with one or more languages for tooling/automation: Python, Go, or Java; plus Bash and solid Linux fundamentals (networking, file systems, JVM tuning basics).
- Observability and reliability engineering for Kafka: Prometheus/Grafana, logging, alerting, lag monitoring, capacity/throughput modeling, performance tuning.
- Security for data in motion: TLS/mTLS, SASL/OAuth, ACL/RBAC, secrets management (e.g., Vault), and audit/compliance practices.
- Experience with Kafka ecosystem components: Kafka Connect, Schema Registry, MirrorMaker 2/Cluster Linking; familiarity with Cruise Control.
- Cloud experience (AWS/Azure/GCP) with networking, IAM, and one or more managed offerings (e.g., Confluent Cloud or AWS MSK).
- Proven track record designing runbooks, leading incidents/postmortems, and driving platform roadmaps.
- Data processing frameworks (Kafka Streams, Flink, Spark Structured Streaming) and EOS semantics.
- Experience with Strimzi or Confluent for Kubernetes in production.
- Knowledge of CDC patterns and tools (e.g., Debezium) and database connectors at scale.
- Multi-region architectures, cluster linking strategies, and disaster recovery drills.