Kafka Operations Administrator
Job in Seattle, King County, Washington, 98127, USA
Listed on 2026-06-02
Listing for: Veriipro
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, Cybersecurity, IT Support, Data Engineer
Job Description & How to Apply Below
We are seeking a highly skilled Kafka Operations Administrator to manage and maintain production-grade Apache Kafka clusters. The ideal candidate will have deep experience in Kafka operations, monitoring, automation, and production support within enterprise environments. This role includes 24x7 on-call responsibilities, incident management, performance tuning, and ensuring high availability and disaster recovery.
- Deploy, configure, and manage Kafka clusters and related services to meet SLA requirements
- Participate in 24x7 on-call rotation, responding to incidents, alerts, and escalations
- Triage, diagnose, and remediate production incidents; coordinate with stakeholders, developers, and infrastructure teams
- Implement automation for provisioning, scaling, backups, and disaster recovery
- Maintain monitoring, alerting thresholds, dashboards, and Kafka ecosystem health
- Harden Kafka deployments by configuring TLS, ACLs, RBAC, and encryption, and by remediating vulnerabilities
- Perform routine maintenance, including Kafka ecosystem upgrades (controllers, brokers, Kafka Connect, and Schema Registry) and rolling restarts
- Create and maintain runbooks, automation scripts, and post-incident reports
- Optimize performance and resource utilization through benchmarking and tuning
- Support Kafka Connect and Schema Registry services; troubleshoot connector issues
- Contribute to CI/CD pipeline improvements for infrastructure and deployment automation
- Production-grade Apache Kafka operations experience, including managing, maintaining, and upgrading Kafka clusters
- Strong experience with high availability, disaster recovery, failover, and overall reliability
- Proficient in monitoring and observability tools, including:
  - Grafana (dashboards)
  - Prometheus
  - Splunk
  - JMX metrics
- Automation and orchestration expertise using:
  - Terraform
  - Ansible
  - Helm
  - Kubernetes (EKS/AKS/GKE)
- Strong Linux system administration, including troubleshooting and scripting for infrastructure management
- Production support experience following ITIL processes
- Experience in 24x7 on-call rotations, incident documentation, and postmortems
- Experience with JVM tuning, GC analysis, and network/disk I/O diagnostics
- Strong understanding of TCP/IP, routing, switching, and firewall configurations relevant to Kafka operations
- Deep Kafka performance tuning and capacity planning experience
- Knowledge of message delivery semantics and guarantees (at-least-once, exactly-once)
- Cloud-native security/compliance experience (IAM, VPC, KMS, Security Groups)
- Relevant certifications: Confluent Certified Administrator, AWS/Azure/GCP
- Experience with Apache Kafka in KRaft mode
- Containerization and orchestration experience (Docker, Kubernetes)
- CI/CD pipeline and Git-based workflows
- Experience building custom Kafka Connect libraries and knowledge of serialization formats (Avro, JSON)
- Strong understanding of networking across on-prem and cloud environments
- Best practices for topic management and streaming security (TLS, ACLs, RBAC, encryption)
- Kafka ecosystem tooling experience (Kafka Connect, Schema Registry)
- Bachelor’s degree in Computer Science, Engineering, or related field (preferred)
- 7+ years of experience in Kafka operations or platform engineering
- Proven experience in production support and infrastructure automation