Kafka Tier 3 Support Engineer
Job in Canton, Norfolk County, Massachusetts, 02021, USA
Listed on 2026-04-17
Listing for: Tata Consultancy Services
Full Time position
Job specializations:
- IT/Tech: IT Support, Cloud Computing
Job Description & How to Apply Below
Key Responsibilities
- Tier 3 Incident Management & Escalation Support:
  - Act as the highest technical escalation point for Kafka production incidents (Sev 1 / Sev 2).
  - Lead deep troubleshooting across broker instability, controller elections, ISR shrinkage, under‑replicated partitions, leader imbalance, producer/consumer failures, lag spikes, rebalance storms, disk/network/JVM/request‑handler saturation.
  - Provide hands‑on remediation for complex issues, including partition reassignment, leader rebalance, broker configuration tuning, and throttle/quota strategies for noisy producers or consumers.
  - Coordinate with vendor support during service incidents, providing logs, metrics, and forensic details.
  - Guide Tier 2 teams during major incidents and validate restoration actions.
- Kafka Performance Engineering & Optimization:
  - Analyze Kafka workloads for performance and scalability risks such as partition skew, hot partitions, inefficient producer batching/compression, consumer lag root causes, and thread‑pool, I/O, and network bottlenecks.
  - Recommend and validate topic design, producer/consumer configuration best practices, quotas, and multi‑tenant controls.
  - Support onboarding of high‑throughput or latency‑sensitive workloads, ensuring Kafka is correctly sized and tuned.
- Platform Stability, Reliability & Resilience:
  - Diagnose and resolve systemic Kafka stability issues, support resilience initiatives (Multi‑AZ cluster health validation, replication/DR strategies, failover testing), and define/improve Kafka SLOs.
- Change, Upgrade & Configuration Leadership:
  - Lead medium to high‑risk Kafka changes, support and plan version upgrades, and participate in CAB reviews, assessing risk and designing rollback and validation plans.
- Root Cause Analysis & Continuous Improvement:
  - Own RCA documentation for major incidents, recommend platform‑level improvements (automation, guardrails, monitoring enhancements), and contribute to runbooks and operability playbooks.
- Mentorship & Collaboration:
  - Provide technical guidance and mentoring to Tier 2 Kafka support teams; collaborate with application, platform, SRE, and security teams on capacity planning, best practices, and compliance.
Qualifications
- Strong hands‑on experience with Apache Kafka (brokers, producers, consumers).
- Experience supporting at least one of: AWS MSK, Confluent Platform / Confluent Cloud, or self‑managed Kafka on VM/Kubernetes.
- Deep understanding of brokers, partitions, replication, ISR, leader election, consumer groups and rebalancing, and producer/consumer internals and failure modes.
- Expertise in diagnosing consumer lag, throughput bottlenecks, broker disk/network/JVM performance, metadata and controller instability.
- Experience with monitoring and observability tools: Kafka metrics, CloudWatch, Prometheus, Grafana.
- Knowledge of Kafka security concepts: TLS, authentication (IAM/SASL/SCRAM), ACLs/RBAC; principle of least privilege; experience in regulated or multi‑tenant environments.
- Preferred: Kafka Connect, Schema Registry, or streaming frameworks; KRaft‑based deployments; AWS (preferred) or Azure/GCP; automation and IaC for Kafka operations; SRE or DevOps‑aligned experience.
Bachelor of Computer Science.
Benefits
- Discretionary Annual Incentive.
- Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
- Family Support: Maternal & Parental Leaves.
- Insurance Options: Auto & Home Insurance, Identity Theft Protection.
- Convenience & Professional Growth: Commuter Benefits, Certification & Training Reimbursement.
- Time Off: Vacation, Sick Leave & Holidays.
- Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Salary Range $120,000-$140,000 per year.