Kafka Tier 3 Support Engineer

Job in Canton, Norfolk County, Massachusetts, 02021, USA
Listing for: Tata Consultancy Services
Full Time position
Listed on 2026-06-03
Job specializations:
  • IT/Tech
    Cybersecurity, Data Security, IT Support, Cloud Computing
Salary/Wage Range or Industry Benchmark: 120000 - 140000 USD Yearly
Job Description & How to Apply Below
Must Have Technical/Functional Skills

Kafka & Streaming

* Strong hands-on experience with Apache Kafka

* Experience supporting at least one of:

o AWS MSK

o Confluent Platform / Confluent Cloud

o Self-managed Kafka (VM or Kubernetes)

* Deep understanding of:

o Brokers, partitions, replication, ISR, leader election

o Consumer groups and rebalancing

o Producer/consumer internals and failure modes

Operations & Performance

* Expertise in diagnosing:

o Consumer lag and throughput bottlenecks

o Broker disk, network, and JVM performance

o Metadata and controller instability

* Experience with monitoring and observability tools (Kafka metrics, CloudWatch, Prometheus, Grafana, etc.)

Security & Governance

* Knowledge of Kafka security concepts:

o TLS, authentication (IAM/SASL/SCRAM), ACLs/RBAC

o Principle of least privilege

* Experience supporting regulated or multi-tenant environments

Preferred / Nice to Have Skills

* Experience with Kafka Connect, Schema Registry, or streaming frameworks

* Exposure to KRaft-based Kafka deployments

* Cloud platforms (AWS preferred; Azure/GCP beneficial)

* Automation and IaC experience for Kafka operations

* Experience in SRE or DevOps-aligned environments

Roles & Responsibilities

Key Responsibilities

1. Tier 3 Incident Management & Escalation Support

* Act as the highest technical escalation point for Kafka production incidents (Sev 1 / Sev 2).

* Lead deep troubleshooting across:

o Broker instability, controller elections, ISR shrinkage

o Under-replicated partitions and leader imbalance

o Producer/consumer failures, lag spikes, and rebalance storms

o Disk, network, JVM, and request handler saturation

* Provide hands-on remediation for complex issues, including:

o Partition reassignment and leader rebalance

o Broker configuration tuning

o Throttle/quota strategies for noisy producers or consumers

* Coordinate with vendor support during service incidents, providing logs, metrics, and forensic details.

* Guide Tier 2 teams during major incidents and validate restoration actions.

2. Kafka Performance Engineering & Optimization

* Analyze Kafka workloads for performance and scalability risks:

o Partition skew and hot partitions

o Inefficient producer batching/compression

o Consumer lag root cause analysis

o Thread pool, I/O, and network bottlenecks

* Recommend and validate:

o Topic design (partition count, replication factor, retention, compaction)

o Producer and consumer configuration best practices

o Quotas, quota enforcement, and multi-tenant controls

* Support onboarding of high-throughput or latency-sensitive workloads, ensuring Kafka is correctly sized and tuned.

3. Platform Stability, Reliability & Resilience

* Diagnose and resolve systemic Kafka stability issues:

o Repeated broker failures or flapping

o Metadata/controller instability (ZooKeeper or KRaft)

o Recovery issues following failovers or maintenance events

* Support resilience initiatives:

o Multi-AZ cluster health validation

o Replication and DR strategies (MirrorMaker 2, Replicator, or app-level DR patterns)

o Failover testing and validation

* Define and improve Kafka SLOs for availability, durability, and latency.

4. Change, Upgrade & Configuration Leadership

* Lead medium- to high-risk Kafka changes, including:

o Broker and cluster configuration changes

o Partition expansion or large-scale reassignment

o Topic policy changes impacting durability or performance

* Support and plan:

o Kafka version upgrades

o MSK / Confluent upgrade cycles

o Client compatibility and rollout strategies

* Participate in CAB reviews, assess risk, and design rollback and validation plans.

5. Root Cause Analysis & Continuous Improvement

* Own RCA documentation for major incidents with clear corrective and preventive actions (CAPA).

* Identify recurring failure patterns and architectural gaps.

* Recommend platform-level improvements:

o Automation opportunities

o Guardrails and standards

o Monitoring and alerting enhancements

* Contribute to continuous improvement of runbooks, knowledge base articles, and operational playbooks.

6. Mentorship & Collaboration

* Provide technical guidance and mentoring to Tier 2 Kafka support teams.

* Collaborate with:

o Application teams on Kafka client usage and best practices

o Platform and SRE teams on capacity planning and reliability engineering

o Security teams on access control, encryption, and compliance requirements

* Act as a subject matter expert for Kafka within the organization.

Salary Range: $120,000-$140,000 per year

TCS Employee Benefits

Summary:

Discretionary Annual Incentive.

Comprehensive Medical Coverage:
Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.

Family Support:
Maternal & Parental Leaves.

Insurance Options:
Auto & Home Insurance, Identity Theft Protection.

Convenience & Professional Growth:
Commuter Benefits, Certification & Training Reimbursement.

Time Off:
Vacation, Time Off, Sick Leave & Holidays.

Legal & Financial Assistance:
Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
