Software Engineer, Cloud Engineer - Software, DevOps
Listed on 2026-06-05
Software Development
Staff Software Engineer – Enterprise Event Streaming
The Staff Software Engineer provides technical leadership for complex, scalable systems. This role drives architecture, code quality, and reliability across critical services while mentoring engineers and aligning solutions with business outcomes. The engineer partners with product, security, and operations to design resilient platforms, reduce risk, and accelerate delivery. Success requires hands‑on development, thoughtful tradeoffs, and clear communication that advances engineering standards and unlocks team effectiveness.
Location:
Nashville, TN or Sterling, VA
- Platform Operations & Modernization: Design, build, and operate Kafka-based streaming services and stream-processing applications running 24x7 in multi-cloud production. Lead an end-to-end workstream of the platform modernization initiative: sequence the cutover, prove equivalence, and partner with consumer teams in the blast radius.
- Event Governance & Architecture: Lead system architecture and evolve event governance, including topic conventions, access control, encryption posture, schema/contract evolution, and GitOps tooling. Author written design records for non-trivial decisions and contribute to architectural reviews.
- Developer Experience: Improve developer experience (DX) for internal customers through client libraries, self-serve tooling, and onboarding automation that lets a new team start producing without filing a ticket.
- Technical Leadership & Quality: Guide engineers through complex implementation decisions, elevate code quality through rigorous reviews, and mentor team members with constructive feedback.
- Reliability & Incident Management: Improve system reliability through observability and automation. Serve on the on-call rotation for the full platform, diagnosing unfamiliar failure modes and authoring actionable runbooks.
- Kotlin
- Java
- Spring Boot
- Apache Kafka
- Apache Flink
- AWS
- GCP
- Kubernetes
- Terraform
- TypeScript
This is a platform engineering role. The on‑call rotation covers the entire platform we operate today: services, stream-processing apps, connectors, client libraries, and operational tooling that predate your arrival. Candidates who are not comfortable reading code to diagnose unfamiliar failure modes, authoring runbooks, and being a first responder for production systems will not be a fit.
Minimum Qualifications
- Education: Bachelor’s degree in Computer Science, a related technical field, or equivalent practical experience.
- Experience: 8+ years of professional software engineering experience, with a proven track record of delivering production systems.
- Language Expertise: 5+ years of software engineering experience in a JVM language (Java or Kotlin).
- Distributed Systems: 3+ years of experience designing, building, and operating distributed systems or streaming systems in production.
- Kafka Mastery: Hands‑on production experience with Apache Kafka: partitioning, consumer‑group rebalances, idempotent producers, transactional writes, retention, and compaction.
- Cloud & Infrastructure: Experience with at least one major public cloud (AWS, GCP, or Azure) and infrastructure‑as‑code (Terraform).
- Operations: Experience supporting 24x7 production systems on a rotating on‑call schedule, including the triage of services you did not author.
Preferred Qualifications
- Education: Master’s degree in a relevant discipline.
- Prior Impact: Experience as a Staff‑level engineer driving cross‑team technical change (written communication plans, partner‑team office hours, deprecation enforcement).
- Stream Processing: Production experience with a stream‑processing framework such as Apache Flink, Kafka Streams, or comparable.
- Contract Design: Schema‑based event‑contract design in production (Avro, Protobuf, or JSON Schema), including backward and forward compatibility.
- Complex Migrations: Direct experience migrating production workloads at scale (broker change, datastore swap, encryption‑stack change, network re‑architecture), including cutover sequencing and rollback design.
- Managed Kafka: Operational experience with a managed Kafka offering: cluster sizing, private networking, ACL…