Senior Site Reliability Engineer; Middleware
Listed on 2025-11-14
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Senior Site Reliability Engineer (Middleware)
Redefine the future of customer experiences. One conversation at a time.
At Nextiva, we’re reimagining how businesses connect, bringing together customer experience and team collaboration on a single, conversation-centric platform. Powered by AI, driven by human innovation.
Our culture is forward-thinking, customer-obsessed, and built on the belief that meaningful connections drive better business outcomes. Whether it’s through our signature Amazing Service®, the technology we create, or the experiences we cultivate, connection is at the core of who we are.
If you’re ready to collaborate with incredible people, make an impact, and help businesses everywhere deliver truly amazing experiences, this is where you belong.
We are looking for a Senior Site Reliability Engineer (SRE) to join our Middleware Engineering team. In this highly dynamic environment, you'll be responsible for supporting and scaling our Kafka and Elasticsearch infrastructure - core systems that power our SaaS platform.
We're looking for someone who thrives on automation, embraces AI-driven observability, and is eager to learn and adopt new technologies quickly. You'll not only respond to production issues but also proactively build intelligent, resilient systems to prevent them.
If you enjoy owning systems end to end, writing clean automation, and working in a fast-moving team that values innovation, this role is for you.
Key Responsibilities
- Triage, troubleshoot, and resolve complex production issues involving Kafka and Elasticsearch
- Design and build automated monitoring, alerting, and logging systems - leveraging AI/ML techniques where possible
- Write tools and infrastructure software to support self‑healing, auto‑scaling, and incident prevention
- Automate system administration tasks - from patching and upgrades to config and deployment workflows
- Use and manage GitHub extensively for infrastructure‑as‑code, release management, and collaboration
- Partner with development, QA, and performance teams to ensure middleware systems are production‑ready
- Participate in the on‑call rotation and continuously improve incident response and resolution playbooks
- Mentor junior engineers and contribute to a culture of automation, learning, and accountability
- Lead large‑scale reliability and observability projects in collaboration with global teams
Qualifications
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
- Fluent English communication skills (spoken and written)
Core Competencies
- 6+ years of experience in software development, automation, or infrastructure engineering
- Deep experience with MongoDB, Kafka and/or Elasticsearch in production environments
- Strong Linux systems expertise and 6+ years managing Linux‑based environments
- Hands‑on experience with cloud platforms - GCP and/or AWS required
- Automation‑first mindset - deep experience with Ansible, Terraform, Jenkins
- Expert‑level understanding of Git and GitHub workflows for CI/CD and infrastructure‑as‑code
- Proficient with container tools (Docker) and orchestrators (Kubernetes)
- Strong understanding of SRE principles - SLAs/SLOs, alerting, observability, and incident management
- Experience with SQL, caching systems (e.g., Redis), and troubleshooting distributed systems
- Quick learner with a strong curiosity for new tools, frameworks, and AI/ML use cases in operations
Nice to Have
- Observability Tools: Datadog, Splunk, Kibana, Opsgenie
- Experience with AI/ML‑based anomaly detection, AIOps platforms, and LLM integrations for infrastructure
- Azure cloud experience
Why Join Us
- Shape the future of middleware reliability using AI and intelligent automation
- Work with a global team that values initiative, innovation, and ownership
- Grow in a fast‑paced environment where learning and experimentation are part of the culture
- Drive technical leadership, mentor others, and make a meaningful platform‑wide impact
How to Apply
If you're passionate about automation, AIOps, MLOps, and scalable middleware infrastructure, and you're ready to move fast, learn constantly, and own critical systems - we'd love to connect with you.
Nextiva DNA (Core Competencies)
Nex…