Principal Database Reliability Engineer

Job in Austin, Travis County, Texas, 78716, USA
Listing for: BEDI Partnerships
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    Data Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below

Join Udemy. Help define the future of learning.

Udemy is an AI-powered skills acceleration platform built to help people and teams grow. It’s personalized, practical, and focused on real-world impact.

Our mission is simple: to transform lives through learning. Your work helps people around the world build skills they can use, whether they’re picking up something new or leveling up to stay ahead.

Over 80 million learners and 17,000 businesses already learn with Udemy. If you’re excited by change, energized by learning, and ready to have a real impact, you’ll feel right at home.

As part of Udemy's Platform team, the Datastore Infrastructure (DSI) team is responsible for all aspects of databases (MySQL, Aurora, DynamoDB), message queues (RabbitMQ), streaming (Kafka), and caching (Redis, Memcache) in our infrastructure. This includes ensuring uptime, security and compliance, observability, and performance, improving developers’ productivity, and developing future growth strategies. The team is split between EU and US regions. You will play a vital role in overseeing the day-to-day activities and engineering strategies of DSI, ensuring that millions of students worldwide achieve greater learning and career outcomes on Udemy.

We value teamwork, a good sense of humor, strong ownership, technological curiosity, and a desire to learn.

What you'll be doing:
  • Lead improvement projects for our data stores and platform teams to align with the company’s long-term objectives.
  • Maintain infrastructure uptime, monitor performance, and ensure the infrastructure continues to scale as we grow.
  • Develop immutable infrastructure patterns and automate infrastructure provisioning via code (Terraform, Python, Ansible, etc.).
  • Ensure adherence to PCI, ISO 27001, and SOC 2 compliance and security requirements, modifying CI/CD processes when necessary and upholding policies and standards.
  • Advocate for and implement positive changes in tools and processes through healthy discussions.
  • Participate in the on-call rotation, demonstrating a systematic approach to incident management.
  • Participate in day-to-day activities, support requests, and project-related tasks for the team.
  • Contribute to documentation, maintain ticketing queues, provide project support, troubleshoot, and offer after-hours assistance as required.
  • Provide coaching and mentorship to new hires, fostering their technical growth and integration into the team, and maintain close communication with team members throughout their tenure.
What you’ll have:

We do not expect you to have all of the below, but the more of these skills you have, the easier your onboarding will be.

  • 8-10 years of professional experience on a Cloud Engineering (or SRE/DBRE) team, with infrastructure responsibility for managing large production workloads.
  • Proficiency with managing MySQL at scale (horizontal scaling, sharding, InnoDB optimizations, query optimization, HA/DR, monitoring, backup strategy, security, automations).
  • Proficiency with tools like Terraform, Ansible, and Git, and with Infrastructure as Code and automated provisioning.
  • Strong experience in Kafka cluster management, topic configuration, performance tuning, and ensuring high availability and fault tolerance. Experience with MSK is a plus.
  • Experience with message queues (MQ/SQS) and caching (Redis, Memcache) or similar products.
  • Experience in Python.
  • Knowledge of configuration management tools, monitoring systems (Datadog or similar) for database infrastructure, and scaling strategies for handling increased data volumes.
  • Strong troubleshooting skills to diagnose complex database issues.
  • Hands‑on experience with AWS cloud infrastructure and a grasp of security best practices.
  • Adaptability and comfort working in a fast‑paced, hands‑on environment.
Nice to have:
  • Experience with any additional programming languages (Golang, Kotlin, Java).
  • Experience in implementing CDC pipelines for reliable data replication and synchronization.
  • Experience with Vitess Operator running MySQL on Kubernetes.
  • Experience with writing Kubernetes Helm charts.
  • Experience with tools like ArgoCD/Argo Workflows or similar alternatives.
  • Knowledge of security standards, vulnerability…