Intermediate Site Reliability Engineer, Database Operations
Listed on 2026-01-01
IT/Tech
Cloud Computing, SRE/Site Reliability
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating human progress. Our platform unites teams and organizations, breaking down barriers and redefining what's possible in software development.
Thanks to products like Duo Enterprise and Duo Agent Platform, customers get AI benefits at every stage of the SDLC.
The same principles built into our products are reflected in how our team works: we embrace AI as a core productivity multiplier, with all team members expected to incorporate AI into their daily workflows to drive efficiency, innovation, and impact. GitLab is where careers accelerate, innovation flourishes, and every voice is valued. Our high-performance culture is driven by our values and continuous knowledge exchange, enabling our team members to reach their full potential while collaborating with industry leaders to solve complex problems.
Co-create the future with us as we build technology that transforms how the world develops software.
Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other GitLab production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our environments and the GitLab codebase. We specialize in systems, whether it be networking, the Linux kernel, or some more specific interest in scaling, algorithms, or distributed systems.
The Database Operations team's mission is to build, run, own and evolve the entire lifecycle of the PostgreSQL database engine for GitLab.com. The team is focused on owning the reliability, scalability, evolution, performance and security of the database engine and its supporting services. The team should seek to build its services on top of Reliability::Foundations services and cloud vendor managed products, where appropriate, to reduce complexity, improve efficiency and deliver new capabilities more quickly.
GitLab.com is a unique site and it brings unique challenges: it's the biggest GitLab instance in existence. In fact, it's one of the largest single-tenancy open-source SaaS sites on the internet. The experience of our team feeds back into other engineering groups within the company, as well as to GitLab customers running self-managed installations.
Responsibilities
- Automate every operational task; this is a core requirement for the role. Examples include package updates, configuration changes across all environments, and creating tools for automatic provisioning of user-facing services.
- Respond to platform emergencies, alerts, and escalations from Customer Support.
- Ensure systems exist to manage software life cycles (e.g. operating systems) with a minimum of manual effort.
- Develop a fully automated multi-environment observability stack based on the existing SaaS system, and extend it to predict capacity needs based on the usage patterns.
- Plan for new service roll-outs, expansion and capacity management of existing services, and work with users to optimize their resource consumption.
- Work on database reliability and performance aspects for GitLab.com from within the SRE team, as well as on shipping solutions with the product.
- Analyze solutions and implement best practices for our PostgreSQL database clusters and their components.
- Work on observability of relevant database metrics and make sure we reach our database objectives.
- Work with peer SREs to roll out changes to our production environment and help mitigate database-related production incidents.
- Provide on-call support on rotation with the team.
- Provide database expertise to engineering teams (for example through reviews of database migrations, queries and performance optimizations).
- Work on automation of database infrastructure and help engineering succeed by providing self-service tools.
- Use the GitLab product to run GitLab.com as a first…