Position: Staff Site Reliability Engineer (SRE), Engineering Tools
Location: Bengaluru

About The Team
Engineering Tools owns and operates the on-prem developer platforms that every Tesla engineer depends on every day:
GitHub Enterprise, JFrog Artifactory, GitHub Copilot (self-hosted), Cursor (on-prem), and the Atlassian suite (Jira Service Management + Confluence). We also run the AI-augmented support layer that fronts these platforms - a Mattermost support bot backed by our internal Nabu RAG platform, observability via OpenTelemetry, and a GitOps-driven Kubernetes deployment footprint in our cluster.

If one of our systems is down, thousands of Tesla engineers stop shipping. We're hiring a Staff SRE to own the reliability, scalability, and operational maturity of that footprint.

Key Responsibilities

Platform administration:
Manage GitHub Enterprise (Cloud and/or Server) organizations, teams, repos, branch protection rules, Actions runners, and Apps. Administer JFrog Artifactory repositories (local, remote, virtual), permissions, replication, and storage policies.
User support:
Triage and resolve tickets covering access requests, repo migrations, build/artifact failures, authentication issues, and integrations. Define and meet SLAs.
Migrations & onboarding:
Lead repo migrations into/out of GitHub (e.g., GitHub Migrations API, gh-migration tooling) and Artifactory repository imports/exports. Onboard new teams with templates and standards.
Automation:
Build scripts and tooling (Bash, Python, Terraform, GitHub Actions, JFrog CLI) to automate provisioning, permission audits, cleanup, and reporting. Eliminate repetitive support work.
Reliability & monitoring:
Monitor platform health, storage usage, runner capacity, and license consumption. Coordinate upgrades, patches, and incident response with the vendor.
Security & compliance:
Enforce SSO/SAML, SCIM provisioning, secret scanning, signed commits, audit logging, and least-privilege access. Support SOC 2 / ISO audits.
Integrations:
Maintain integrations with CI/CD (Jenkins, GitHub Actions, GitLab CI), SAST/SCA scanners, Jira, Slack, and internal developer portals.
Documentation & enablement:
Write runbooks, FAQs, and self-service guides. Host office hours and training sessions for developers.

Required Qualifications

3+ years administering GitHub Enterprise (Cloud or Server) at scale (500+ users or 1000+ repos).
2+ years administering JFrog Artifactory (or comparable: Nexus, Cloudsmith, Harbor).
Strong scripting in Bash and Python; comfortable with REST APIs and curl/jq.
Working knowledge of Git internals (refs, packfiles, LFS, submodules) and ability to debug repo corruption, large-file issues, and merge problems.
Hands-on experience with at least one CI/CD system (GitHub Actions, Jenkins, GitLab CI, CircleCI).
Familiarity with SSO/SAML, SCIM, OIDC, and personal/fine-grained access tokens.
Excellent written communication - you can turn a confusing incident into a clear postmortem and a vague ticket into a fixable problem.

Preferred Qualifications

Experience with GitHub Migrations API, gh-migration-tool, or gei (GitHub Enterprise Importer).
Experience operating Artifactory in HA mode, with S3/blob storage, and Xray for vulnerability scanning.
Infrastructure-as-Code: Terraform providers for GitHub and Artifactory.
Container/package format expertise: Docker, npm, Maven, PyPI, Helm, Conan.
Familiarity with secret scanning tools (GitHub Advanced Security, GitGuardian, TruffleHog) and dependency management (Dependabot, Renovate).
Prior on-call or production support experience.
Exposure to GHAS, Copilot for Business, or Copilot Enterprise rollouts.

Bonus

Experience operating self-hosted LLM inference (Copilot Enterprise, on-prem Cursor backend, vLLM, or similar), RAG pipelines, or vector databases.

Soft Skills
Excellent written communication - you can write a post-mortem that engineering leadership reads to the end, and a runbook that a junior on-call can execute at 3 AM. Strong technical influence without authority; you raise the reliability bar across teams by example and through reviews, not by mandate. Calm under pressure during sev-1 incidents affecting thousands of engineers.

Education
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.

Why This Role Is Different

Customer = every Tesla engineer. Your platforms unblock Vehicle Software, Autopilot, Energy, and Manufacturing teams. The impact of every reliability improvement compounds across the company.
On-prem by design. We don't outsource our critical paths to SaaS. You'll own the full stack - hardware, network, OS, platform, application, observability - and you'll have the authority to change it.
AI-augmented support. We're not just operating platforms; we're building the AI tooling (Nabu RAG + Mattermost support bot + Copilot/Cursor integrations) that lets a small SRE team serve a very large engineering org. You'll help shape that.
High autonomy, high ownership. Engineering Tools is small and senior-heavy. As a Staff SRE you'll set technical direction for…
