Site Reliability Engineer Job, Bangalore area, Bengaluru, Karnataka, India, IT/Tech

Location: Bengaluru

Site Reliability Engineer

Experience: 0–2 Years

Job Type: Permanent, full-time, work from office (WFO)

Role Overview
As a Site Reliability Engineer (SRE) – Observability, you will support the design, implementation, and maintenance of monitoring and observability platforms for customer-facing and AI-driven applications.
This is an entry-level to early-career role in which you will work under senior SREs to build dashboards, configure monitoring tools, and help improve service reliability and visibility across systems.
You will collaborate with engineering and operations teams to understand application behavior and help build clear, actionable dashboards and monitoring solutions.

Key Responsibilities
Assist in configuring and maintaining observability tools such as Grafana, Prometheus, Loki, and Jaeger
Support the implementation of Golden Signals (Latency, Traffic, Errors, Saturation)
Build and maintain basic Grafana dashboards for engineering and operations teams
Help collect and validate metrics, logs, and traces from applications
Assist in troubleshooting production issues using logs and monitoring tools
Participate in monitoring performance indicators such as latency, throughput, and error rates
Support implementation of alerting rules and basic SLO monitoring
Document dashboard structures, monitoring configurations, and operational runbooks
Work with senior engineers to improve dashboard usability and visualization clarity
Learn and apply SRE best practices in reliability and availability

Required Qualifications
0–2 years of experience in DevOps, SRE, Monitoring, or Backend Engineering roles
Basic understanding of Linux systems, cloud platforms (AWS / Azure / GCP), and microservices architecture
Familiarity with monitoring tools such as Grafana or Prometheus
Basic knowledge of metrics, logs, and distributed tracing concepts, as well as HTTP status codes and API monitoring
Understanding of reliability concepts such as uptime, availability, and incident response
Good problem-solving and debugging skills
Strong willingness to learn observability engineering and production systems

Technical Skills (Good to Have)
Hands-on exposure to Prometheus (metrics collection), Loki (log aggregation), and Jaeger (distributed tracing)
Basic understanding of containers (Docker) and Kubernetes
Familiarity with CI/CD pipelines
Knowledge of alerting systems and monitoring thresholds
Exposure to AI / ML or high-traffic applications