Site Reliability Engineer Job, Vadodara area, Gujarat, India, IT/Tech

This Position
As a Site Reliability Engineer, you will be a foundational member of our production team as we evolve into a modern SRE organization. Your mission is to improve the reliability, scalability, and performance of Viking Cloud's production services by applying software engineering principles to operational challenges. While you'll handle day-to-day operational tasks, your primary focus will be on engineering long-term solutions, automating manual work, and building a more resilient and observable platform.

This is a hands-on role where you will directly contribute to transforming our operations.

Responsibilities

Own Production Stability:
Serve as a primary point of contact for troubleshooting and resolving issues in our production environments.

Incident Management & Analysis:
Manage production incidents, communicate status updates, and participate in post-incident reviews to help identify and address root causes.

Improve Operational Processes:
Handle operational tasks like user account management, deployments, and generating reports, while constantly looking for opportunities to streamline and automate these workflows.

Contribute to Monitoring:
Use and improve our monitoring tools (like Grafana, Prometheus, Loki, etc.) to better detect issues, reduce false alarms, and gain deeper insights into system performance.

Collaborate on Releases:
Work closely with our DevOps, QA, and Development teams to ensure smooth and safe deployment of new product releases.

Develop Your Skills:
Dedicate time to learning new technologies and SRE principles, applying them to your daily work. This includes scripting, learning cloud services, and understanding infrastructure-as-code.

Support System Resilience:
Assist in the planning and execution of disaster recovery tests to ensure our business continuity plans are effective.

Qualifications
Core Skills (Required):

A minimum of 2-3 years of experience in a Technology Operations, Application Support, or Production Support role.

Solid understanding of Linux fundamentals and command-line tools.

Good knowledge of SQL for querying and troubleshooting databases.

Proven experience in application troubleshooting within a production environment.

Strong analytical and problem-solving skills with high attention to detail.

Excellent communication skills and experience using tools like Jira and Confluence.

A genuine curiosity and desire to learn about automation, cloud technologies, and SRE practices.

Desirable Skills (Nice to Have):

Some experience with a major cloud platform (AWS or GCP) is highly desirable.

Basic scripting experience in a language like Bash or Python for automating simple tasks.

Familiarity with monitoring tools such as Grafana, Loki and Prometheus.

An understanding of ITIL principles, particularly around Incident and Change Management.

Previous experience in software Quality Assurance (QA) is a plus.