Site Reliability Engineer Job, Gurugram area, Uttar Pradesh, India, IT/Tech

POSITION SUMMARY

In this role, you will play a key part in shaping the firm's infrastructure reliability and efficiency by implementing robust Site Reliability Engineering (SRE) practices. You will be responsible for ensuring the availability, scalability, and performance of our systems and applications. Drawing on strong technical skills and expertise in DevOps principles, you will work to improve the reliability of our infrastructure and minimize downtime, enabling the organization to deliver high-quality software efficiently.

EXPERIENCE AND REQUIRED SKILL SETS

Ensure 24/7 uptime and stability of production systems
Investigate and troubleshoot production issues
Collaborate with developers to optimize system performance
Participate in an on-call rotation to provide 24/7 support for critical systems
Automate and enhance processes to reduce manual work and intervention.
5+ years of relevant experience in an SRE, production support, or product support role, with a track record of implementing SRE practices.
Basic understanding of cloud solutions from providers such as AWS or Azure.
Basic to intermediate scripting knowledge in at least one of Bash, Python, or PowerShell.
Good presentation, communication and interpersonal skills with the ability to collaborate effectively with cross-functional teams and stakeholders across different countries and cultures.
Good problem-solving and troubleshooting skills.
Continuous learning mindset and willingness to adapt to new technologies and industry trends.
Good understanding of Linux operating system commands and SQL (ability to write and analyze queries and to derive or build the required information from data).

In-depth knowledge of the trading life cycle:
The candidate should have a comprehensive understanding of the trading life cycle, including order management, trade execution, settlement, and post-trade processes. Familiarity with financial products such as equities, derivatives, currencies (FX), and commodities is a plus.

Incident and problem management expertise:
The candidate must demonstrate strong problem-solving skills and the ability to manage frequent incidents efficiently in a fast-paced trading environment. This includes identifying, analyzing, and resolving issues related to trading systems and processes, as well as collaborating with cross-functional teams to implement long-term solutions and improve operational efficiency.

Good understanding of tools:
Orchestration – AutoSys / Airflow or Cron
Monitoring & Logging – PagerDuty, Prometheus & Grafana or Datadog, Splunk
Project Management / ITSM – ServiceNow (basic ability to navigate and to create change tickets and incidents), Jira (basic ability to create Jira tickets and filter your work)

EDUCATION

Bachelor’s or master’s degree in Computer Science, Engineering, Software Engineering, or a related field.