Observability & Monitoring Lead

Job in 201301, Noida, Uttar Pradesh, India
Listing for: Luxoft
Full Time position
Listed on 2026-06-01
Job specializations:
  • IT/Tech
    IT Support, Cybersecurity
Job Description & How to Apply Below
Project Description:

Support clients in the operation, maintenance, and optimization of Oracle Cerner EHR environments. This role is designed for early-career professionals who are eager to grow their technical skills in healthcare IT while working under the mentorship of experienced consultants and technical leaders. You will gain hands-on exposure to Cerner infrastructure, system workflows, and healthcare technology best practices while contributing to meaningful client outcomes.
Responsibilities:
1. Trend Analysis & Problem Identification
- Identify recurring incident patterns, anomalies, and signs of alert fatigue that may indicate deeper systemic issues.
- Collaborate with L2/L3 teams to review telemetry data and recommend improvements to alert thresholds, rules, and policies.
- Provide insights that support proactive issue prevention, noise reduction, and overall monitoring refinement.

2. Platform Management & Optimization
- Develop, update, and maintain dashboards that reflect real-time system health, performance metrics, and service behavior.
- Support the ongoing adoption and optimization of Dynatrace, enhancing dashboarding and visualization capabilities for cloud and on-prem observability.
- Assist in routine platform checks, ensuring monitoring tools remain accurate, stable, and aligned with business and operational requirements.

3. Leadership & Collaboration
- Organize the team's work, including planning, task breakdown, and ensuring clarity of priorities.
- Provide structured, timely updates to leadership on progress, risks, blockers, team capacity, and delivery timelines.
- Work closely with application teams, SRE groups, and infrastructure operations during incident triage, investigations, and routine monitoring reviews.
- Ensure clear, timely, and effective communication with stakeholders during service-impacting events, providing status updates and context as needed.
- Ensure adherence to engineering best practices, drive operational excellence, and maintain accountability for team delivery outcomes.

4. Operational Excellence
- Support platform stability and availability through adherence to lifecycle maintenance, patching schedules, and vulnerability management processes.
- Contribute to the improvement of monitoring workflows, alert routing logic, runbook effectiveness, and incident management practices.

5. Innovation & AI Enablement
- Assist in exploring and adopting AI-driven capabilities that improve observability, automate root-cause identification, and reduce manual effort.
- Contribute to internal knowledge sharing by documenting best practices, playbooks, AI reference materials, and usage guidelines (e.g., Copilot tips).

6. Collaboration & Leadership Support
- Partner with cross-functional teams to align monitoring practices with evolving business needs and operational priorities.
- Drive end-to-end delivery of monitoring initiatives: requirements gathering, planning, execution oversight, and delivery validation.
- Coordinate cross-team dependencies, ensure timelines are met, and proactively remove blockers for the team.
- Provide subject matter support for ITSM processes including incident, problem, and change management discussions.
Mandatory Skills:

New Relic

Mandatory Skills Description:

- 6+ years in Site Reliability Engineering or Observability/Monitoring engineering roles.
- 5+ years hands-on with monitoring/observability tools: New Relic, SolarWinds, WhatsUp Gold (WUG).
- 4+ years of scripting experience (JavaScript, Java, PowerShell, or others).
- 2+ years with Azure (architecture fundamentals, observability in cloud-native and lift-and-shift contexts).
- 4+ years of scripting with Python and Bash or PowerShell for automation.
- Experience troubleshooting complex distributed applications, leading/participating in war rooms, and performing code-level impact analysis (read logs/stack traces, correlate with deploys and infra changes).
- Solid understanding of observability best practices (metrics, logs, traces), ITSM processes, and alert hygiene.
- An "automate any task" mindset.
- Maintain associated documentation as it applies to our audit and certification requirements.
- Ensure platform stability, availability, and compliance through proactive vulnerability management and lifecycle maintenance.
- Drive process improvements for monitoring workflows and incident management.
- Participate in troubleshooting, capacity planning, and performance analysis activities.
- Research new monitoring requirements and, in many cases, write code to meet them.
- Solid expertise in setting up monitoring policies, rules, and templates, and in writing scripts to meet monitoring requirements.
- Excellent problem-solving, communication, and cross-team collaboration skills.