Operations Analyst
Listed on 2026-06-27
IT/Tech
IT Support, SRE/Site Reliability, Cybersecurity, Cloud Computing: Infrastructure & Operations
We are seeking a highly reliable and detail-oriented Operations Analyst to ensure the continuous, 24×7 operation of Hippocratic AI's production systems, integrations, and customer/partner environments. This role is critical to minimizing customer and partner downtime, maintaining trust, and ensuring our AI agents and supporting systems operate smoothly at all times.
As an Operations Analyst, you will be responsible for monitoring system alerts, integrations, and operational reports; performing proactive maintenance; resolving common operational issues; and triaging advanced issues to the appropriate engineering, platform, or partner teams. You will play a central role in detecting issues early, coordinating incident response, and maintaining operational excellence across all customer and partner deployments.
You will work closely with engineering, infrastructure, security, customer support, and partner teams, and will help build the operational tooling, reporting, and automation needed to scale Hippocratic AI safely and reliably.
This role is based in our Palo Alto office, with attendance expected five days a week unless otherwise specified.
What You'll Do
Integration Management & Development
Monitor all production systems, integrations, and automated alerts to ensure 24×7 continuous operations across customers and partners.
Serve as a first-line responder for operational alerts, diagnosing and resolving standard issues within defined SLAs.
Triage complex or advanced issues and page/engage the appropriate on-call engineers, platform teams, or partner contacts.
Coordinate incident response activities, track progress to resolution, and ensure clear internal handoffs during escalations.
Validate system recovery and perform post-incident checks to ensure full service restoration.
Proactive Maintenance & Reliability
Perform proactive system health checks, integration validations, and routine maintenance to prevent outages and degradation.
Identify trends in alerts, incidents, and performance metrics to recommend preventative actions and long-term fixes.
Help define and refine operational runbooks, escalation paths, and standard operating procedures (SOPs).
Participate in on-call rotations and support after-hours and weekend coverage as needed to maintain 24×7 availability.
Reporting, Automation & Tooling
Create and maintain operational reports and dashboards for internal teams, customers, and partners.
Build and maintain scripts and automation to monitor system health, validate integrations, and generate customer- or partner-specific reports.
Customize operational reporting for each customer/partner to meet contractual, SLA, and compliance requirements.
Continuously improve monitoring, alerting, and observability tooling to reduce noise and increase signal quality.
Cross-Functional Collaboration
Work closely with engineering, infrastructure, security, and customer support teams to resolve incidents and improve system resilience.
Support customer-facing teams by providing operational insights, incident summaries, and root-cause analysis.
Assist with onboarding new customers and partners by validating integrations, monitoring readiness, and ensuring operational coverage.
Contribute to post-incident reviews and continuous improvement initiatives to strengthen overall platform reliability.
Bachelor's degree in Computer Science, Health Informatics, Information Systems, or a related field.
Bachelor's degree in Information Systems, Computer Science, Operations, Engineering, or a related field (or equivalent practical experience).
3+ years of experience in operations, site reliability, NOC, technical support, or production monitoring roles.
Hands-on experience monitoring production systems, integrations, APIs, or data pipelines in a 24×7 environment.
Familiarity with alerting and monitoring tools (e.g., Datadog, New Relic, CloudWatch, Prometheus, Grafana, PagerDuty, Opsgenie, or similar).
Ability to troubleshoot common system, integration, and data-flow issues using logs, metrics, and dashboards.
Experience writing scripts or automation using tools/languages such as Python, Bash, SQL, or similar.
Strong…