Head of Delivery

Job in Town of Italy, Penn Yan, Yates County, New York, 14527, USA
Listing for: Albatross AI
Full Time position
Listed on 2026-02-07
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Job Description
Location: Town of Italy

Overview

Location: Remote, with the right to work and travel in Europe.

Albatross:
At Albatross, we're building the second pillar of AI: a perception layer that understands how users actually experience content, in real time. Trained on live user interactions, Albatross learns and reasons on the fly. Our technology powers real-time, in-session discovery by adapting to users' evolving interests. We have raised significant funding, and our platform already operates at scale, processing billions of events and serving hundreds of millions of predictions.

The Role

We're looking for a Site Reliability Engineer to own the reliability and observability of our platform. This is a hands-on leadership role in which you'll design, build, and maintain our observability stack, lead incident response, oversee releases, and establish the processes and standards that let the team ship quickly and confidently. More specifically, you will:

  • Observability & Monitoring:
    Own and evolve our observability stack (Prometheus, Grafana, Loki, Jaeger), including dashboards, alerts, and SLOs. Instrument services for meaningful metrics and tracing, reducing noise and improving signal
  • Reliability & Incident Response:
    Lead incident response and establish blameless postmortems, runbooks, and automated remediation. Define, track, and improve SLIs/SLOs to proactively reduce reliability risk
  • Release Management:
    Own the release process end-to-end, improving deployment speed, safety, and recovery. Implement progressive rollouts, feature flags, and rollback strategies
  • Platform & Tooling:
    Embed observability into the development lifecycle in close collaboration with engineering. Maintain and evolve our Kubernetes-based platform, adopting new tools when they add real value
Requirements
  • 5-7+ years in SRE, platform engineering, DevOps, or similar roles
  • Strong production experience with Kubernetes and modern observability stacks (Prometheus, Grafana, Loki, Jaeger/OpenTelemetry)
  • Proven track record leading incident response and building monitoring systems that teams actually use
  • Deep distributed systems knowledge and production debugging experience
  • Pragmatic approach to tooling and alerting that teams trust
  • Clear communicator across engineering, product, and leadership
  • STEM degree (Computer Science, Engineering, Mathematics, or similar)
  • Plus: contributions to open-source observability projects and background in high-scale or high-availability environments
Benefits
  • Remote-first, async-friendly culture
  • Ownership and autonomy: you'll shape how we do reliability
  • A team that cares about building things right