Site Reliability Engineer
Listed on 2025-11-20
IT/Tech
Systems Engineer
Flo Sports is a world-class sports media company strategically positioned to be the essential destination for passionate sports fans, delighting them with live event coverage, breaking news, highlights, stats, rankings, and team and player profiles. We are growing Our Sports every day by continuing to invest in our ever-expanding ecosystem, which consists of over a dozen sport verticals and hundreds of streaming partners.
Flo Sports is creating the home for college conferences/leagues and sports like grappling, hockey, track & field, racing, cheer, wrestling, and more, and we are looking for innovative and passionate people like you to help us!
At Flo Sports, SRE is the team that acts as a force multiplier for our engineering organization. Our mission is to be the "wind in the sails" for our developers, enabling them to ship features faster, safer, and with more confidence. We are a "code-first" group that believes in automating away toil and solving problems with software. We don't click buttons in a console; we write code, build tools, and manage our infrastructure through GitOps.
As a Staff SRE, you will be a technical leader on a highly skilled and senior team. You will be a key driver of our architecture, reliability, and developer enablement strategy. This role requires a balance of high-impact individual contribution, technical leadership, and close collaboration with other Staff and Senior engineers to set the technical direction for the entire organization.
Our culture is built on two principles: shared responsibility for stability, and pragmatism. We are guided by a philosophy of simplicity (if you've read grugbrain.dev, you'll fit right in). We believe it's more fun to be competent, and we're looking for another expert to join our team.
RESPONSIBILITIES:
Lead the technical architecture and execution of our landmark migration from a legacy GCP environment to a modern, scalable infrastructure on AWS EKS.
Architect, design, and drive our core infrastructure, defining the patterns for Terraform and GitOps that the rest of the organization will follow.
Champion and drive our SLO-driven culture, setting the strategy for how we define, measure, and implement SLOs for critical user journeys, guided by the four Golden Signals (Latency, Traffic, Errors, and Saturation).
Lead the design and development of critical tooling and automation in Node.js and Go to solve entire classes of problems for our developers.
Lead the architectural evolution of our in-house, k6-based load testing platform, ensuring it can scale to meet future product demands.
Act as a primary subject matter expert for our Istio service mesh, driving its architecture, adoption, and optimization.
Spearhead and own high-priority initiatives, including the development of agentic workflows and intelligent automation for SRE domains like proactive scaling and automated remediation.
Act as a technical leader by participating in our blameless on-call rotation, mentoring other engineers through complex incidents and ensuring all post-mortems lead to systemic, long-term improvements.
KNOWLEDGE, SKILLS AND ABILITIES:
Extensive Experience: 8-10+ years in SRE, DevOps, or Software Engineering, with a proven track record of operating at a Staff level.
Proven Technical Leadership: You have a history of mentoring other senior engineers, influencing technical direction across multiple teams, and leading large-scale projects to completion.
Expert Coder: You are a polyglot with deep expertise in Node.js or Go and a history of building and maintaining critical automation and services.
Kubernetes Architect: You have an expert-level, architectural understanding of Kubernetes (EKS preferred), including networking, custom controllers, and control plane optimization.
Infrastructure as Code Expert: You are a Terraform expert who has designed and implemented large-scale, reusable, and secure IaC frameworks, not just consumed them.
Observability Architect: You have designed and implemented observability strategies from the ground up, leveraging platforms like Datadog to create actionable SLOs and provide deep system insight.
CI/CD Architect: You have…