Senior DevOps Engineer Job, New York, New York, USA (IT/Tech)

Location: New York

About Us
Founded 20 years ago and headquartered in Chicago, the DV Group of financial services firms has grown to more than 600 people operating throughout North America, Europe and Asia. Since spinning out of a large brokerage firm in 2016, DV Trading has rapidly scaled as an independent proprietary trading firm utilizing its own capital, trading strategies, and risk management methodologies to provide liquidity to worldwide financial markets and hedging opportunities to commodity producers and users.

Now, DV Group affiliates include two broker dealers, a cryptocurrency market making firm, and a burgeoning investment adviser.

Overview

We're looking for a senior-level DevOps Engineer to join a small, high-impact team that builds and operates the infrastructure powering DV's trading systems. You'll work on Kubernetes at scale, a firm-wide observability platform, CI/CD infrastructure, and workflow orchestration. This team built the entire platform from scratch in 2025 and now owns it end-to-end. You'll be joining at a pivotal moment: the team is growing, the platform is scaling to meet increasing demand across the firm, and there's no shortage of hard problems to solve.

You'll have real ownership from day one — not tickets in a queue.

Job Responsibilities

Kubernetes platform operations and evolution:
Cluster lifecycle management via Cluster API, fleet-wide upgrades, bare metal provisioning, CNI networking, storage, autoscaling, and disaster recovery planning.
Observability:
Operate and scale our on-prem observability stack: Mimir (metrics), Loki (logs), Tempo (traces), Grafana (dashboards), the OpenTelemetry Collector fleet, and Alertmanager. Drive adoption by onboarding teams, building dashboards, tuning alerts, and scaling ingestion for growing workloads.
CI/CD and GitOps:
GitLab pipelines, ArgoCD for Kubernetes deployments, Artifactory for artifact management. Build reusable CI components, improve secrets integration, and support trading desks onboarding to the platform.
Infrastructure automation and tooling:
Build and maintain automation for provisioning, configuration management, and self-service tooling that enables teams across the firm to move faster without relying on DevOps for every change.
Platform adoption and support:
Work directly with trading desks and development teams to onboard them to the platforms we build. Write runbooks, contribute to documentation, and help build a sustainable support model as the platform scales.

Requirements

5–8 years of experience in DevOps, SRE, or Platform Engineering roles.
Kubernetes:
Deep hands‑on experience operating K8s in production. Cluster lifecycle, troubleshooting, networking, storage, RBAC. Experience with Cluster API, bare metal provisioning, or multi‑cluster management is a strong plus.
Observability:
Production experience with Prometheus, Grafana, and alerting. Familiarity with Mimir, Loki, Tempo, or Thanos for scaled metrics, logs, and traces. Understanding of OpenTelemetry (collectors, exporters, instrumentation).
GitOps and CI/CD:
Experience with ArgoCD, Flux, or similar GitOps tools. Building and maintaining CI/CD pipelines (GitLab CI, GitHub Actions, or equivalent). Artifact management (Artifactory, Nexus, or similar).
Infrastructure as Code:
Terraform and/or Ansible for provisioning and configuration management.
Linux systems:
Strong fundamentals — systemd, networking, storage, performance troubleshooting.
Programming:
Go or Python for automation, tooling, and scripting. Comfortable reading and writing YAML and Helm charts.
Communication:
Ability to work directly with trading desks and development teams who depend on the platforms you build. You'll be explaining K8s concepts to people who aren't K8s experts.

Preferred Skills

Experience working at a trading firm — understanding the urgency and reliability requirements of systems that support P&L‑impacting workloads.
Experience with Kubeflow, Airflow, or ML pipeline orchestration.
Experience with advanced Kubernetes networking (CNI plugins, network policies, service mesh).
Experience with Kafka or event streaming platforms.
Experience operating on‑prem infrastructure (not just cloud) —…