Staff Engineer, Network Observability | New York City area, New York, USA | Software Development

Staff Engineer, Network Observability

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025.

What You'll Do:

The Network Observability team is responsible for how CoreWeave observes, understands, and operates its network. As a Staff Engineer for Network Observability, you will define and evolve the technical direction for network observability. You will partner across Network Engineering, SRE, Platform, and adjacent infrastructure teams to build resilient telemetry systems, raise engineering standards, and turn observability into a strategic advantage for the business.

Your mission: build and scale a network observability platform that gives CoreWeave fast, trustworthy insight into network behavior, enables proactive risk detection, improves how engineering teams make decisions during both normal operations and incidents, and supports closed-loop automation workflows.

In This Role, You Will:

Set technical direction for network observability across multiple teams, ensuring the platform, data models, and telemetry strategy align with long-term engineering and business goals
Lead the design and evolution of scalable observability solutions using diverse collection technologies (e.g., gNMI, SNMP, Prometheus scraping, OpenTelemetry, logs, and flows), persistence layers (e.g., Prometheus-compatible time-series databases, Loki, ClickHouse), and visualization and alerting tools (e.g., Grafana, Alertmanager), with a strong focus on reliability, usability, and future scale
Drive cross-team initiatives to standardize observability patterns, improve signal quality, and create a consistent approach to logs, metrics, events, flows, and related diagnostics across the network stack
Partner closely with engineering leadership and technical stakeholders to prioritize investments, navigate ambiguity, and make high-leverage technical tradeoffs that improve resilience, scalability, and operator efficiency
Act as a go-to technical expert for critical observability challenges, especially when incidents, architectural complexity, or unclear ownership require strong judgment and coordination
Mentor junior and senior engineers through technical reviews, design guidance, and hands-on problem solving, raising the bar for engineering quality and multiplying the impact of the broader team
Participate in design discussions, RFCs, and architectural decisions across the broader infrastructure organization, helping teams converge on scalable, maintainable solutions
Join a rotating on-call schedule as a senior escalation point for observability-related issues, helping teams quickly isolate failures, improve incident response, and drive durable follow-through after outages

Who You Are:

Deep expertise in building flexible network observability solutions, with hands-on experience across diverse implementation options for collection, distribution, processing, persistence, alerting, analytics, and visualization
Experience as a Network Engineer, SRE, Software Engineer, or Systems Engineer in large-scale environments, with a strong track record of building and operating observability or infrastructure platforms that support multiple teams
Demonstrated ability to lead through ambiguity, shape technical direction, and make sound architectural and operational tradeoffs that balance immediate needs with long-term maintainability
Strong systems thinking and practical experience designing resilient, scalable solutions that improve visibility, incident response, and engineering efficiency
Proven ability to work across teams and functions, influence without formal authority, and build trust with both technical and non-technical stakeholders
Proficiency in Python, Go, and Bash, plus familiarity with configuration management and templating tools such as Ansible and…