Production Operations/Reliability Engineer
Listed on 2026-05-19
IT/Tech
IT Support, Systems Engineer
Redmond, WA
Who is Blueprint?
We are a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States. Unified by a shared passion for solving complicated problems, our people are our greatest asset. We use technology as a tool to bridge the gap between strategy and execution, powered by the knowledge, skills, and expertise of our teams, who bring unique perspectives and years of experience across multiple industries.
We’re bold, smart, agile, and fun.
Blueprint helps organizations unlock value from existing assets by leveraging cutting‑edge technology to create additional revenue streams and new lines of business. We connect strategy, business solutions, products, and services to transform and grow companies.
Why Blueprint?
At Blueprint, we believe in the power of possibility and are passionate about bringing it to life. Whether you join our bustling product division or our multifaceted services team, or want to grow your career in human resources, your ability to make an impact is amplified when you join one of our teams. You’ll focus on solving unique business problems while gaining hands‑on experience with the world’s best technology.
We believe in unique perspectives and build teams of people with diverse skillsets and backgrounds. At Blueprint, you’ll have the opportunity to work with multiple clients and teams, such as data science and product development, all while learning, growing, and developing new solutions. We guarantee you won’t find a better place to work and thrive than at Blueprint.
In this role, you will support the reliability, stability, and live operations of a new device and software platform during internal testing and self‑host programs. You will focus on monitoring system health through telemetry, investigating live issues, supporting software releases, and validating prototype devices in production‑like environments.
This is a hands‑on, engineering‑oriented operations role where you will work closely with software engineers, QA, infrastructure, and product partners to ensure operational readiness and service stability. You will independently manage day‑to‑day monitoring, triage incidents, support release validation, and provide clear, actionable insights to improve system reliability and product readiness.
Responsibilities
Live Monitoring & Telemetry
- Monitor telemetry, dashboards, logs, alerts, and metrics to assess the health of services, applications, and prototype devices.
- Identify anomalies, failures, and performance degradation across software and device environments.
- Analyze real‑time and historical data to diagnose issues and surface reliability risks.
- Triage operational issues and communicate findings clearly to engineering and product teams.
- Recommend improvements to monitoring coverage, alert quality, and operational visibility.
- Support software releases by validating deployments and monitoring post‑release system stability.
- Track service and device health during rollouts, updates, and release validation periods.
- Investigate and assist in resolving live issues impacting internal users or device readiness.
- Partner with engineering teams on mitigations, fixes, rollbacks, and follow‑up validation.
- Document release observations, risks, and stability assessments.
- Support incident response by gathering logs, diagnostics, and impact data.
- Summarize incidents, suspected root causes, and mitigation progress.
- Participate in post‑incident reviews and document lessons learned.
- Maintain records of incidents, recurring issues, and known reliability risks.
- Identify opportunities to reduce operational toil through documentation or process improvements.
- Perform in‑person troubleshooting for prototype devices and self‑hosted systems when needed.
- Assist with device configuration, deployment, validation, and health checks.
- Run smoke tests and readiness checks to confirm system and device stability.
- Document hardware configurations, operational procedures, and…