Software Development Engineer
Job in
Redmond, King County, Washington, 98053, USA
Listed on 2026-06-02
Listing for:
Talent Software Services
Full Time position
Job specializations:
- IT/Tech
IT Support, Systems Engineer
Job Description & How to Apply Below
Typical Day in the Role
• Purpose of the Team:
The purpose of this team is to support a confidential company's device project by ensuring software quality and stability during the internal self-host program.
• Key projects:
This role will contribute to monitoring device health through telemetry dashboards, investigating issues, assigning bugs, and gathering logs (including hands-on device support) to ensure stability in production.
Candidate Requirements
• Best vs. Average:
The ideal resume shows experience with Android (mobile OS). A minimum of 5-7 years is required for the role, and more experience is a bonus. The candidate should be able to work independently.

# Senior Operations / Reliability Engineer

## Summary

We are seeking a **Senior Operations / Reliability Engineer** to support live operations, service reliability, release stability, and prototype device monitoring for a new hardware and software product. This role will focus on monitoring telemetry, diagnosing live issues, validating software releases, supporting incident response, and helping improve operational readiness across services, applications, and prototype device environments.

This is an engineering-oriented operations role. The ideal candidate will be comfortable working with logs, dashboards, alerts, deployment signals, and live system behavior, while partnering closely with software engineers, QA, infrastructure teams, PMs, and product leadership.
The role will be strongly supported by experienced engineers on the team, who will provide technical guidance on service architecture, prototype device workflows, telemetry interpretation, release processes, and complex debugging. The engineer will collaborate closely with these senior team members while taking ownership of day-to-day monitoring, release validation, live issue triage, documentation, and operational reporting.

# Scope of Work & Responsibilities

## Live Monitoring & Telemetry
* Monitor telemetry from services, applications, and prototype devices to assess operational health.
* Observe dashboards, alerts, logs, and metrics to identify anomalies, failures, performance degradation, or emerging reliability risks.
* Analyze real-time metrics and logs to support troubleshooting across cloud, on-premises, and prototype device environments.
* Triage operational issues and communicate findings clearly to engineering, QA, PM, and product teams.
* Provide actionable insights based on telemetry trends, system behavior, and recurring failure patterns.
* Help improve monitoring coverage, alert quality, dashboard usefulness, and operational visibility.

## Release & Service Operations
* Support software releases by validating deployments, monitoring live systems, and assessing post-deployment stability.
* Track service health during rollouts, ring deployments, updates, and release validation windows.
* Identify, debug, and help resolve live issues affecting services, devices, internal users, or product readiness.
* Partner with engineering teams to support mitigations, fixes, rollbacks, or follow-up validation.
* Assist with post-release verification and stabilization reporting.
* Document release observations, risks, incidents, and readiness concerns.

## Incident Response & Reliability Support
* Support incident response by gathering data, summarizing impact, identifying suspected causes, and tracking mitigation progress.
* Participate in post-incident reviews and help document lessons learned.
* Recommend improvements to monitoring, alerting, operational procedures, and service reliability practices.
* Maintain clear records of incidents, recurring issues, known risks, and follow-up actions.
* Help reduce operational toil by identifying repeatable troubleshooting steps, documentation gaps, and automation opportunities.

## On-Site Hardware & Environment Support
* Perform in-person troubleshooting for self-hosted systems, prototype devices, or test environments when telemetry or dashboards indicate issues.
* Assist with device configuration, deployment, validation, and live verification.
* Run smoke checks or readiness checks to confirm device, service, and environment health.
*…