×
Register Here to Apply for Jobs or Post Jobs. X

Site Reliability Engineer, Azure

Job in Irving, Dallas County, Texas, 75084, USA
Listing for: Wellfit Technologies
Full Time position
Listed on 2026-05-24
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing, IT Support
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly USD 100000.00 125000.00 YEAR
Job Description & How to Apply Below

Wellfit is the dental industry’s fintech solution
, breaking down financial barriers so patients, providers, employers, and payors can all access better care. As a healthcare fintech innovator, we’re transforming the patient journey and redefining what’s possible in dental care.

About Wellfit

Wellfit is the dental industry’s fintech solution, breaking down financial barriers so patients, providers, employers, and payors can access better care. As a healthcare fintech innovator, we are transforming the patient journey and redefining what is possible in dental care. Today, Wellfit supports a growing production platform serving 1,100+ offices and processing $1.5B+ in annual transactions
. As we continue to scale, reliability, observability, alerting, and production readiness are critical to how we support our customers and deliver with confidence.

About the Role:

We are seeking a hands‑on Platform Reliability Engineer, Azure to help strengthen the reliability, visibility, and operational maturity of our Azure‑based platforms.

This role is ideal for someone who enjoys working directly in Azure, improving production systems, troubleshooting issues across infrastructure and application layers, and building practical monitoring and alerting solutions that help teams respond faster and operate more confidently.

You do not need to be an expert in every part of the stack on day one. We are looking for someone with strong Azure experience, solid troubleshooting instincts, a Dev Ops/reliability mindset, and the ability to collaborate closely with engineering teams across systems, services, and applications.

What You’ll Do
  • Own and improve monitoring, alerting, and observability across Azure‑based production systems.
  • Work directly in Azure Monitor, Application Insights, App Services, logs, metrics, traces, and related Azure tooling to troubleshoot reliability and performance issues.
  • Build and refine practical alerting workflows, including Sev0/Sev1 alert routing, escalation paths, and run‑book integration.
  • Create and maintain clear, actionable runbooks that help on‑call engineers respond confidently to production incidents.
  • Partner with engineering teams to investigate issues across infrastructure, configuration, deployments, services, and application behavior.
  • Support release readiness by improving visibility into critical Azure resources before, during, and after production deployments.
  • Build and maintain dashboards in tools such as Grafana, Azure Monitor, Application Insights, or similar observability platforms.
  • Help configure incident routing integrations, including Slack/webhook‑based alert delivery to the appropriate team channels.
  • Automate repeatable operational tasks using Power Shell, Logic Apps, Azure tooling, or similar workflow automation methods.
  • Contribute to RCA documentation, incident follow‑up, reliability improvements, and operational playbook development.
What We’re Looking For
  • Hands‑on experience supporting production systems in Azure.
  • Strong working knowledge of Azure App Services, Azure Monitor, Application Insights, and Azure production troubleshooting.
  • Experience with Dev Ops, cloud operations, site reliability, platform engineering, or production support in a hands‑on environment.
  • Strong troubleshooting instincts and the ability to work through ambiguous production issues.
  • Comfort working across logs, metrics, traces, alerts, configurations, deployments, and service dependencies.
  • Ability to collaborate with software engineering teams across the stack, including .NET, Angular, SQL, APIs, and cloud services.
  • Experience building or improving dashboards, alerts, runbooks, incident workflows, or operational playbooks.
  • Working knowledge of scripting or automation, preferably with Power Shell, Logic Apps, CLI tooling, or similar technologies.
  • Clear communication skills with the ability to document findings, explain issues, and drive follow‑through after incidents.
  • A high‑ownership mindset with the ability to create structure, improve processes, and operate effectively in a fast‑moving environment.
Preferred Experience
  • Azure certifications.
  • Grafana, Prometheus, Data Dog, Dynatrace, or similar observability/APM tools.
  • S…
To View & Apply for jobs on this site that accept applications from your location or country, tap the button below to make a Search.
(If this job is in fact in your jurisdiction, then you may be using a Proxy or VPN to access this site, and to progress further, you should change your connectivity to another mobile device or PC).
 
 
 
Search for further Jobs Here:
(Try combinations for better Results! Or enter less keywords for broader Results)
Location
Increase/decrease your Search Radius (miles)
0
200
Filters
Education Level
Experience Level (years)
Posted in last:
Salary