
SRE Support Engineer - Observability

Job in Austin, Travis County, Texas, 78716, USA
Listing for: Gigster
Full Time position
Listed on 2025-12-27
Job specializations:
  • IT/Tech
    IT Support, Technical Support, Cybersecurity, Network Security

Role Overview

The Observability & Tools Support Engineer provides high-impact technical support for customers of a large technology company’s internal IaaS platform, with a focus on monitoring, alerting, telemetry, and operational tooling.

This role spans a wide range of support, from white-glove onboarding and end-to-end customer enablement to deep technical troubleshooting across Linux, networking, and observability systems (especially Prometheus and Alertmanager). You will also help improve the support function itself by strengthening tooling, documentation, workflows, and feedback loops so the service scales.

Success depends on excellent troubleshooting, strong written communication, comfort working with highly technical customers, and the maturity to identify patterns and drive operational improvements beyond individual ticket resolution.

Business Outcome

Become a trusted frontline expert for the customer’s observability ecosystem and operational tooling: deliver fast, accurate support across Slack and tickets, improve monitoring reliability, and reduce incident impact through better triage, troubleshooting, onboarding, and knowledge capture.

Success Measures
  • Healthy volume of threads and tickets handled with high-quality outcomes
  • Consistent achievement of time-based SLAs
  • High customer satisfaction, as measured by surveys
  • Accurate classification of issue type, severity, and recurring patterns
  • Reduced repeat issues through better docs, tooling, and scalable onboarding
What Will Be True When You Succeed
  • Customers can onboard smoothly to monitoring/alerting with minimal friction
  • Monitoring and alerting issues are resolved quickly, with fewer escalations
  • Linux and networking-related incidents reach resolution faster due to strong troubleshooting and clean handoffs
  • Engineering and SRE teams receive clear, actionable feedback based on real customer trends
  • Knowledge base content prevents tickets and accelerates self-service
Core Work Units
  • Frontline Support for Observability & Tooling
    • Manage Slack threads and tickets (roughly a 50/50 split)
    • Handle a broad range of support requests, from simple issue resolution to end-to-end onboarding
    • Provide clear, structured guidance to highly technical customers
    • Maintain strong attention to detail while managing multiple interactions in parallel
  • Deep-Dive Troubleshooting & Incident Support
    • Troubleshoot, isolate, and resolve monitoring and alerting issues (especially Prometheus and Alertmanager)
    • Troubleshoot complex Linux and networking issues (TCP/IP fundamentals required)
    • Support OpenTelemetry, tracing, and telemetry pipelines, including investigating gaps in signals and instrumentation
    • Drive incidents to resolution in partnership with Engineering/SRE teams
  • Documentation & Knowledge Development
    • Build and maintain customer-facing and internal knowledge base articles
    • Create informational posts for the community support platform
    • Turn repeated issues into reusable guides, checklists, and onboarding playbooks
  • Trend Analysis & Feedback to Engineering
    • Analyze and categorize customer interaction trends
    • Provide accurate, meaningful feedback to Engineering and SRE orgs to improve product/tooling
    • Identify “top offenders” and propose practical fixes (tooling, docs, process, product)
  • Operational Excellence & Continuous Improvement
    • Participate in post-mortem reviews and drive follow-through on improvements
    • Contribute meaningfully to team objectives and goals (process, tooling, and service scaling)
    • Apply creativity and discretion to resolve highly complex issues that fall outside standard procedures
  • High-Quality Work: What Top Performance Looks Like

    Frontline Support

    • Moves smoothly from triage to deeper analysis without losing the customer
    • Communicates clearly and confidently with technical users
    • Maintains clean follow-ups and thread hygiene even with high context switching

    Troubleshooting

    • Rapidly isolates issues across monitoring/alerting configs, Linux runtime behavior, and network connectivity
    • Uses structured approaches to incident handling: hypothesis → test → evidence → resolution
    • Produces high-signal writeups that accelerate downstream resolution

    Documentation & Enablement

    • Documentation is clear enough that customers avoid…