Lead Observability Engineer

Job in St. Louis, Missouri, 63105, USA
Listing for: Shiftcode Analytics, Inc
Full Time position
Listed on 2026-03-03
Job specializations:
  • IT/Tech
    Systems Engineer
Job Description & How to Apply Below
Location: St. Louis

Local candidates only (Saint Louis, MO); proof of address is required.

Role: 1

Systems Performance Engineer Lead SE

Responsible for identifying and resolving end-to-end performance bottlenecks across distributed systems, Spring Boot services, middleware components, and hybrid cloud environments (private cloud + AWS). This role goes far beyond traditional testing by deeply analyzing container orchestration, networking paths, and system interactions under load. This position maps full system workflows, sets realistic latency budgets, and ensures each component meets its SLOs. Ideal candidates have extensive experience with high-scale, multi-region, and high-transaction platforms (e.g., financial systems, payment processing, or large enterprise SaaS) running in a cloud environment.

Key Responsibilities

  • Define service-level objectives (SLOs), performance budgets, and latency/throughput targets across services.
  • Architect and champion comprehensive distributed tracing strategies (Dynatrace, AWS X-Ray, etc.).
  • Analyze application, platform, and cloud behavior using deep-dive techniques such as heap dumps, thread dumps, flame graphs, GC logs, network traces, and storage I/O profiling.
  • Review service and system architectures for performance risks (e.g., synchronous hops, excessive dependencies, misconfigured connection pools, poor cache placement).
  • Conduct and lead root-cause analysis for performance incidents in production and pre-production environments.
  • Develop capacity models and performance baselines for services running across cloud environments.

Areas of Expertise

  • Application Layer: Spring Boot internals, JVM tuning, thread/heap management, concurrency debugging, GC optimization
  • Container Runtime: PCF, Docker, container resource limits, CPU throttling, memory pressure
  • Orchestrators: PCF, Kubernetes, ECS (autoscaling, pod health, scheduling issues)
  • Networking: Service-to-service hops, TLS overhead, DNS, routing, load balancer configs (F5, Nginx, ALB/NLB), service mesh performance
  • Storage: Latency, IOPS constraints, distributed file system behavior
  • Caching & Middleware: Redis, Hazelcast, NATS, Kafka, RabbitMQ configuration and throughput tuning
  • Databases: Connection pool tuning, slow queries, indexing, replication lag
  • Cloud Layer: AWS compute/storage/network performance, regional latency, cross-cloud traffic patterns

Role: 2

Observability Engineer Senior SE

Responsible for designing and operating end-to-end observability across hybrid private cloud and AWS environments. This engineer ensures full visibility into system performance, service interactions, and user experience across all services and supporting middleware. The role goes beyond simple dashboards and alerts: it focuses on deep instrumentation, distributed tracing, and architectural observability patterns that enable fast debugging and systemic performance improvement.

Ideal candidates have experience operating large-scale and multi-region distributed systems in cloud environments.

Key Responsibilities

  • Architect and implement a unified observability strategy using Dynatrace.
  • Design and deploy distributed tracing across all Spring Boot microservices, ensuring end-to-end transaction visibility.
  • Engineer golden signals dashboards and trace-driven diagnostics that support real-time incident response and long-term trend analysis.
  • Lead instrumentation deep dives: JVM metrics, custom Micrometer metrics, trace attributes, log correlation, and database timing.
  • Implement and tune anomaly detection, alerting strategies, and noise reduction techniques.
  • Develop reference architectures and best practices for observability in hybrid cloud environments.
  • Perform root cause analysis for latency issues, error spikes, and system degradation incidents.
  • Mentor teams on observability tooling and ensure developers adopt instrumentation patterns by default.

Areas of Expertise

  • Application Instrumentation: Spring Boot metrics/logging/tracing, Micrometer, custom instrumentation, trace context propagation.
  • Tracing & Telemetry: Dynatrace
  • Metrics Pipeline: Prometheus, Grafana, Dynatrace metrics, CloudWatch metrics, histogram management, RED/USE methodologies.
  • Logging & Correlation: Structured logging, log enrichment, log aggregation, trace-log correlation in Splunk.
  • Container & Orchestrator Observability: PCF, Kubernetes, ECS pod health, autoscaling, CPU throttling, memory pressure, node-level signals.
  • Cloud & Infrastructure Visibility: AWS compute/network/storage telemetry, VPC flow logs, ALB/NLB observability, network path tracing.
  • Database & Middleware Observability: Query latency, connection pool behavior, Redis/Kafka/Hazelcast metrics, MQ message flow visibility.