Observability Engineer
Listed on 2025-12-20
IT/Tech
Systems Engineer, IT Support, Cloud Computing, SRE/Site Reliability
Location: Iselin, NJ (3 days onsite)
Employment: Long-term contract through March 2027
Looking for an experienced Observability Engineer to build, enhance, and support enterprise-wide observability across applications, services, and platforms. This role is responsible for establishing scalable, standardized monitoring, alerting, and telemetry frameworks on the TCOO OpenShift Kubernetes environment to ensure reliability, performance, and operational excellence.
Key Responsibilities
Platform Engineering
Design, deploy, and manage the enterprise observability platform on OpenShift/Kubernetes.
Ensure services and pipelines have proper monitoring, logging, tracing, and alerting.
Build and maintain centralized dashboards for metrics, alerts, and KPIs.
Improve platform scalability, reliability, availability, and overall performance.
Define and maintain standardized observability contracts for development and platform teams.
Integrate observability capabilities with enterprise tools and operational systems.
Support engineers in designing observability for their services and help product teams define meaningful KPIs.
Ensure observability processes comply with enterprise governance and controls.
Build IaC and deployment pipelines to automate observability platform operations.
Modernize processes and improve automation around observability workflows.
Contribute documentation, training, and operational runbooks.
Strong background in building and maintaining observability platforms.
Expertise in centralized dashboards, monitoring, heartbeat checks, infrastructure monitoring, APM, alerting, logging, and distributed tracing.
Ability to enforce observability standards across engineering teams.
Hands-on experience with platforms such as Dynatrace, New Relic, Datadog, ELK/Elastic Stack, and SigNoz.
Kubernetes / OpenShift
Argo CD
Infrastructure as Code
Git, Ansible, Bash, and general operational tooling
7+ years in engineering, DevOps, SRE, platform engineering, or observability-focused roles
Proven experience implementing and scaling observability systems in large enterprises
Strong understanding of distributed systems, cloud-native architectures, and CI/CD pipelines
Excellent communication and collaboration skills for working with cross-functional technical teams