Senior Site Reliability Engineer – Distributed Systems Job, Arizona City area, Arizona, USA, IT/Tech

About the role

As a Site Reliability Engineer, you will make an impact by designing and implementing observability solutions tailored for distributed edge computing environments. You will be a valued member of the Technology & Engineering team and work collaboratively with cross-functional teams to ensure system reliability, performance, and visibility across remote facilities.

In this role, you will:

Design and implement observability frameworks for edge computing environments, including monitoring, logging, tracing, and metrics collection.
Define and maintain SLIs, SLOs, and business KPIs to measure and enhance system reliability across edge and centralized infrastructure.
Build dashboards, visualizations, and alerting systems for real-time insights and incident response.
Implement distributed tracing and log aggregation systems to troubleshoot complex edge issues.
Collaborate with engineering teams to embed observability best practices into edge applications and infrastructure.
Proactively identify issues using advanced observability tools, reducing mean time to detect (MTTD) and mean time to resolve (MTTR).
Lead incident postmortems and implement observability-driven improvements.
Develop automation scripts and tools to optimize observability pipelines for bandwidth-constrained environments.
Optimize data storage and querying strategies for performance, cost, and scalability.
Stay current with emerging observability trends and advocate for adoption of edge-specific solutions.

Work model

At Cognizant, we strive to provide flexibility wherever possible, and we are here to support a healthy work-life balance through our various wellbeing programs. Based on this role’s business requirements, this is an onsite position requiring 5 days a week in a client or Cognizant office.

Please note:

This role will require an in-person meet and greet at our Cognizant office or client location.

The working arrangements for this role are accurate as of the date of posting. They may change based on the project you’re engaged in, as well as business and client requirements. Rest assured, we will always be clear about role expectations.

What you need to have to be considered

10+ years of IT experience
3–5 years of experience in service reliability/operations for large-scale hybrid environments.
3–5 years of experience writing automation scripts and building dashboards for application performance management.
2–4 years of experience with programming languages such as Go, Python, Java, or Rust.
Working knowledge of databases such as Oracle, SQL Server, Redis, ClickHouse, PostgreSQL, MongoDB, or time-series databases.
At least 2 years of experience with cloud platforms and containerization (GCP, AWS, Rancher, Azure, OpenShift).
Experience maintaining containerized apps in GKE/RKE/AKS environments.
Experience implementing cloud observability using OpenTelemetry (OTel).
Experience with GraphQL frameworks (Apollo, Prisma, Hasura).
Strong understanding of networking protocols (TCP/IP, HTTP, DNS, load balancing, service mesh).

These will help you stand out

Proven experience managing application availability and building automation for high-availability platforms.
Hands-on experience with monitoring tools like Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
Experience with CI/CD tools and collaboration tools such as Rally and Confluence.
Experience with in-memory caching solutions (Redis preferred).
Strong debugging skills across integrated technical platforms and API gateways.
Hands-on experience with GCS, Cloud SQL, Spanner, and Firestore.
Experience in enterprise-level infrastructure and operations.
Expertise in high-availability and distributed systems, Linux/Windows administration, and support.
Experience monitoring and troubleshooting HashiCorp Vault environments.
Working knowledge of Vertex AI, generative AI (GenAI), and BigQuery.

Bachelor’s degree in computer science, IT, or an equivalent field.

Salary and Other Compensation

The annual salary for this position depends on the experience and other qualifications of the successful candidate.

This position is also eligible for Cognizant’s discretionary annual incentive program, based on performance and subject to the terms of Cognizant’s applicable plans.

Benefits:
Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:

Medical/Dental/Vision/Life Insurance
Paid holidays plus Paid Time Off
401(k) plan and contributions
Long-term/Short-term Disability
Paid Parental Leave
Employee Stock Purchase Plan

Disclaimer:
The salary, other compensation, and benefits information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.
