Site Reliability Engineer II Job Indiana Borough area, Pennsylvania USA, IT/Tech

Overview

Candescent is the leading cloud-based digital banking solutions provider for financial institutions. We are transforming digital banking with intelligent, cloud-powered solutions that connect account opening, digital banking, and branch experiences for financial institutions. Our advanced technology and developer tools enable seamless, differentiated customer journeys that elevate trust, service, and innovation. Success here requires flexibility in a fast-paced environment, a client-first mindset, and a commitment to delivering consistent, reliable results as part of a performance-driven, values-led team.

With team members around the world, Candescent is an equal opportunity employer.

Position

Site Reliability Engineer II

Experience: 4-6 Years

Location: Bangalore (Ecospace)

Role overview

Candescent's Site Reliability Engineering (SRE) mission is to proactively ensure the reliability, availability and performance of our Digital First banking applications. As a member of the SRE team, you will focus on building and operating highly reliable application platforms by applying SRE principles such as automation, observability, resilience and continuous improvement.

You will partner closely with application and platform teams to define reliability standards, implement monitoring, alerting and incident response practices and embed scalability and performance considerations into application design and delivery. Through tooling, automation, and best practices, you will help development teams build and operate services that meet agreed reliability objectives.

As a senior engineer in the organization, you will also provide mentorship within the SRE team and across peer engineering teams, helping elevate operational maturity, drive adoption of SRE practices, and strengthen reliability culture across our core initiatives.

Responsibilities

Support and operate production applications running on Kubernetes and AWS
Troubleshoot application-level issues using logs, metrics, traces, and runtime signals
Participate in incident response, root cause analysis, and post-incident reviews
Work closely with development teams to understand application architecture, dependencies, and data flows
Improve application observability by defining meaningful alerts, dashboards, and SLOs
Automate repetitive operational tasks to reduce toil
Support application deployments, rollbacks, and runtime configuration changes
Identify reliability, performance, and scalability gaps in application behavior
Drive continuous improvements in operational readiness, runbooks, and on-call practices
Influence application teams to adopt shift-left reliability practices

Must-Have Skills & Experience

Hands-on experience supporting Java applications in production
Strong understanding of JVM fundamentals (heap/memory management, garbage collection, OOM issues, thread analysis)
Proven experience with SRE practices, including incident response and on-call support; root cause analysis and postmortems; and SLIs, SLOs, and reliability-driven operations
Strong experience troubleshooting using application logs, metrics, and monitoring tools
Experience operating Java applications on Kubernetes (EKS) from an application/runtime perspective
Experience with deployment strategies (rolling, blue/green, canary)
Ability to write automation and scripts (in Python or any other language) to reduce operational toil
Solid understanding of application architecture and service dependencies (databases, messaging systems, external APIs)
Strong collaboration and communication skills; ability to work closely with development teams
Demonstrates accountability and sound judgment when responding to high-pressure incidents

Good-to-Have Skills & Experience

Exposure to platform or infrastructure concepts supporting application workloads
Experience with AWS services such as EKS, RDS/Aurora, S3, EFS, and CloudWatch
CI/CD pipeline experience (GitHub Actions, Jenkins)
Familiarity with GitOps practices
Experience with cloud migrations or modernization efforts
