Lead Reliability Engineer
Listed on 2025-12-30
IT/Tech
Cybersecurity, Systems Engineer, IT Support, Cloud Computing
Overview of the Company
Citi, the leading global bank, has approximately 200 million customer accounts and does business in more than 160 countries and jurisdictions. Citi provides consumers, corporations, governments, and institutions with a broad range of financial products and services, including consumer banking and credit, corporate and investment banking, securities brokerage, transaction services, and wealth management.
As a bank with a brain and a soul, Citi creates economic value that is systemically responsible and in our clients’ best interests. As a financial institution that touches every region of the world and every sector that shapes your daily life, our Enterprise Operations & Technology teams are charged with a mission that rivals that of any large tech company. Our technology solutions are the foundation of everything we do, from keeping the bank safe, managing global resources, and providing the technical tools our workers need to succeed, to designing our digital architecture and ensuring our platforms provide a first-class customer experience.
We reimagine client and partner experiences to deliver excellence through secure, reliable, and efficient services.
Our commitment to diversity includes a workforce that represents the clients we serve from all walks of life, backgrounds, and origins. We foster an environment where the best people want to work. We value and demand respect for others, promote individuals based on merit, and ensure opportunities for personal development are widely available to all. Ideal candidates are innovators with well-rounded backgrounds who bring their authentic selves to work and complement our culture of delivering results with pride.
If you are a problem solver who seeks passion in your work, come join us. We’ll enable growth and progress together.
The selected candidate will become the key engineer supporting and advancing the platform used for the threat-modeling process at Citi. Responsibilities include, among others, maintaining and supporting the threat-modeling application and developing the tools used throughout the threat-modeling process. The application consists of web servers and backend databases, and supporting it requires an understanding of middleware, databases, containers, and the AWS cloud environment, as well as change-control and compliance processes.
We are seeking a highly skilled and dedicated Lead Application Reliability Engineer to ensure the continuous availability, optimal performance, and security of a critical threat-modeling application. This role is central to our operational excellence and involves comprehensive support and maintenance of a robust technology stack, including middleware, databases, Linux, and AWS EKS, all within a strictly regulated, change-controlled financial environment. The ideal candidate will apply modern DevOps principles to drive stability and efficiency.
Responsibilities
- Ensure high availability and optimal performance of the threat-modeling application through proactive monitoring, incident management, and efficient troubleshooting.
- Perform routine and emergency application and infrastructure maintenance, including patching, upgrades, and configuration management, adhering strictly to change control procedures.
- Conduct root cause analysis (RCA) for production incidents and implement preventative measures to minimize future occurrences.
- Develop and maintain automation scripts and tools (in Python and Bash) to streamline operational tasks, improve monitoring, and facilitate efficient deployments.
- Proactively identify, recommend, and implement enhancements to existing application maintenance practices, operational workflows, and system reliability.
- Serve as a technology subject matter expert for internal and external stakeholders, contributing to technology domain roadmaps and firm-mandated controls and compliance initiatives.
- Appropriately assess and mitigate risk in all technical decisions, ensuring compliance with applicable laws, rules, regulations, and internal policies, while escalating and reporting control issues with transparency.
- Present…