API Production Reliability Engineer Job, New York, New York, USA (IT/Tech)

Location: New York

API Production Support Engineer - Officer

Location(s):

Mississauga, Ontario, Canada

Job Type:

Hybrid

Posted:

May 11, 2026

Discover your future at Citi

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.

Job Overview

At Citi, we’re passionate about building and maintaining highly reliable APIs that solve critical customer problems. We support mission-critical systems that give our customers a rich feature set, high availability, and strong performance for their financial transactions. As we continue to expand our API scope and capabilities, we are seeking an experienced and dedicated API Production Support Engineer who will take full hands-on responsibility for the operational excellence and continuous improvement of our API ecosystems.

This role requires an individual who brings fresh ideas, demonstrates a unique and informed viewpoint on API reliability, and enjoys collaborating with cross-functional teams to develop real-world solutions and ensure positive user experiences at every interaction. Our ultimate goal is to build proactive and predictive operational strategies, including intelligent automation, that prevent customer impact.

Objectives of this Role

Champion stability initiatives to enable high availability and resilience for our API applications, including enhancing monitoring, failover mechanisms, and overall system health.
Remain calm and analytical during major incidents on critical API systems, ensuring effective incident, problem, and change management at a global enterprise level.
Perform proactive monitoring and management of production API environments, taking a holistic view of system health and performance.
Drive the definition, analysis, and reporting of SLIs and SLOs for all supported APIs and clients, ensuring clear performance benchmarks.
Contribute to the development and implementation of tools and systems designed to enhance API operational management and the client experience.
Measure and optimize API system performance, always pushing capabilities forward, anticipating customer needs, and innovating for continuous improvement.
Provide hands-on expert operational support for critical, large-scale distributed API ecosystems.

Daily and Monthly Responsibilities

Actively gather and analyze performance metrics from API platforms and underlying infrastructure to assist in performance tuning, fault finding, and capacity planning.
Partner closely with API development teams to improve services through rigorous operational feedback loops, testing, and release procedures.
Drive the creation of sustainable API operational systems and services through automation and continuous uplifts, including developing, testing, and debugging automated tasks.
Conduct thorough post-incident reviews for API-related issues, identifying opportunities for automation and proactive monitoring to prevent recurrence.
Actively participate in and take complete hands-on responsibility for high-priority API production support activities, ensuring swift resolution and clear communication.

Required Qualifications

Extensive experience supporting Java- and J2EE-based applications.
Deep technical knowledge and hands-on experience supporting and troubleshooting environments including AWS, ECS, Oracle DB, and MongoDB.
A strong understanding and practical application of SRE concepts, particularly in defining and measuring SLIs, SLOs, and error budgets.
Demonstrated experience in building and using comprehensive monitoring solutions such as AppDynamics, Splunk, and Kibana to proactively alert on production API-related issues and ensure system health.
Mandatory: In-depth knowledge and hands-on experience with API gateway technologies, specifically Apigee, and CDN solutions such as Akamai.
Proven ability to proactively identify and address problems, areas for improvement, and performance bottlenecks within complex API ecosystems using software-based…