API Production Reliability Engineer
Listed on 2026-05-31
IT/Tech
Cloud Computing, IT Support, Systems Engineer, SRE/Site Reliability
API Production Support Engineer - Officer
Job Req:
Location(s):
Mississauga, Ontario, Canada
Job Type:
Hybrid
Posted:
May 11, 2026
Discover your future at Citi
Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.
Job Overview
At Citi, we're passionate about building and maintaining highly reliable APIs that solve critical customer problems. We support mission-critical systems that give our customers a rich feature set, high availability, and strong performance for carrying out their financial transactions. As we continue to expand the scope and capabilities of our APIs, we are seeking an experienced, dedicated API Production Support Engineer who will take full hands-on responsibility for the operational excellence and continuous improvement of our API ecosystems.
This role requires someone who brings fresh ideas, offers an informed and distinctive perspective on API reliability, and enjoys collaborating with cross-functional teams to develop real-world solutions and ensure a positive user experience at every interaction. Our ultimate goal is to build proactive and predictive operational strategies, including intelligent automation, that prevent customer impact.
Objectives of this Role
Champion stability initiatives to enable high availability and resilience for our API applications, including enhancing monitoring, failover mechanisms, and overall system health.
Stay calm and analytical during major incidents on critical API systems, ensuring effective incident, problem, and change management at a global enterprise level.
Perform proactive monitoring and management of production API environments, taking a holistic view of system health and performance.
Drive the definition, analysis, and reporting of SLIs and SLOs for all supported APIs and clients, ensuring clear performance benchmarks.
Contribute to the development and implementation of tools and systems designed to enhance API operational management and the client experience.
Measure and optimize API system performance, continually advancing capabilities, anticipating customer needs, and innovating for continuous improvement.
Provide hands-on expert operational support for critical, large-scale distributed API ecosystems.
Daily and Monthly Responsibilities
Actively gather and analyze performance metrics from API platforms and underlying infrastructure to assist in performance tuning, fault finding, and capacity planning.
Partner closely with API development teams to improve services through rigorous operational feedback loops, testing, and release procedures.
Drive the creation of sustainable API operational systems and services through automation and continuous uplifts, including developing, testing, and debugging automated tasks.
Conduct thorough post-incident reviews for API-related issues, identifying opportunities for automation and proactive monitoring to prevent recurrence.
Actively participate in, and take full hands-on responsibility for, high-priority API production support activities, ensuring swift resolution and clear communication.
Required Qualifications:
Extensive experience supporting Java and J2EE-based applications.
Deep technical knowledge and hands-on experience supporting and troubleshooting environments including AWS, ECS, Oracle DB, and MongoDB.
A strong understanding and practical application of SRE concepts, particularly in defining and measuring SLIs, SLOs and Error Budgets.
Demonstrated experience building and using comprehensive monitoring solutions such as AppDynamics, Splunk, and Kibana to proactively alert on production API issues and ensure system health.
Mandatory: In-depth knowledge of and hands-on experience with API Gateway technologies, specifically Apigee, and CDN solutions such as Akamai.
Proven ability to proactively identify and address problems, areas for improvement, and performance bottlenecks within complex API ecosystems using software-based…