Job Summary
• Responsible for the continuous improvement of the service level through timely analysis of data and the corrective actions derived from it
• Provide performance reports, trends and proactive recommendations based on in-depth trend analysis
• Drive and report achievements through key performance indicators and success stories
• Responsible for creating a framework for periodic reporting and for identifying areas for improvement in operational performance parameters
• Define and help to develop interactive dashboards / analytical reports
• Lead the department's branding and marketing strategy; manage and standardize communication channels and publications
• Support the department head with general and team administration, strategy execution, transformation, and stakeholder and vendor management
• Streamline and simplify processes to support agility and speed to market while maintaining a high level of control
Key Responsibilities
Strategy
Resiliency
• Lead or be part of the SRE team to enhance the application and infrastructure resiliency of services through self-healing and automated failovers, targeting 99.9% uptime for customers.
• Oversee the planned/unplanned disruption of production infrastructure to ensure accountability for building resilient, always-on systems.
• Build resilience into the application so underlying system failures are handled gracefully and do not impact end users. Influence design/development teams to always be thinking of the rainy-day scenarios.
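As a rough illustration of what the 99.9% uptime target above implies in practice, the sketch below converts an availability target into an allowed-downtime budget. The function name and the 30-day and 365-day windows are illustrative assumptions, not part of the role definition.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, window: timedelta) -> timedelta:
    """Return the maximum downtime permitted in `window` for an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # Fraction of the window during which the service may be unavailable.
    unavailable_fraction = 1 - availability_pct / 100
    return timedelta(seconds=window.total_seconds() * unavailable_fraction)


if __name__ == "__main__":
    # Hypothetical reporting windows: one 30-day month and one 365-day year.
    print(allowed_downtime(99.9, timedelta(days=30)))
    print(allowed_downtime(99.9, timedelta(days=365)))
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, or about 8.76 hours per year.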
Efficiency
• Identify opportunities to eliminate all manual and repeatable activities (toil) via tooling and automation
• Reduce the number of repeat incidents by permanently fixing the underlying root cause of issues
Capacity Planning
• Develop automated predictive analysis of future capacity needs and drive the proactive upgrade of service capacity well in advance
• Using Standard Chartered's SDI (Software Defined Infrastructure), develop auto-scaling to deliver robust resilience to fluctuations in critical service demand
• Continuously monitor service demand / capacity for any discrepancies or spikes
Business
Availability/Reliability
• Take responsibility for meeting SLA/XLA expectations around the operability and reliability of our critical user service journeys, where our customers expect a 24x7 digital service offering. Examples of "always on" techniques to be used include caching, circuit breakers, dark and canary releases, store and service patterns and alternate user experience flows.
• Lead, own, manage, monitor and optimize the reliability and health of all environments
• Design, code and implement break fixes to improve service availability based on outcomes from thematic reviews
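One of the "always on" techniques named above, the circuit breaker, can be sketched as follows. This is a minimal, single-threaded illustration; the class name, thresholds and injectable clock are assumptions made for the example and do not describe any specific tooling used by the team.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of calling the unhealthy dependency.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open state, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()     # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would typically catch `CircuitOpenError` and fall back to a cached response or an alternate user experience flow rather than surfacing the failure to the end user.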
Latency & Performance
• Drive the conversation around development velocity using SLI/SLO data, in partnership with Product Teams, to ensure the balance between development velocity and service reliability is optimized
• Iteratively review SLI/SLO/Error Budget policy to ensure the quantitative indicators of customer experience are accurate
• Where an increased focus on reliability is required, influence senior stakeholders to ensure resourcing / effort is made available
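To make the SLO and error budget idea above concrete, the sketch below computes how much of a request-based error budget remains. The function name, the 99.9% target and the request counts are hypothetical values chosen for illustration only.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative when overspent)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests against a 99.9% SLO with 250 failures.
    print(f"{error_budget_remaining(0.999, 1_000_000, 250):.0%} of the budget remains")
```

A result near or below zero is the usual signal that effort should shift from feature delivery toward reliability work.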
Processes
Transition to Production
• Champion and evolve continuous delivery best-practice standards to reduce release-related incidents and manual hand-offs, and to achieve our aspiration of "zero ops"
• Partner with development teams to ensure applications are designed with scale, resilience, and performance in mind.
Monitoring
• Optimize monitoring to reduce false positive alerts
• Creatively deepen monitoring capabilities leveraging the 3 tenets of observability - logs, metrics and traces
• Ensure all critical user service journeys are traceable end to end
• Ensure Production Solutions are fit for purpose. Where gaps are identified, put a plan in place to uplift the toolset
People & Talent
• Establish and manage an SRE team when applicable
• Drive an efficient target operating model and enhance the existing capabilities of the team.
• Lead through example…