Cloud Specialist

Job in 500001, Hyderabad, Telangana, India
Listing for: ViaPlus
Full Time position
Listed on 2026-02-26
Job specializations:
  • IT/Tech
    Systems Engineer, IT Support, Cloud Computing
Job Description & How to Apply Below
Via Plus is seeking a Lead Cloud SRE to own the reliability, availability, and performance of large-scale, mission-critical platforms running on Microsoft Azure. This role is responsible for maintaining production stability across complex, distributed systems by leading incident response, observability, and reliability engineering initiatives.

The Lead Cloud SRE will work hands-on with Azure infrastructure, Kubernetes-based and VM-hosted microservices, networking, and data platforms to diagnose and resolve high-severity production issues. The role involves deep root-cause analysis using telemetry from Azure Monitor, Application Insights, and Log Analytics, as well as driving long-term remediation through automation, architectural improvements, and systemic fixes.

About ViaPlus:
Via Plus is a global mobility company in the Intelligent Transportation Systems (ITS) market, specializing in revenue and services management solutions for the transportation industry. Our customer operations, data analytics, and full-featured single-account back-office technology facilitate the high-volume transactions required for seamless multimodal mobility. As a VINCI Concessions subsidiary, we are committed to technical innovation and to promoting a positive mobility experience for all.

We are pioneers in the transportation transaction and mobility industry, with a decade of proven global experience in providing solutions focused on the tolling and transit industries.

Via Plus is headquartered near Dallas, Texas and maintains offices across the United States and in France, India, and Ireland. We are part of the global network of VINCI Concessions, an international player in transport infrastructure with projects in 23 countries. Our vision has evolved to provide a fully automated, end-to-end transportation solution that significantly improves revenue collection and efficiency while effectively lowering costs for our agency clients.

We serve enterprises that require high-volume, real-time transaction processing with the highest levels of accuracy, especially where revenue reconciliation and customer account management are key to the customer experience. Our flagship back-office system (BOS) enables Mobility-as-a-Service (MaaS) with a “one account” feature that supports multimodal transportation solutions. In a rapidly changing environment, Via Plus maintains a strong focus on technology and continuous R&D to improve agency efficiencies, reduce operating expenses, and maximize revenue – all while providing exceptional customer service.

About Indian Operations:
Plan, design, and develop new features for our products.
Customize our product on request from our premium clients.
Provide end-to-end IT infrastructure set-up and maintenance for global clients.
Provide 24/7 support and services to our ASP clients.

Job Profile:
Lead Cloud SRE

Experience:

12 - 18 yrs

Job Responsibilities:

1. Azure Infrastructure & Virtual Machine Reliability
Diagnose and resolve complex Azure VM issues including boot failures, performance degradation, disk I/O latency, and memory leaks.
Troubleshoot VM Scale Sets, OS-level issues across Linux and Windows, and patching or upgrade failures.
Analyze and remediate network connectivity issues involving NSGs, UDRs, DNS resolution, and routing configurations.

2. Application & Microservices Reliability
Support and troubleshoot microservices-based architectures hosted on AKS and virtual machines.
Identify and resolve inter-service latency, timeouts, retry storms, and cascading failure scenarios.
Diagnose application-level issues such as thread pool exhaustion, memory leaks, misconfigurations, and resource contention.
Eliminate certificate, authentication, and upstream/downstream dependency failures impacting service availability.

3. Azure Service Fabric Operations
Maintain and restore Service Fabric cluster health and stability.
Troubleshoot node failures, replica movement delays, quorum loss, and partition health issues.
Investigate upgrade and rollback failures, ensuring minimal service disruption.
Analyze and optimize both stateful and stateless service behaviours.

4. Traffic Management, Load Balancing & Edge Services
Azure Application Gateway
Troubleshoot HTTP 502/503/504 errors and backend pool health issues.
Debug probe failures, SSL/TLS termination, listener configurations, and routing rules.
Optimize WAF rules for security, performance, and reduced false positives.
Azure Front Door
Diagnose routing, caching, latency issues, and WAF-related traffic blocks.
Investigate backend connectivity, health probes, and geo-routing behaviour.
NGINX / Reverse Proxies
Debug connection resets, upstream timeouts, and worker exhaustion.
Tune timeouts, buffers, keep-alive settings, and load-balancing strategies for high availability.

5. Database & Data Layer Reliability
Troubleshoot Azure SQL, Managed Instances, PostgreSQL, MySQL, and Cosmos DB.
Analyze slow queries, deadlocks, connection pool exhaustion, and resource contention.
Manage…