
Senior Manager, Core Infrastructure Engineering

Job in Nashville, Davidson County, Tennessee, 37230, USA
Listing for: Oracle
Full Time position
Listed on 2026-06-22
Job specializations:
  • IT/Tech
    Systems Engineer
**Job Description**
As a Senior Manager, you will lead a team responsible for the development, operation, and improvement of large-scale OCI network fabrics and supporting systems. This role requires deep networking expertise, especially in automation of Network Clos fabrics, telemetry, and performance troubleshooting, combined with software engineering experience. You will build and improve tools, automation, monitoring, and operational systems that make these fabrics more reliable, observable, and efficient at global cloud scale.

You will work closely with Network Availability, Network Monitoring, GNOC, hardware engineering, and service teams to resolve complex customer escalations, improve operational readiness, and drive engineering programs that increase performance and availability. The ideal candidate brings both hands-on technical depth and strong people leadership, with experience managing engineers who operate and build software for large-scale distributed infrastructure.

**Responsibilities**

**System Design & Architecture - System Scalability:**
+ Manages the development and implementation of scalable distributed systems and components across multiple teams, including the effective use of distributed state management tools.

+ Oversees code and/or system optimization efforts for large-scale data processing and high-throughput requirements within and across teams to support hyper-scale systems.

+ Guides teams to define scalability requirements for owned components and ensures design and implementation requirements are met.

+ Manages the use of data plane platforms to effectively handle large-scale data retrieval, storage, and processing.

+ Ensures the team designs accurate performance and load tests.

**System Design & Architecture - System Reliability Design:**
+ Manages the strategy for building fault-tolerant components and systems capable of withstanding in-service updates by guiding the implementation of redundancy, replication, and automatic failover mechanisms.

+ Develops design strategies for systems to effectively handle service disruptions (e.g., network partitions) by prioritizing consistency, availability, or partition tolerance.

+ Leads implementation and optimization initiatives across teams for approaches to handle network unreliability, including load-shedding, throttling, and rate-limiting.

+ Guides teams to design components and systems that are durable and adhere to service level objectives (SLOs), setting expectations for availability and durability of other computing services within the department.

**System Design & Architecture - System Reliability Performance:**
+ Provides oversight in defining key performance indicators (KPIs) and telemetry to identify gaps or issues in running systems.

+ Oversees the building and customization of moderately complex dashboards, telemetry systems, and alerting mechanisms to proactively monitor components and system health.

**System Design & Architecture - Correctness / Availability:**
+ Oversees the design and implementation of functional and correctness requirements for feature sets and/or systems in new or existing systems.

+ Guides teams to design complex test scenarios (e.g., fault-injection, brown-out) to evaluate system correctness.

+ Directs implementation strategies for data replication and synchronization techniques to maintain data integrity and availability.

**Operational Troubleshooting & Incident Management:**
+ Guides teams to be proactive when diagnosing, debugging, and resolving issues in active components and systems to support ongoing operation.

+ Ensures teams apply their expertise to prevent interruptions, so that resolving issues requires no maintenance windows for customers and users.

+ Oversees operational readiness protocols and ensures teams remain knowledgeable about owned components and systems to support effective troubleshooting and performance.

+ Oversees and approves schedules for operational support rotations.

**Compliance & Security:**
+ Oversees implementation of robust security measures to protect data and applications in multi-tenant environments, ensuring team strategies incorporate encryption techniques and access controls.

+ Directs execution of remediation plans to address identified security gaps, promoting continuous improvement of security measures.

+ Ensures comprehensive documentation and cloud infrastructure compliance with industry standards and regulations.

**Automation & Change Management:**
+ Oversees the development and maintenance of automation scripts and tools (e.g., Infrastructure as Code (IaC)) to manage cloud infrastructure.

+ Works with teams to create and adhere to change management plans for patching, updating, and rolling back applications, and guides development of components to allow for automation of these processes.

**Core Responsibilities**

**Planning & Execution:**
+ Manages multiple medium- to large-scale projects or initiatives across teams, ensuring timelines, deliverables, and…
Position Requirements
10+ years of work experience