FinOps Private Cloud Infrastructure Architect
Listed on 2026-06-18
IT/Tech
IT Infrastructure, Cloud Computing: Infrastructure & Operations
Job Description
The FinOps Private Cloud Infrastructure Architect leads the end-to-end architecture, metering, and operational governance for private cloud infrastructure supporting LLM and agentic AI workloads, including GPU and accelerated compute platforms.
This role is accountable for ensuring accurate internal cloud usage metering, cost transparency, observability, and retention governance across a hybrid data center environment. The architect owns the Platform API Inventory and Collection Interval Validation Matrix across the AI ecosystem, ensuring all required telemetry is inventoried, validated, and collected at correct intervals to meet FinOps, security, reliability, auditability, and regulatory requirements.
The role also calls for hands‑on FinOps experience within a large financial organization and owns the per‑platform telemetry retention audit, a critical enabler for resilience, recovery, and warm‑up operational readiness following incidents, maintenance, patching, or disaster recovery events.
Key Responsibilities
Private Cloud & AI Infrastructure Architecture
- Lead the architecture and governance of private cloud infrastructure supporting LLM and agentic AI platforms
- Architect and govern GPU and accelerated compute platforms, including cluster design, scheduling, capacity planning, and lifecycle management
- Design and operate infrastructure within a hybrid data center model, spanning private cloud, on‑prem virtualization, container platforms, storage, and network
- Lead the implementation of internal cloud usage metering for private cloud platforms
- Own FinOps governance for infrastructure platforms, including showback/chargeback models, cost allocation and unit economics, and capacity and usage transparency
- Partner with Finance and Engineering to align infrastructure cost models with business consumption
- Own the Platform API Inventory and Collection Interval Validation Matrix
- Ensure all platform, infrastructure, observability, and cost telemetry APIs are properly inventoried, actively validated, and collected at correct intervals
- Govern telemetry coverage across metrics, logs, traces, billing and cost data, capacity signals, and model‑serving and AI platform telemetry
- Ensure telemetry programs meet security, audit, risk, and reliability standards
- Own per‑platform telemetry retention audits, including data availability and completeness
- Ensure retention policies support incident investigation, compliance and audit requirements, capacity and cost analysis, and warm‑up recovery design, enabling rapid restoration of operational readiness after outages, upgrades, or DR events
- Partner with resilience and recovery teams to validate operational dependencies and recovery paths
- Partner with Engineering, Platform, Finance, Risk, Security, and Operations teams
- Serve as the authoritative architectural voice for private cloud FinOps and AI infrastructure telemetry
- Communicate architectural decisions, risks, and trade‑offs clearly to senior stakeholders
- 10+ years of experience in infrastructure architecture, platform engineering, or private cloud engineering within large‑scale enterprise environments
- Demonstrated experience designing and operating hybrid data center infrastructure
- Hands‑on experience with GPU platforms and accelerated compute operations
- Proven ownership of observability and telemetry programs, including API inventory and validation; metrics, logs, and traces strategy; collection interval tuning; and data quality and reliability controls
- Direct FinOps experience in a large organization, including infrastructure cost governance
- Strong understanding of resilience and recovery engineering, data retention strategies, and operational readiness and warm‑up dependencies
- Excellent stakeholder management and ability to influence across engineering, finance, and risk organizations
- FinOps Certified Practitioner or FinOps Certified Professional
We are a company committed to creating diverse and inclusive environments where people can bring their…