AI Platform Engineer; Hybrid in NYC or CT
Listed on 2026-02-15
IT/Tech
Cloud Computing, Systems Engineer
Overview
We are seeking a Platform Engineer to help design, build, and operate the foundational cloud and application platforms that power our AI digital products and services. In this role, you will focus on creating reliable, secure, scalable, and quality-assured platforms that enable application teams to deliver software quickly and safely. You will work closely with infrastructure, security, and application teams to provide self-service capabilities and standardized tooling, and to ensure quality and strong operational practices across environments.
Responsibilities
- Platform & Cloud Infrastructure – Build and operate cloud-based platform services that support application development and runtime workloads.
- Platform Tooling – Design and maintain infrastructure using AWS services such as EC2, EKS, ECS, S3, RDS, IAM, VPC, Lambda, Bedrock, and CloudWatch.
- Infrastructure as Code – Implement and manage Infrastructure as Code (IaC) using Terraform, CDK, CloudFormation, or similar tools.
- Workloads – Support containerized and non-containerized workloads across development, staging, and production environments.
- Reliability & Observability – Ensure platform reliability, availability, and performance using DevOps and SRE best practices.
- Monitoring – Implement and maintain monitoring, logging, and alerting for platform services.
- Incident Response – Participate in on-call rotations and incident response, contributing to root cause analysis and continuous improvement.
- Automation – Develop operational runbooks and automation to reduce manual workload.
- Security by Default – Build platforms that are secure by default, following least-privilege access and defense-in-depth principles.
- Compliance & Auditability – Partner with security and compliance teams to implement required controls, policies, and auditability.
- Patching & Upgrades – Support patching, vulnerability management, and platform upgrades.
- Quality Assurance – Carry out Quality Assurance tasks to ensure both application performance and compliance with security and governance standards.
- Platform Components – Create reusable platform components, templates, and tooling that enable self-service for application teams.
- Developer Experience – Improve developer experience through standardized CI/CD integrations and platform documentation.
- Consultation – Act as a consultative partner to application teams, helping them adopt platform services effectively.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances.
If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy.
Qualifications
- 3–6+ years of experience in platform engineering, DevOps, SRE, or infrastructure engineering roles.
- Hands-on experience with AWS in production environments.
- Experience with Infrastructure as Code tools (Terraform preferred).
- Familiarity with containers and orchestration (Docker, Kubernetes, or ECS).
- Understanding of monitoring, logging, and alerting concepts.
- Experience with scripting or programming (Python, Bash, or similar).
- Experience operating Kubernetes platforms (EKS).
- Familiarity with CI/CD systems and deployment automation.
- Exposure to observability tools such as CloudWatch, Prometheus, Grafana, Datadog, or similar.
- Experience working in large-scale or enterprise environments.
- Interest in improving developer productivity and platform usability.