Senior Platform Engineer
Job in Boston, Suffolk County, Massachusetts, 02298, USA
Listed on 2026-06-03
Listing for: Axiomatic AI
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, Cloud Computing, Cybersecurity, SRE/Site Reliability
Job Description & How to Apply Below
Axiomatic AI is building a new class of AI systems designed to reason with the rigor of the scientific method. By combining deep learning with formal logic and physics-based modeling, we create verifiable, interpretable AI systems that collaborate with and support human researchers in high-stakes scientific and engineering workflows.
Our mission, 30×30, is to deliver a 30× improvement in the speed, accessibility, and cost of semiconductor and photonic hardware development by 2030.
We aim to revolutionize hardware design and simulation in these industries and are building a team of highly motivated professionals to bring these innovations from research into commercial products.
Position Overview
As a Senior Platform Engineer at Axiomatic, you will own the reliability, deployment, and operational excellence of our AI platform. This role focuses primarily on infrastructure, CI/CD, and operations, with additional responsibilities for automation and tooling development.
You will:
- Lead deployment strategies and CI/CD pipelines across multiple environments
- Architect and maintain multi-cloud infrastructure (Azure, AWS, GCP) and on-premise deployments
- Own infrastructure as code using Terraform to automate provisioning and configuration
- Build comprehensive observability systems: monitoring, metrics, logging, and alerting
- Implement security controls, compliance frameworks, and data governance policies
- Develop automation tools, APIs, and scripts (Python) to improve operational efficiency
- Ensure system reliability, performance, and scalability
- Drive incident response, postmortems, and continuous improvement
- Troubleshoot infrastructure and application issues across multiple environments
Deployment & CI/CD
- Design and implement deployment pipelines for multi-environment releases (dev, staging, production)
- Own the full deployment lifecycle: build, test, release, and rollback strategies
- Implement blue-green deployments, canary releases, and progressive rollouts
- Build automated deployment tooling and workflows
- Ensure zero-downtime deployments and rollback capabilities
- Optimize build and deployment performance
- Manage artifact repositories and container registries
Infrastructure & Reliability
- Design and operate multi-cloud infrastructure across Azure, AWS, and GCP
- Architect and deploy on-premise solutions for enterprise customers (Linux-based)
- Manage Kubernetes clusters, container orchestration, and networking
- Implement disaster recovery, backup strategies, and business continuity
- Optimize cloud costs and resource utilization
- Define and track SLIs, SLOs, and error budgets for critical services
Infrastructure as Code
- Write and maintain Terraform modules for infrastructure provisioning
- Implement GitOps workflows for infrastructure changes
- Automate infrastructure scaling, updates, and operations
- Ensure reproducible and version-controlled infrastructure
Observability
- Design comprehensive monitoring, logging, and alerting (Prometheus, Grafana, Datadog, or similar)
- Build dashboards for system health, performance, and business metrics
- Implement distributed tracing for microservices
- Conduct capacity planning and performance analysis
- Drive reliability improvements through data-driven insights
Security & Compliance
- Implement security best practices: identity management, secrets management, network policies
- Work towards or maintain security certifications (SOC 2, ISO 27001, or similar)
- Conduct security audits and vulnerability remediation
- Implement data governance policies for AI pipelines and user data
- Ensure compliance with data privacy regulations (GDPR, CCPA)
Automation & Tooling
- Write automation scripts and tools in Python for operational tasks
- Build internal tooling for deployments, monitoring, and incident response
- Develop runbooks, automation, and self-healing systems
- Create APIs for infrastructure operations when needed
- Maintain high code quality and testing standards for tooling
Incident Response
- Participate in on-call rotation and lead incident response
- Conduct blameless postmortems and drive action items
- Build and maintain incident response playbooks
- Improve system resilience and harden services against known failure modes
Position Requirements
10+ years of work experience