Senior Software Engineer; DevOps
Job in
Boston, Suffolk County, Massachusetts, 02298, USA
Listed on 2026-06-20
Listing for:
Hi Marley
Full Time position
Job specializations:
- IT/Tech
Cloud Computing: Infrastructure & Operations, SRE/Site Reliability, IT Infrastructure, Data Engineering
Job Description & How to Apply Below
Requirements
- 6+ years of DevOps/SRE/Platform Engineering experience
- 2+ years of experience building or operating AI/ML infrastructure (model serving, inference, LLM orchestration, or agentic systems)
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience
- You have built and operated infrastructure for traditional and AI or ML workloads at a SaaS company
- You naturally step up to lead technical conversations, and people across teams seek you out when infrastructure decisions get complicated
- You have deep experience with AWS cloud services (ECS, Lambda, SageMaker, Bedrock, S3, DynamoDB, Redshift, or equivalent)
- You have strong infrastructure-as-code skills with Terraform and understand how to manage state, modules, and multi-environment configurations
- You understand data infrastructure: pipelines, warehousing, ETL/ELT, and how to support analytics at scale
- You think about observability as more than dashboards — you care about data integrity, SLOs, error budgets, and catching silent failures
- You have experience with compliance-sensitive environments and understand why audit trails, access governance, and change management matter
- You are comfortable operating in a fast-moving environment where AI capabilities are evolving rapidly and infrastructure decisions have regulatory implications
- You communicate well with both engineering and non-technical stakeholders
- Track record of leading cross-team technical initiatives and mentoring engineers on infrastructure and operational best practices
- Strong proficiency in at least one programming language (Python, Go, TypeScript, or similar)
- Experience with:
  - Container orchestration (ECS, EKS)
  - Monitoring and observability platforms (Datadog, CloudWatch)
  - Data infrastructure (Redshift or similar data warehousing; Airflow, dbt, Dagster, or similar pipeline tools) is a strong plus
- Experience in regulated industries (insurance, financial services, healthcare) is a strong plus
- A genuine curiosity about AI and emerging technologies, paired with the judgment to apply them thoughtfully and responsibly
About the Role
- We are looking for a Sr. DevOps Engineer II to help us build and scale the infrastructure that powers both our core platform and our rapidly growing agentic AI services
- You will be at the intersection of cloud infrastructure, AI operations, and platform engineering, building the foundation that enables Hi Marley to operate reliably at enterprise scale while deploying autonomous AI agents in regulated insurance workflows
- You'll also be expected to raise the bar for the teams around you: setting infrastructure standards, driving technical decisions in ambiguous situations, and helping less experienced engineers grow their operational instincts
Responsibilities
- Design and operate cloud infrastructure on AWS that supports both our core SaaS platform and our agentic AI services, ensuring reliability, scalability, and cost efficiency
- Build and maintain AI/ML infrastructure and monitoring for LLM-powered agentic services
- Establish and enforce infrastructure-as-code standards using Terraform, defining the patterns other engineers follow for environment parity, drift detection, and automated compliance validation
- Implement observability beyond availability — data integrity monitoring, SLO frameworks with error budgets, and automated regression detection for both platform and AI services
- Build deployment automation including pre-deployment verification, migration script validation, and codified rollback procedures to eliminate human-memory dependencies
- Support big data infrastructure: data pipelines, warehousing (Redshift), and analytics tooling that enables reporting, BI, and AI training workflows
- Implement security and compliance controls for AI workloads operating in regulated carrier environments — including audit logging, access governance, and configuration management
- Drive environment parity across all infrastructure with automated drift detection and remediation
- Improve disaster recovery capabilities: documented and rehearsed DR procedures, defined RTO/RPO by service tier, and tested recovery runbooks
- Lead architecture reviews for new services, integrations, and AI agent deployments — partnering with engineering, product, and security to ensure infrastructure decisions are sound before they ship
- Innovate on developer experience: reduce friction in testing environments, CI/CD pipelines, and local development workflows
- Act as a technical anchor for infrastructure decisions across teams — providing clarity when requirements are ambiguous and helping the organization converge on consistent, scalable approaches
Position Requirements
10+ years of work experience