Senior Site Reliability Engineer Job, Westbrook area, Maine, USA, IT/Tech

We are looking for a Senior Site Reliability Engineer (SRE) to join our Site Reliability Engineering Team, working closely with a dedicated product team to modernize infrastructure, strengthen system resilience, and scale our global platform, leveraging AI tools and agents to accelerate delivery and improve system quality.

Are you passionate about building modern, scalable cloud platforms that power real-world impact? At IDEXX, we are transforming how laboratory systems operate globally, helping veterinarians deliver better outcomes and enabling pets to live fuller lives. Our software supports Reference Laboratories, a critical area of IDEXX's business, enabling high-volume diagnostic workflows, operational efficiency, and clinical insight at scale.

In this role

You will be responsible for DevOps & Platform Engineering:

Own the design and evolution of CI/CD pipeline architecture, governance, and standards
Modernize and automate deployment pipelines for Kotlin-based AWS Lambda services using GitHub Actions

You will standardize infrastructure and deployment processes across services:

Reduce manual deployment effort through automation
Leverage AI tools where appropriate to improve productivity and system quality

You will be responsible for Cloud Infrastructure & Reliability, designing, building, and evolving scalable, resilient AWS cloud infrastructure:

Lead implementation of disaster recovery, high availability, and fault‑tolerant designs
Automate infrastructure provisioning and lifecycle management to reduce manual work

You will be responsible for Monitoring, Observability & Production Support:

Build and maintain end‑to‑end observability (metrics, logging, tracing, alerting)
Establish effective alerting that reduces noise and ensures high‑signal incident detection
Proactively identify and address system risks before they impact customers
Lead incident response in shared on‑call rotation (triage, mitigation, communication)
Drive root cause analysis and blameless postmortems to prevent recurrence

You will be responsible for Release Engineering:

Own and govern the release process, including deployment gates and approvals
Review and approve deployment plans to ensure quality and stability
Optimize the build and release lifecycle for speed, consistency, and reliability
Manage cross‑repo dependencies and versioning strategies

You will be responsible for Security & Compliance:

Lead remediation of security vulnerabilities, collaborating with the Security team as needed
Establish processes to proactively prevent new security risks
Embed secure development and deployment practices into pipelines

You will collaborate and be responsible for mentoring and guiding the team:

Guide the development team toward reliability and security best practices
Proactively identify issues, drive visibility, and ensure timely resolution with engineers
Stay up to date with industry trends and emerging technologies to drive innovation
Communicate technical concepts clearly to both technical and non‑technical stakeholders

What you will need to succeed

7+ years of experience in DevOps, SRE, Platform Engineering, or similar roles focused on CI/CD, cloud infrastructure, and system reliability
Strong experience with AWS serverless architectures, Terraform and CloudFormation, CI/CD pipelines (GitHub Actions preferred), Azure Entra, OAuth2, OpenID Connect, Maven build tooling, and Git-based version control workflows (GitHub preferred)
Proven ability to design and optimize deployment pipelines, troubleshoot complex distributed systems, make data‑driven decisions, and translate business requirements into scalable technical solutions
Strong communication, collaboration, and organizational skills
Understanding of system design patterns for reliability and scalability

Nice to have

Experience with Kotlin or Java development
Experience with NoSQL databases (for example DynamoDB) and relational databases (for example PostgreSQL)
Experience working in Agile or Scrum environments
Familiarity with artifact management tools such as JFrog Artifactory
Experience defining and managing SLAs, SLOs, and SLIs
Experience with distributed tracing tools such as AWS X‑Ray or OpenTelemetry
Experience using AI tools or AI agents…