SRE/DevOps Engineer
Job in Town of Poland, Jamestown, Chautauqua County, New York, 14701, USA
Listed on 2026-06-29
Listing for: HeadHR
Full Time position
Job specializations:
- IT/Tech
- SRE/Site Reliability, AWS
Job Description
Location: Town of Poland
Engagement context
Takeover of a production AI mobile coaching platform.
- Runtimes: Node.js/NestJS, Python/FastAPI
- Data stores: MongoDB, Postgres, Redis
- Infra: AWS (ECS/EKS, RDS, ElastiCache, S3, VPC, IAM)
- CI: GitHub Actions
- Observability: Datadog
- Push: OneSignal
- Errors: Crashlytics
- Deep links: Branch
- Vendors: Auth0, ElevenLabs, OpenAI, Amplitude, Terra, Strava
Role: Senior SRE/DevOps, owning CI→production, IaC, deploy automation, observability, on-call, cost control, secrets, and security baselines. Phase 1: measure and document. Phase 2: operate and transfer ownership.
First 90 days
- Audit CI/CD (GitHub Actions): duration, flakiness, failure modes, secrets handling
- Audit AWS: ECS/EKS topology, IAM posture, VPC layout, RDS, ElastiCache, S3
- Audit Datadog: dashboards, tracked metrics, SLO/SLI gaps
- Audit incidents (last 12 months): count, severity, MTTR, RCA patterns
- Vendor inventory (Auth0, OneSignal, ElevenLabs, OpenAI, Branch, Amplitude, Terra, Strava, Crashlytics): owners, billing, MFA, recovery plans
- Own CI/CD across services
- Own AWS infra (Terraform/Pulumi where suitable)
- Cost control (OpenAI token spend, AWS rightsizing)
- Security baselines: least-privilege IAM, secrets rotation, dependency scanning
- Build onboarding for the second SRE/DevOps hire
- 5+ years SRE / DevOps / Platform Engineering in production
- AWS at depth: ECS or EKS, IAM (assume-role patterns, scoped policies), VPC, RDS, ElastiCache, S3, CloudWatch
- Infrastructure-as-code: Terraform (preferred) or Pulumi
- GitHub Actions: building reusable workflows, secret handling, reproducible builds
- Container fundamentals: Dockerfile authoring, multi-stage builds, image hardening
- Linux operations
- Datadog in production: logs, APM, metrics, dashboards, monitors, SLO/SLI definition
- Incident response: leading or co-leading real production incidents, writing post-mortems
- Observability for both Node.js and Python services
- Secrets management — AWS Secrets Manager, SOPS, or comparable
- Working English
- Cost-optimisation discipline (FinOps, AWS Cost Explorer, Reserved Instance planning)
- LLM cost monitoring (per-route OpenAI token spend dashboards)
- Kubernetes specifically (we may or may not be on EKS)
- Security baseline experience: CIS benchmarks, dependency scanning (Snyk, Dependabot), SAST tools
- GDPR / data-residency considerations for cross-border data flows between the US and PL
- Mobile CI considerations: Fastlane, app-signing automation, TestFlight / Google Play internal tracks
- On-call playbook authoring