Senior Infrastructure & Security Engineer
Listed on 2026-02-19
IT/Tech
Systems Engineer, Cloud Computing, Cybersecurity, SRE/Site Reliability
Headquarters: Washington, DC
DevOps | Site Reliability | Cloud Security
The Opportunity
We process millions of SMS and MMS messages daily across a distributed platform built on Google Cloud: Cloud Run microservices, Pub/Sub event pipelines, Spanner databases, and Memorystore for Redis. Our infrastructure auto-scales aggressively to meet campaign demand, our data pipelines handle real-time delivery tracking at high velocity, and our systems must be fast, secure, and reliable around the clock.
We’re looking for a Senior Infrastructure & Security Engineer to own the reliability, security, and operational maturity of this platform. You’ll be the first dedicated infrastructure hire, working directly with the CTO to shape the technical foundation as we scale. This isn’t a role where you’ll maintain someone else’s runbooks — you’ll define the roadmap, make architectural decisions, and build the systems that keep our platform running and our customers’ data safe.
What You’ll Own
Infrastructure as Code & Cloud Architecture
Own and evolve our Terraform-managed GCP infrastructure spanning a Shared VPC host project and multiple service projects. Design for cost efficiency, resilience, and scalability across Cloud Run, Spanner, Pub/Sub, Cloud Storage, Memorystore for Redis, and Cloud Tasks. You’ll manage environment promotion across dev, staging, and production.
Reliability & Observability
Build comprehensive monitoring, alerting, and incident response capabilities using Cloud Monitoring, Cloud Logging, and Cloud Trace. Establish SLIs and SLOs for critical message delivery paths. Reduce mean time to detection and recovery. Design health checks and auto-healing patterns for Cloud Run services processing millions of daily messages.
Cloud Security
Harden our platform across network, application, and data layers. This includes VPC firewall rules and network policies, IAM role design and service account management, secrets management via Secret Manager, Cloud Armor policies for DDoS and rate limiting, API Gateway security configurations, and dependency scanning. Lead security reviews and own incident response for security events.
CI/CD & Developer Experience
Maintain and improve our GitHub Actions-based deployment pipelines for a TypeScript monorepo deploying to Cloud Run. Ensure the engineering team can ship safely and quickly with automated testing, linting, container builds, and environment-specific deployments. Optimize build times and deployment reliability.
Performance & Auto-Scaling
Tune Cloud Run autoscaling policies, including min/max instances and concurrency settings, for both public-facing API services and private Pub/Sub processing workers. Optimize Spanner query performance and node allocation. Ensure our Redis-based distributed rate-limiting infrastructure coordinates across horizontally scaling instances with sub-millisecond overhead.
Compliance & Data Protection
Help establish and maintain compliance practices relevant to messaging platforms, including TCPA requirements, carrier-specific policies, data retention and encryption standards, and audit logging. Ensure our platform meets the security and data handling expectations of enterprise customers.
What We’re Looking For
Required
- 5+ years in infrastructure, DevOps, or SRE roles with increasing scope and ownership
- Deep Google Cloud Platform experience, specifically with Cloud Run, VPC networking, IAM, and at least one managed database service
- Strong Terraform skills in production — you’ve authored and maintained multi-environment, modular Terraform codebases, not just run applies
- Hands-on cloud security experience: network security design (firewall rules, private networking, VPC peering), IAM policy architecture, secrets management, and vulnerability assessment
- GitHub Actions proficiency: you’ve built and maintained CI/CD pipelines for containerized applications deploying to cloud infrastructure
- Experience operating distributed systems that process high message or event volumes with strict latency and reliability requirements
- Strong Linux fundamentals, networking knowledge (DNS, TLS,…