Site Reliability Engineer - Onsite, NY
Listed on 2025-12-11
IT/Tech
Systems Engineer, Cloud Computing, Network Engineer, SRE/Site Reliability
Location: New York
Overview
We’re looking for a Site Reliability Engineer to own the design and operation of a hybrid infrastructure spanning cloud and on-premises environments. You’ll work directly with large fleets of IoT devices, build and improve deployment and monitoring tooling, and lead modernization efforts from architecture to production. This is a hands-on role where you’ll partner closely with engineering teams to deliver secure, reliable, and scalable systems while mentoring others and enforcing strong infrastructure best practices.
The company is located in New York, NY, and the role is onsite 5 days a week.
What You Will Be Doing
- Design, build, and operate infrastructure across cloud and on-prem environments with a focus on reliability, security, and scalability.
- Develop and maintain tools to deploy, manage, and monitor customer-premise equipment, IoT devices, and backend services.
- Lead system architecture and modernization initiatives from design through implementation and ongoing production support.
- Improve observability for infrastructure and IoT fleets by building custom tooling and integrating off-the-shelf monitoring solutions.
- Implement and harden secure connectivity patterns to ensure resilient, safe device-to-cloud communication.
- Mentor teammates and raise the bar on infrastructure, monitoring, and system design practices.
- Collaborate with DevOps and software teams to tightly integrate infrastructure with CI/CD pipelines for smooth, reliable releases and upgrades.
What You Will Need
- 5+ years of experience in Infrastructure Engineering, SRE, or DevOps roles.
- Bachelor’s degree in Computer Science or equivalent experience.
- Proven experience managing servers and/or IoT devices at scale.
- Strong software development skills for infrastructure tooling, preferably in Python.
- Hands-on experience with core AWS services.
- Demonstrated ability to build, manage, and secure hybrid (cloud + on-prem) environments.
- Solid understanding of Infrastructure as Code (Terraform, Ansible, CDK, etc.) for repeatable automation.
- Strong Linux and networking fundamentals, including Docker, firewalls, load balancers, and VPNs.
- Deep experience with monitoring, alerting, and troubleshooting large-scale distributed systems.
- Strong problem-solving skills and comfort working in fast-paced, production-critical environments.
- Excellent communication skills in English, with a collaborative and proactive working style.
- Experience with AWS IoT services, such as Greengrass v2.
- Knowledge of secure infrastructure design principles and best practices.
- Experience managing edge devices at scale.
Applicants must be currently authorized to work in the United States on a full-time basis now and in the future. This position does not provide sponsorship.