Senior Infrastructure & Reliability Engineer
Job in
Melville, Suffolk County, New York, 11747, USA
Listed on 2026-06-01
Listing for:
Kliger-Weiss Infosystems, Inc.
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, Cloud Computing: Infrastructure & Operations, IT Support
Job Description & How to Apply Below
Department: DevOps
Employment Type: Full Time
Location: Melville, NY
Compensation: $180,000 / year
Description
The Opportunity
If innovation lives in your DNA and AI is already part of how you think, build, and operate - you're going to love what we're doing. You'll join a small, senior team with a real mandate to design, build, and run the systems that power retail. We move fast, we automate aggressively, and we expect every engineer to multiply their impact with modern tooling.
Your fingerprints will be on the platform every day.
The Company
We are a small team with a big vision: to be the premier provider of cloud technology solutions for retailers. KWI offers a complete, unified commerce solution from a single database, specifically designed to help specialty retailers grow their business. Our portfolio of customers includes Pandora, Bluemercury, Tom Ford and many other globally recognizable brands.
We combine Point of Sale, Merchandising, Order Management, eCommerce, CRM, and Loss Prevention into one cloud-based platform. We are a values- and mission-driven organization, and we believe that if we develop and demonstrate leadership in our strategy, operations, and people, we will continue to drive product innovation and service excellence.
The impact you'll make
- Support and operate our Linux/UNIX systems, VMware infrastructure, CI/CD pipelines, MySQL databases, and containerized workloads that serve our retail clients 24×7.
- Own incidents end-to-end: triage alerts, drive root-cause analysis across the application, database, and network layers, and write the post-incident docs that stop recurrence.
- Tune and operate MySQL at production scale: query analysis, replication topology, backup and recovery, and schema changes against live workloads.
- Containerize and template services using Docker and infrastructure-as-code patterns to make deployments repeatable, declarative, and boring.
- Improve observability across the fleet - metrics, logs, traces, and dashboards - so problems are seen before customers feel them.
- Use modern AI-augmented engineering tools (Claude Code, MCP-based workflows, agentic automation) as a daily multiplier - to operate faster and extend what one engineer can deliver.
- Document and mentor. Runbooks, design docs, and onboarding material aren't an afterthought here - they're how the team scales.
Qualifications
- 5+ years operating production Linux/UNIX (RHEL, CentOS/Rocky, Debian/Ubuntu) at meaningful scale.
- Strong MySQL operational experience - replication, performance tuning, backups, recovery, and schema migrations.
- Hands-on VMware/vSphere experience in production environments.
- Java application-tier troubleshooting experience - comfortable reading thread dumps, GC logs, and heap behavior.
- Solid DevOps fundamentals: Git, CI/CD pipelines, Ansible (or similar configuration management), Terraform (or similar IaC), and Docker.
- Networking literacy: TCP/IP, DNS, TLS, HTTP/S, load balancing, basic firewalling. You can read a tcpdump and a cert chain.
- Comfortable scripting in Bash. Python is not required, but you should have a working understanding of programming fundamentals and be able to read, modify, and write straightforward code.
- Strong troubleshooting instincts and the temperament to lead under pressure.
- Real day-to-day experience using AI-augmented engineering tools (Claude, Cursor, Copilot, MCP servers, agentic workflows) - not just demos.
- Experience with Datadog or comparable observability platforms.
Benefits
- Full Medical, Dental and Vision
- Annual bonus eligible
- Free gym in the building
- Generous PTO policy
- Summer Fridays... all year round
- Tuition Reimbursement
- Discount from building café
- 401(k) with a 50% company match (up to 6% of employee contribution)
- Employee Referral Program
- One volunteer day each year
We understand that our teams need flexibility, which is why we follow a hybrid schedule. Our in-office days are Monday, Tuesday, and Thursday, and employees may work remotely on Wednesdays and Fridays.
We are also a collaborative group and believe that getting together in person allows our team to do their best…
Position Requirements
10+ years of work experience