- Company Name
- Resonaite
- Job Title
- Site Reliability Engineer (DevOps/Release)
- Job Description
**Job Title**
Site Reliability Engineer (DevOps/Release)
**Role Summary**
Ensure high availability, performance, and resilience of critical AWS and on‑prem workloads. Lead stability engineering, automation, and observability initiatives, support cloud migrations, and provide L2 production support as part of a 24/7 on‑call rotation.
**Expectations**
- Deliver measurable improvements in service stability and deployment efficiency.
- Maintain compliance with SLAs and operational best practices.
- Act as a trusted advisor on security, vulnerability remediation, and technology lifecycle management.
- Collaborate cross‑functionally with DevSecOps, Architecture, and Agile squads.
**Key Responsibilities**
- Drive service stability, automation, and optimization initiatives.
- Validate readiness and support AWS cloud migration, including Day‑2 runbook automation.
- Operationalize cloud‑native and migrated applications via automated deployment, monitoring, and recovery pipelines.
- Design scalable solutions that reduce manual intervention and enable self‑healing and auto‑scaling.
- Use Splunk, Dynatrace, and Grafana to optimize performance, implement anomaly detection, and proactively address production issues.
- Analyze testing and production trends, conduct root‑cause investigations, and recommend corrective actions to Agile squads.
- Create and maintain runbooks, SOPs, post‑mortems, and architecture documentation.
- Lead vulnerability remediation and security alignment efforts.
- Participate in a 24/7 rotating on‑call schedule, providing L2 incident response.
**Required Skills**
- AppOps experience with highly resilient, high‑performance workloads on AWS.
- Proficiency in Git, PowerShell, Python, Ansible, Terraform, Docker, and microservices patterns.
- Hands‑on experience with Splunk, Dynatrace, Grafana, and ServiceNow.
- Deep understanding of AWS ECS, autoscaling, load balancing, and VPC integration.
- Knowledge of Agile, SDLC, release management, and incident/problem/change management.
- Strong analytical skills for early risk identification and mitigation.
- Excellent communication and stakeholder collaboration skills.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience).
- Relevant AWS certifications (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect).
- ITIL certification preferred.
- SRE certification a plus.