- Company Name
- OpenText
- Job Title
- Lead Site Reliability Administrator
- Job Description
**Job Title**
Lead Site Reliability Administrator
**Role Summary**
Lead the design, implementation, and maintenance of high‑availability, high‑performance cloud‑based services. Automate repetitive operational tasks, enforce proactive monitoring and alerting, and drive incident resolution and continuous improvement in a 24/7 support environment.
**Expectations**
- Meet agreed Service Level Agreements (SLAs).
- Participate in on‑call rotation and shift work as required.
- Own the incident lifecycle, including root cause analysis (RCA) and SWAT investigations.
- Provide actionable feedback to development teams on stability and performance.
**Key Responsibilities**
- Build and maintain automated monitoring, alerting, and logging pipelines.
- Develop runbooks, patterns, and best‑practice guides for production operations.
- Collaborate with development and IT business partners to define and surface KPIs.
- Plan and validate changes from infrastructure and development teams.
- Provide real‑time, advanced technical support and troubleshooting for user/customer issues.
- Drive the transition of new capabilities into sustaining operations.
**Required Skills**
- Proficiency in Linux administration and scripting (Shell, Python, Perl, JavaScript).
- Hands‑on experience with GCP, AWS, Azure, Kubernetes, Cloud Foundry, BOSH.
- Containerization expertise (Docker, rkt, Mesos) and microservices/RESTful architectures.
- Continuous Delivery tooling and practices (GitOps, Ansible, Rundeck, Argo CD).
- Middleware and Java‑based stack support (Apache, Tomcat, Spring, Struts, Spark).
- Database skills – RDBMS (Oracle, PostgreSQL, MariaDB) and NoSQL (Cassandra).
- Monitoring/observability – New Relic, Dynatrace, AppDynamics, Zabbix, check_mk, Graylog, Kibana.
- Messaging/search – Kafka, RabbitMQ, Solr, Elasticsearch.
- Strong troubleshooting skills and working knowledge of security best practices and ITIL principles.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Certifications in cloud platforms (AWS/Azure/GCP) and container orchestration (Kubernetes).
- DevOps or SRE‑specific certifications preferred (e.g., Certified Kubernetes Administrator, AWS Certified DevOps Engineer).