- Company Name
- SpryPoint
- Job Title
- Cloud Operations Engineer I
- Job Description
-
Job Title: Cloud Operations Engineer I
Role Summary:
Support and maintain AWS-based infrastructure for a growing utility software platform, ensuring environments are stable, secure and performance‑optimized during implementation, testing and production phases. Leverage automation, observability tools and AI-driven troubleshooting to resolve incidents and continuously improve operational processes.
Expectations:
- Rapidly learn and apply runbooks while proactively suggesting enhancements.
- Manage multiple concurrent requests calmly and efficiently, prioritizing urgency and impact.
- Communicate clearly across internal and client‑facing teams.
- Use AI tools for faster diagnostics, documentation and workflow automation.
Key Responsibilities:
- Provision, update and decommission AWS environments (Elastic Beanstalk, EC2, ECS, RDS PostgreSQL & Aurora Serverless v2, DynamoDB, Route 53, VPC, S3).
- Handle IP/domain whitelisting, access-control changes, database refresh coordination and general troubleshooting requests through Jira Service Management.
- Investigate performance and reliability issues using logs, metrics, CloudWatch, Linux-level debugging and observability platforms; escalate SQL/indexing concerns to developers.
- Support onboarding, testing, mock and production go‑lives for project teams.
- Execute scheduled maintenance (patching, scaling, certificate updates, configuration adjustments).
- Tune monitoring and alerting to detect incidents early.
- Conduct incident analysis, root cause investigations and post‑mortem documentation.
- Develop and maintain automation scripts with Python or Bash to streamline recurring tasks.
- Analyze NGINX, service and application logs to isolate issues across the stack.
- Document procedures, runbooks and environment guidelines in Confluence; record change management and time tracking in Jira.
- Enforce security best practices: IAM permissions, patching, access controls, compliance checks.
- Validate backups, perform restores and participate in disaster recovery drills.
- Use AI tools to accelerate troubleshooting and improve operational documentation.
Required Skills:
- Hands‑on experience with AWS services: EC2, Elastic Beanstalk, ECS, RDS (PostgreSQL), Aurora Serverless v2, DynamoDB, Route 53, VPC, S3, IAM.
- Linux system administration and shell scripting (Bash).
- Python scripting for automation.
- Familiarity with observability/monitoring tools (e.g., CloudWatch, Datadog, New Relic).
- Logging and diagnostics (NGINX logs, application logs).
- Experience using Jira Service Management and Confluence for change tracking and documentation.
- Strong analytical skills; ability to diagnose performance bottlenecks and troubleshoot complex distributed systems.
- Knowledge of database performance tuning, SQL and indexing.
- Awareness of security and compliance best practices for cloud infrastructure.
- Proficiency in using AI‑powered tools (e.g., Copilot, ChatGPT, generative assistants) for troubleshooting and automation.
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent work experience (minimum 2–3 years in cloud operations or DevOps).
- Relevant AWS certification preferred (e.g., AWS Certified SysOps Administrator – Associate, AWS Certified Developer – Associate, or AWS Certified Solutions Architect – Associate).