We are looking for a Cloud Infrastructure Engineer (SDE 2) to join our team. In this role, you will be responsible for the day-to-day management, deployment, and optimization of our cloud environments. You will work under the guidance of senior engineers to implement scalable solutions that ensure our services remain reliable and performant.


About the job:

  • Task Execution: Execute routine infrastructure tasks such as user access management (IAM), environment setup, and resource tagging.
  • Implement & Maintain: Deploy and manage scalable cloud infrastructure on AWS and GCP based on established architectural patterns.
  • Support: Assist in the deployment of application updates and infrastructure changes under the supervision of senior engineers.
  • Operational Excellence: Maintain the health of container platforms and cloud services, ensuring high availability and proactive monitoring.
  • Infrastructure as Code: Write and maintain clean, modular code using Terraform or Ansible to automate resource provisioning and configuration.
  • Troubleshooting: Identify and resolve infrastructure performance bottlenecks and system failures in development and production environments.
  • Collaboration: Work alongside application developers to help them deploy workloads efficiently using CI/CD pipelines and containerization.
  • Documentation: Maintain clear documentation for infrastructure setups, SOPs, and incident reports.


About you:

  • Experience: 2 to 4 years of experience in Cloud Infrastructure, DevOps, or Site Reliability Engineering.
  • Cloud Proficiency: Hands-on experience with AWS (primary) or GCP services (EC2/GCE, S3, VPC, IAM).
  • Containers: Solid understanding of Docker and experience managing workloads in Kubernetes.
  • Automation: Practical experience with Infrastructure as Code (IaC), specifically Terraform, and configuration management tools like Ansible.
  • Linux & Networking: Strong grasp of Linux administration, SSH, TCP/IP, DNS, and load balancing.
  • Observability: Experience using monitoring and logging stacks (e.g., Prometheus, Grafana, ELK, or Datadog).
  • Distributed Systems: Basic experience managing, or familiarity with, Kafka, Redis, or Elasticsearch.
  • Mindset: A strong "automate everything" mindset and the ability to participate in an on-call rotation.
