DevOps Engineer

We are seeking a Tech Lead – Site Reliability Engineering with expertise in DevOps, QA, and Cloud to lead reliability, automation, and performance engineering efforts across cloud-based systems. This role involves leading teams, establishing SRE best practices, and implementing scalable cloud architectures to ensure high availability, security, and efficiency.

Responsibilities

SRE & Cloud Reliability Engineering:

  • Design and implement highly available, scalable cloud architectures.
  • Ensure uptime and system reliability through proactive monitoring and incident management.
  • Automate infrastructure provisioning and scaling using Terraform, Ansible, Kubernetes, and Helm.

DevOps & Automation:

  • Develop and maintain CI/CD pipelines for automated build, test, and deployment.
  • Implement GitOps workflows to streamline deployment processes.
  • Optimize performance and cost-efficiency of cloud environments.

QA & Test Automation:

  • Lead automated testing strategies for API reliability, performance, and security.
  • Implement Test-Driven Development (TDD) and Continuous Testing methodologies.
  • Perform load testing, stress testing, and resilience testing to prevent failures.

Observability, Monitoring & Incident Response:

  • Set up monitoring and alerting dashboards using Prometheus, Grafana, Splunk, and Datadog.
  • Implement log aggregation and distributed tracing for deep observability.
  • Lead incident response, root cause analysis, and post-mortem analysis.

Security & Compliance:

  • Enforce cloud security best practices (IAM policies, Zero Trust, cloud encryption).
  • Ensure compliance with regulatory standards (SOC 2, ISO 27001, GDPR, HIPAA).
  • Implement threat detection and anomaly detection using AI-driven monitoring tools.

Leadership & SRE Strategy:

  • Lead and mentor SRE engineers, ensuring adoption of best practices.
  • Collaborate with DevOps, Security, and Cloud teams to implement scalable and secure cloud infrastructures.
  • Establish SRE operational strategies, playbooks, and incident management frameworks.

Qualifications

  • Bachelor’s/Master’s degree in Computer Science, IT, or a related field.
  • 7+ years of experience in Site Reliability Engineering, DevOps, and Cloud infrastructure.
  • Deep expertise in SRE methodologies, cloud reliability, and distributed systems.
  • Proficiency in container orchestration, cloud automation, and infrastructure as code.
  • Strong knowledge of observability, monitoring, and AI-powered performance tuning.
  • Experience with security compliance, disaster recovery, and failure prevention.
  • Proficiency in scripting and automation (Python, Bash, Go, YAML).
  • Proven experience in leading SRE teams, defining strategies, and implementing best practices.

Must-Have Skills:

  • Expertise in Site Reliability Engineering (SRE), DevOps, and Cloud Automation.
  • Experience in cloud computing platforms (AWS, GCP, Azure, OpenStack).
  • Proficiency in Infrastructure as Code (Terraform, Ansible, CloudFormation).
  • Hands-on experience with Kubernetes, Docker, and OpenShift.
  • Deep understanding of observability, monitoring, and logging (Prometheus, Grafana, ELK, Splunk, Datadog).
  • Experience in CI/CD pipeline automation (Jenkins, GitHub Actions, ArgoCD, Tekton).
  • Expertise in performance tuning, cloud scaling, and system optimization.
  • Proficiency in QA automation, API reliability testing, and microservices validation.
  • Strong background in networking, traffic routing, and load balancing.
  • Experience in incident response, disaster recovery, and reliability planning.
  • Leadership experience in mentoring, managing teams, and driving SRE best practices.

Preferred Skills:

  • Experience with Chaos Engineering and fault injection frameworks (Gremlin, LitmusChaos).
  • Knowledge of cloud cost optimization strategies and FinOps.
  • Understanding of Zero Trust Security, IAM policies, and cloud-native security practices.
  • Proficiency in scripting and automation (Python, Bash, Go).
  • Familiarity with ML-based anomaly detection for reliability monitoring.
