Saquib Zeya

DevOps Engineer

Patna, Bihar, India · 6 yrs 7 mos experience

AI ML Practitioner · AI Enabled

Key Highlights

Over 6 years of experience in Site Reliability Engineering.
Expert in managing Kubernetes deployments across multiple cloud platforms.
Proven ability to optimize big data technologies for performance.

Stackforce AI infers this person is a Cloud Infrastructure and Site Reliability Engineering expert in the SaaS industry.

Skills

Core Skills

Site Reliability Engineering · Cloud Infrastructure · Data Analysis · Infrastructure Management · DevOps

Other Skills

API Development · AWS · Accern No-Code AI · Amazon S3 · Apache Kafka · Apache Spark · Automation · Automation Frameworks · Azure · Bash · CI · CI/CD · Cloud Security · Computer Science · Debugging Code

About

Experienced Site Reliability Engineer | 6+ Years in Building Scalable, Reliable Systems and Optimizing Infrastructure

Experience

Total Experience: 6 yrs 7 mos

Average Tenure: 2 yrs 2 mos

Current Experience

Wand AI

Senior DevOps Engineer

Feb 2025 – Present · 1 yr 4 mos · Patna, Bihar, India · Remote

Managing Kubernetes deployments across AWS and Azure environments.
Specializing in optimizing, scaling, and monitoring big data technologies such as Kafka, Elasticsearch, Spark, and PostgreSQL.
Enhancing cloud security and implementing best practices.
Automating and optimizing operational processes to improve efficiency.

Kubernetes · AWS · Azure · Kafka · Elasticsearch · Spark · +5 more

Accern (acquired by Wand AI)

3 roles

Senior Site Reliability Engineer

Promoted

Oct 2024 – Feb 2025 · 4 mos · Remote

Defined, managed, and analyzed SLIs and SLOs to measure system performance and availability effectively.
Leveraged data analysis and statistical methods to identify performance trends, detect anomalies, and drive proactive optimizations.
Designed and implemented automation frameworks to reduce manual effort and improve efficiency.
Implemented and optimized monitoring, alerting, and logging tools to proactively mitigate issues and recommend improvements.
Partnered with internal teams to diagnose and resolve incidents, ensuring system reliability.
Collaborated with product engineering teams to promote and implement scalable, resilient system designs.

SLIs · SLOs · Data Analysis · Automation Frameworks · Monitoring Tools · Incident Management · +1 more

Site Reliability Engineer

Promoted

Feb 2022 – Sep 2024 · 2 yrs 7 mos · Remote

Managed and built a versatile tech stack, including Kubernetes deployments across AWS and Azure environments.
Specialized in optimizing, scaling, and monitoring big data technologies such as Kafka, Elasticsearch, Spark, and PostgreSQL.
Enhanced cloud security by implementing best practices and security measures.
Tested and improved system integrity, application development processes, and other infrastructure components.
Leveraged open-source technologies and tools, including CI/CD pipelines and version control systems such as Git, to streamline development and deployment workflows.
Automated and optimized operational processes to improve efficiency and reduce manual intervention.
Oversaw code deployments, fixes, and updates, and managed the overall release process.
Responded to system alerts and provided on-call support to ensure high availability and rapid incident resolution.
Estimated, planned, and executed projects, features, and integrations in line with business goals and timelines.
Stayed current with industry trends, evaluating new technologies and methods to improve system performance and reliability.

Kubernetes · AWS · Azure · Kafka · Elasticsearch · Spark · +6 more

Senior Application Support Engineer

Sep 2021 – Feb 2022 · 5 mos · Remote

Ocrolus

Application Support Engineer

May 2019 – Aug 2021 · 2 yrs 3 mos · Gurgaon, Haryana, India · On-site

Infrastructure Management: Managed infrastructure and applications such as CURA, ensuring smooth operations and high availability.
Kubernetes Deployment: Deployed AWS (EKS) and GCP (GKE) Kubernetes clusters, provisioning infrastructure using Terraform for efficient resource management.
Automation: Automated workflows with Jenkins, streamlining CI/CD processes and improving deployment efficiency.
Helm Charts: Developed and deployed Helm charts for applications, facilitating easy deployment and management in Kubernetes.
Monitoring & Logging: Monitored systems and logs using Kibana, CloudWatch, New Relic, and RDS Performance Insights, ensuring timely issue detection and resolution.
Issue Resolution: Resolved critical issues using Docker, Kubernetes, and Elastic Beanstalk, maintaining system stability through debugging and service restarts.
API Development: Built and deployed RESTful APIs with Python (Flask, FastAPI), testing with Postman and Insomnia to ensure functionality.
Incident Management: Managed incidents via JIRA and PagerDuty, ensuring prompt resolution and service uptime.
Version Control: Made code changes, managed deployments in Bitbucket, and pushed updates to production environments.
Kubernetes Monitoring: Monitored and scaled Kubernetes clusters with tools like Weave, k9s, and kubectl, ensuring efficient resource utilization.
Redash Dashboards: Configured Redash for database queries and created dashboards to meet business requirements, effectively managing downtime.