Shailendra Kumar

Co-Founder

Noida, Uttar Pradesh, India · 15 yrs 1 mo experience
Highly Stable

Key Highlights

  • Expert in chaos engineering practices.
  • Proficient in cloud infrastructure management.
  • Strong background in automation and scripting.
Stackforce AI infers this person is a DevOps/SRE expert with a focus on cloud infrastructure and automation.

Skills

Core Skills

Automation · Cloud Computing · Hadoop Administration · Data Management · Support Engineering · Software Development

Other Skills

ANT · AWS · Amazon EKS · Amazon S3 · Ansible · Bash · Cloud Computing IaaS · Database connectivity · Databases · Django · Docker · FastAPI · Flask · Git · Grafana

About

Seasoned senior-level DevOps/SRE professional who favors challenging situations that call for technical problem-solving skills in service of organizational goals and values. Currently helping the team adopt chaos engineering practices.

Specialities:

  • Virtualization and Cloud: AWS (EC2, RDS, EBS, etc.); OpenStack (open source and Mirantis): Cinder, Swift, Nova, Neutron, Horizon, Glance, Sahara (Elastic MapReduce); KVM, Xen, Vagrant, and Oracle VirtualBox; Docker, containers, CoreOS, fleet, etcd, Kubernetes
  • Deployment/Config Management Tools: Ansible, Salt
  • Version Control: Git
  • Programming Languages and Scripting: Java, Groovy, Shell, Python, and Perl
  • Storage: HP 2000, GlusterFS, LVM
  • Operating Systems: CentOS 6.x, CentOS 7.x, and Ubuntu
  • Administration: Linux, storage, and Hadoop administration (vanilla and CDH)
  • CI & CD: Jenkins, CloudBees CD/RO
  • Monitoring: Zabbix, Nagios, Grafana, Prometheus and Alertmanager, New Relic, Splunk
  • Ticketing: Remedy, ITSM, Jira, Salesforce, Change Management

ePortfolio: https://eportfolio.greatlearning.in/shailendra-kumar

I have gained expertise in designing, building, and maintaining large-scale, performant, and resilient systems and infrastructure. I have worked on a wide array of projects, including deprecating legacy systems, migrating to new architectures, and creating new architectures from the ground up. In my current role, I am also responsible for close communication between technology, product, and business teams; sprint planning and retrospectives; code reviews; building delivery plans; mentoring engineers; and participating in the hiring process.

I can be reached at emailtoshailendra@gmail.com

Experience

Total Experience: 15 yrs 1 mo
Average Tenure: 3 yrs
Current Experience: 3 yrs 1 mo

Reliability System LLP

Co-Founder

May 2023 - Present · 3 yrs 1 mo · Noida, Uttar Pradesh, India · On-site

  • Led the growth and expansion of Reliability System LLP through strategic planning, sales, and marketing efforts.
  • Developed and implemented innovative marketing strategies to increase brand visibility.
  • Successfully scaled the company by hiring and training new talent to support business objectives.

Adobe

2 roles

Computer Scientist II (SDE IV)/SRE

Aug 2022 - May 2023 · 9 mos

Computer Scientist (SDE III)

Sep 2017 - Aug 2022 · 4 yrs 11 mos

  • Writing scripts in Shell and Python for Automation
  • Writing Terraform templates and modules to create and manage resources in AWS
  • Working with configuration management tools such as SaltStack
  • Managing several backend microservices (onboarding, creating pipelines, implementing monitoring, etc.)
  • Leading Chaos engineering efforts across teams
  • Implementing Cost optimization measures for AWS resources
  • Deploying and managing services in Kubernetes
  • Managing compliance and security of applications and infrastructure
  • Preparing, updating and testing Disaster Recovery Procedures
  • Conducting daily standups, sprint planning, and review meetings for multiple Scrum teams.
  • Onboarding and decommissioning of projects.
Shell · Python · Terraform · SaltStack · Kubernetes · AWS · +2

Guavus

Lead Engineer

Sep 2014 - Sep 2017 · 3 yrs · Gurgaon, Haryana, India

  • Deployment of Guavus Analytics at Tier 1/Tier 2 ISPs in the US.
  • Onsite critical upgrade support for the move from Hadoop to Hadoop YARN.
  • Hadoop software installation and port configuration.
  • Configuring NameNode High Availability.
  • Hadoop cluster software patching and upgrades.
  • Database connectivity for the Hadoop cluster.
  • HDFS management and monitoring.
  • HDFS support and maintenance.
  • Cluster maintenance, including creation, addition, and removal of DataNodes or NameNodes.
  • Managing and reviewing Hadoop/Oozie log files.
  • Configuring, running, and troubleshooting Oozie MR jobs.
  • Module-wise data validation across raw flow received, data dropped, annotated data, and data shown in the UI.
  • Performance testing in the staging environment to verify system performance, UI performance, and module-wise functionality.
  • Modularizing data integrity and data validation practices.
  • UAT with customers.
  • Interacting with sales, solutions architects, and customers during E2E deployment activities.
  • Working closely with Product Development, Product Managers, and other stakeholders on bugs and issues that require deep expertise.
  • Performing deep dives into both systemic and latent reliability issues; partnering with software and systems engineers across the organization to produce and roll out fixes.
  • Customer-facing role: interacting with customers and resolving the issues they raise.
  • Writing MOPs, SOPs, and DRPs.
  • Applying in-depth analysis of Hadoop-based workloads and project-based work, designing solutions to issues, and evaluating their effectiveness.
Hadoop · Oozie · Database connectivity · HDFS management · Hadoop Administration · Data Management

Amazon

Support Engineer III

Aug 2013 - Sep 2014 · 1 yr 1 mo · Hyderabad Area, India

  • I write Unix shell scripts, perform root-cause analysis for business-critical issues, and implement solutions for them.
  • I work with the following technologies:
  • 1. Unix shell scripts, Perl, SQL
  • 2. Java, JSP, and web services
  • 3. Cloud computing
  • Version control systems: Git, Perforce
  • Build tool: ANT
  • Development tool: Eclipse
  • Role:
  • I am part of the Fulfilment Center software team, which is responsible for handling different services for Amazon warehouses. These services are mission critical: any issue with them can have a huge impact on customer orders and therefore on the business. We need to make sure that these services are available 24x7.
  • My current roles and responsibilities are as follows:
  • Handling tickets cut for any service-related issues and deep diving into each issue to find the root cause.
  • Once the root cause is identified, troubleshooting to fix the issue.
  • If the issue is related to other teams, involving them as well to get it fixed.
  • For any high-severity issue, joining the conference call and coordinating with different teams to identify the issue.
  • Writing shell scripts for creating internal reports.
  • Writing tools to reduce operational burden.
  • Performing deployments for different services with active monitoring to keep watch on the services' health.
  • Escalating issues to the services' developers when anomalies are found during deployment.
  • Interviewing candidates for different openings related to my domain.
  • Hardware planning and EC2 migration
  • Creating and maintaining Auto Scaling groups and maintaining availability zone redundancy
  • Planning and designing fault-tolerant, high-availability clusters
  • Handling bad datacenter and bad rack distribution risks
  • Performing regular infrastructure audits
Unix Shell Scripts · Perl · SQL · Java · Web services · Support Engineering · +1

CGI

Associate Software Engineer

Apr 2011 - Jul 2013 · 2 yrs 3 mos · Bangalore

  • I wrote Unix shell scripts to automate manual work. I just loved writing Unix shell scripts.
  • I worked on the following projects at CGI.
  • Network Mediation by Metasolv:
  • Project Description: This is a customized application built on top of Metasolv (a mediation product from Oracle). In this project we processed HSPA data received from switches (network elements). After processing the CDRs, we created DB files for bulk loading, created bsi files (billing files), and sent them to downstream counterparts such as Amdocs.
  • Roles & Responsibility
  • Wrote custom scripts to pull files from the switch and, after processing, send them downstream.
  • Installed the application, created node managers and node chains, and set up the application on new servers.
  • Wrote and maintained scripts in Unix shell and PL/SQL to bulk load the files and send bulk load reports by email.
  • Designed a custom alerting system using Unix shell and Perl scripts.
  • Created reports using scripts.
  • OMC 6.0 Upgrade:
  • Roles & Responsibility
  • Prepared builds using ANT
  • Deployed the application in the performance test environment
  • Participated in the performance test result review
  • Set up bulk load jobs on the database server
  • Participated in the parallel run
  • Deployed the application in production
  • Participated in post-deployment activities
  • Message Process System:
  • Roles & Responsibility
  • Worked as a developer to convert requirements into an application.
  • Designed custom scripts and tools to fetch files from switches, parse them, and send them for processing.
  • DRP (Disaster Recovery Process)
  • Roles & Responsibility:
  • Sketched the DRP plan according to the situation
  • Tested the prepared plan for different possible situations
  • I enjoy automating manual processes and setting up alerting systems with automated scripts and cron jobs to help the application team identify issues on time. I deployed various CRs in other projects on the floor.
Unix shell scripting · PL/SQL · ANT · Software Development · Automation

Education

UCLA Anderson School of Management

UCLA PGP Pro

Dec 2020 - Jan 2022

Great Lakes Institute of Management Gurgaon

Post Graduate Program in Cloud Computing — Cloud Computing

Jan 2019 - Jan 2019

UP Technical University - NIET Greater Noida

Bachelor of Technology (B.Tech.) — Information Technology Project Management

Jan 2006 - Jan 2010
