Krishan Bajaj

SRE (Site Reliability Engineer)

Gurgaon, Haryana, India · 8 yrs 1 mo experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Expert in AWS and cloud-based monitoring tools.
  • Proficient in deploying microservices and automation.
  • Strong background in incident management and troubleshooting.
Stackforce AI infers this person is a SaaS Infrastructure Engineer with expertise in monitoring and automation.

Skills

Core Skills

AWS, Monitoring, Visualization, Database Management, Automation, Microservices, Deployment, Incident Management

Other Skills

Agile Methodologies, C, C++, Debezium, Docker, ECS, Elastic Stack (ELK), Elasticsearch, Filebeat, Git, Grafana, Graphite, HTML, InfluxDB, Java

About

Experienced Site Reliability Engineer with a demonstrated history of working in the internet industry. Skilled in Oracle Database, Microsoft Word, Apache Kafka, Sumo Logic, StackStorm, ES-Stack, Grafana, Zabbix, and HTML. Strong information technology professional with a Bachelor of Engineering focused on Computer Science Engineering from Chitkara University, Patiala.

Experience

Total Experience: 8 yrs 1 mo
Average Tenure: 4 yrs
Current Experience: 4 yrs 7 mos

Airtel

2 roles

Manager–Infra DevOps

Jun 2024 - Present · 1 yr 11 mos · Noida, Uttar Pradesh, India

Senior Engineer–Infra DevOps

Oct 2021 - Jul 2024 · 2 yrs 9 mos · Noida, Uttar Pradesh, India

  • Technologies
  • StackStorm: An open-source, event-driven platform for automation.
  • Grafana: Lets you query, visualize, alert on, and understand your metrics no matter where they are stored.
  • Zabbix: Monitors metrics such as network utilization, CPU load, and disk space consumption, with notifications that can be received over email.
  • Filebeat: Reads from log files and sends the entries to various outputs such as Kafka, Elasticsearch, Syslog, etc.
  • AWS: ECS, EC2, S3; creating REST APIs to create and delete instances.
  • Debezium: An open-source distributed platform for change data capture. Start it up, point it at your databases, and your apps can start responding to all of the inserts, updates, and deletes that other apps commit to your databases. Debezium is durable and fast, so your apps can respond quickly and never miss an event, even when things go wrong.
  • Sumo Logic: A cloud-based tool for parsing and reading logs.
  • Prime Responsibilities
  • Analysis of Apache, Nginx, and Diamond logs to monitor response time, CPU usage, and response codes.
  • Create and maintain documentation of systems and processes for existing and new systems.
  • Design and development of a tool that fetches alerts from ServiceNow and updates the database and visualizations.
  • Designing a tool that creates graphs recording the key problems that caused each issue and the solutions that resolved it.
  • Built and deployed Docker containers to break up a monolithic app into microservices, increasing scalability and optimizing speed.
  • Experience in creating dashboards, metrics, alarms, and notifications for servers using AWS CloudWatch, Grafana, Prometheus, and InfluxDB.
  • Identify and correct the root cause of various system alarms, and recommend changes to avoid their recurrence.
  • Responsible for setting up the ELK (Elasticsearch, Logstash, and Kibana) platform and parsing unstructured logs into structured JSON using regular expressions.
  • Setting up alerts and handling overloads on servers.
StackStorm, Grafana, Zabbix, Filebeat, AWS, Debezium, +2
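
As an illustration of the log-parsing responsibility listed above, here is a minimal sketch of turning an unstructured access-log line into structured JSON with a regular expression. It is not the tooling used in this role: the log layout (Common Log Format followed by a response time in microseconds), the function name parse_access_line, and the field names are all assumptions made for the example.

```python
import json
import re

# Assumed layout: Common Log Format followed by a response time in
# microseconds. A real Apache/Nginx log format may differ.
LINE_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<response_time_us>\d+)'
)


def parse_access_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    record["response_time_us"] = int(record["response_time_us"])
    return record


# Hypothetical sample line, made up for this example.
sample = ('10.0.0.1 - - [12/Mar/2021:10:15:32 +0530] '
          '"GET /api/search HTTP/1.1" 200 512 18432')
print(json.dumps(parse_access_line(sample)))
```

In an ELK setup the same pattern would more commonly be expressed as a Logstash grok filter; the Python version only shows the idea of named capture groups becoming JSON fields.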

MakeMyTrip

3 roles

Senior Site Reliability Engineer

Oct 2020 - Oct 2021 · 1 yr

Site Reliability Engineer

Apr 2018 - Oct 2020 · 2 yrs 6 mos

  • Technologies
  • StackStorm: An open-source, event-driven platform for automation.
  • Grafana: Lets you query, visualize, alert on, and understand your metrics no matter where they are stored.
  • Zabbix: Monitors metrics such as network utilization, CPU load, and disk space consumption, with notifications that can be received over email.
  • Filebeat: Reads from log files and sends the entries to various outputs such as Kafka, Elasticsearch, Syslog, etc.
  • AWS: ECS, EC2, S3; creating REST APIs to create and delete instances.
  • Debezium: An open-source distributed platform for change data capture. Start it up, point it at your databases, and your apps can start responding to all of the inserts, updates, and deletes that other apps commit to your databases. Debezium is durable and fast, so your apps can respond quickly and never miss an event, even when things go wrong.
  • Sumo Logic: A cloud-based tool for parsing and reading logs.
  • Prime Responsibilities
  • 1) First line of defense: live-site troubleshooting, RCA, smooth operation under high traffic, automated daily reports (DSR), incident management, and keeping site performance and business healthy 24x7.
  • 2) Find and innovate ways to reduce TTD and TTR through automation, increasing patrolling coverage and reducing manual effort; create centralized dashboards for all components.
  • 3) Deployment of microservices on ECS using Docker.
  • 4) Write workflows for self-healing and runbook automation, such as server restarts and executing REST APIs.
  • 5) Parsing Apache/Nginx logs using Filebeat, Logstash, Elasticsearch, and Grafana.
StackStorm, Grafana, Zabbix, Filebeat, AWS, Debezium, +2
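
TTD and TTR in the responsibilities above are usually read as time to detect and time to resolve. The sketch below shows one way such averages could be computed from incident timestamps. It is illustrative only: the Incident fields and the sample data are hypothetical, and TTR is measured here from detection to resolution, which is just one of several conventions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the fault began (hypothetical field)
    detected: datetime  # when monitoring raised the alert
    resolved: datetime  # when service was restored


def mean_ttd_ttr(incidents):
    """Return (mean time to detect, mean time to resolve) as timedeltas."""
    if not incidents:
        raise ValueError("at least one incident is required")
    count = len(incidents)
    ttd = sum((i.detected - i.started for i in incidents), timedelta()) / count
    ttr = sum((i.resolved - i.detected for i in incidents), timedelta()) / count
    return ttd, ttr


# Made-up incidents: detected after 4 and 2 minutes, resolved 30 and 10
# minutes after detection.
incidents = [
    Incident(datetime(2021, 1, 1, 10, 0), datetime(2021, 1, 1, 10, 4),
             datetime(2021, 1, 1, 10, 34)),
    Incident(datetime(2021, 1, 2, 9, 0), datetime(2021, 1, 2, 9, 2),
             datetime(2021, 1, 2, 9, 12)),
]
mean_ttd, mean_ttr = mean_ttd_ttr(incidents)
print(mean_ttd, mean_ttr)
```

Tracking these two numbers separately makes it visible whether automation is improving detection (better alerting coverage) or resolution (self-healing workflows).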

Site Reliability Engineer

Apr 2017 - Apr 2018 · 1 yr

  • Technologies
  • Grafana: Lets you query, visualize, alert on, and understand your metrics no matter where they are stored.
  • Zabbix: Monitors metrics such as network utilization, CPU load, and disk space consumption, with notifications that can be received over email.
  • Prime Responsibilities
  • 1) First line of defense: live-site troubleshooting, RCA, smooth operation under high traffic, automated daily reports (DSR), incident management, and keeping site performance and business healthy 24x7.
  • 2) Find and innovate ways to reduce TTD and TTR through automation, increasing patrolling coverage and reducing manual effort; create centralized dashboards for all components.
  • Received a pre-placement full-time job offer based on performance during the internship.
Grafana, Zabbix, Monitoring

Education

Birla Institute of Technology and Science, Pilani

Master's degree — Computer Software Engineering

Jun 2021 - Jul 2023

Chitkara University, Patiala

Bachelor of Engineering — Computer Science

Jan 2014 - Jan 2018
