Shashilpi Krishan

Director of Engineering

Gurgaon, Haryana, India · 17 yrs 11 mos experience
Highly Stable

Key Highlights

  • 18 years of experience in E-Commerce and Travel domains.
  • Expert in DevOps, Site Reliability, and Cloud Security.
  • Proven track record in leading global teams and driving results.
Stackforce AI infers this person is a seasoned leader in E-Commerce and Cloud Infrastructure.

Skills

Core Skills

DevOps · Site Reliability Engineering

Other Skills

Automation · Backend Operations · Big Data · Cassandra · Communication Skills · Continuous Deployment · Core Java · E-commerce · ELK · Gomez · Grafana · Great Motivator · HBase · Hadoop · Httpd

About

18 years of extensive experience in the E-Commerce & Travel domains, spanning DevOps, SecOps, Site Reliability, Cloud, Security, Compliance, IT Network, and Databases, with progressive management and leadership experience and a tenacious commitment to driving timely results in complex, rapid-growth environments. Adapts swiftly to changes in deadlines and in the nature of assignments.

Specialities

  • Hyperscale environment running 1000+ dockerized microservices, highly available & consistent.
  • AWS: cloud deployment, capacity planning, right sizing, and cost management with unorthodox optimization techniques.
  • Technology planning for E-Commerce strategy.
  • 24x7 site reliability, NOC, SLA/SLO and KPI coverage.
  • Incident management & reporting, troubleshooting.
  • Monitoring, observability & incident management: in-house developed monitoring pipeline covering KPIs such as SLA latency percentiles, response codes, business monitoring, application failure mode coverage, system-level checks, network monitoring, JMX, heap/thread dump analysis, and GC performance. Tech used: Python, Dataset, ELK, Kafka-Storm, OpenTSDB, Grafana, Zabbix, Diamond, CloudWatch, SNMP traps, New Relic, Catchpoint, Akamai mPulse (RUM).
  • DevOps & platforms: highly scalable CI/CD framework with multi-language & runtime support and features like code coverage, build promotion or rejection based on test results, size checks, etc. It handles 1000+ builds or jobs per day.
  • In-house solutions for operational excellence: Change Tracking system, Self-healing, automated canary-based deployment system, anomaly detection for monitoring, tech hygiene tracking system, Unified Alerting Console, and more.
  • Databases & NoSQL datastore administration (MySQL, Mongo, Aerospike, Cassandra, Redis, Elasticsearch, Kafka, RabbitMQ, etc.).
  • Network & Security: perimeter (WAF, IPS), AppSec (SAST, DAST, VAPT), endpoint (EDR, SSE, ZTNA, DLP, NAC), cloud security (CSPM, SSPM, DSPM, IAM), DB security (DAM, FIM), SIEM, and compliance (PCI DSS, SOC2, SOX, GDPR).
  • Budgeting, vendor/contracts management.
  • Expertise in leading multi-disciplinary global teams.
  • Hiring & team building from scratch.
  • Leadership, strategizing & planning.
  • Participating in tech conferences as a speaker.
  • Driving & hosting DevSecOps conference.

Experience

Adobe

Director of Engineering

Oct 2025 - Present · 5 mos · India · On-site

MakeMyTrip

5 roles

Vice President of Infrastructure

Promoted

Apr 2024 - Oct 2025 · 1 yr 6 mos

Director Site Operations (Infrastructure)

Promoted

Apr 2022 - Mar 2024 · 1 yr 11 mos

Associate Director

Promoted

Apr 2020 - Mar 2022 · 1 yr 11 mos

Senior Manager

Apr 2018 - Mar 2020 · 1 yr 11 mos

  • Building Python-based home-grown tools such as the Self Healing System, Continuous Deployment (EDGE), LB Manager, Alert Monitoring Platform (AMP), and Git/Gerrit/SONAR Stats (in-house Git Prime).
  • First line of defense: live site troubleshooting, RCA, smooth operation of high-traffic, Big Data, multi-layer complex live sites, automated daily reports (DSR), incident management, and keeping site performance & business OK 24x7, along with performance tuning, new DC setups, and capacity planning.
  • Experience in technologies like NoSQL (Cassandra), SOLR, the Hadoop stack, Storm, RMQ, Kafka, Zookeeper, Memcached, and Couchbase, plus hands-on experience setting up a monitoring pipeline from scratch (Syslog-ng, Filebeat-Logstash-Elasticsearch (ELK cluster), Celery/Beat, OpenTSDB, Grafana & Zabbix).
  • Finding and innovating ways to reduce TTD & TTR through automation, thereby increasing patrolling coverage and reducing manual effort; creating centralized dashboards for all components.
  • Cross-functional and people skills: passion for building, growing, and sustaining strong organizations. Managing a team of 25+ DevOps & SRE members.
Python · NoSQL · Cassandra · SOLR · Hadoop · Storm · +6

Manager, Website Operations (Site Reliability Engineering)

May 2016 - Apr 2018 · 1 yr 11 mos

  • Website Operations
  • Managing Site Reliability Engineering & website operations for all MakeMyTrip LOBs such as Flights, Hotels, Payments, Holidays, Bus, Rails & others, including the complex multi-tier backend powered by Big Data technologies such as Redis/Memcache, Couchbase, real-time processing (Storm/Kafka/ZK), SOLR, Hadoop/Hive, etc.
  • First line of defense: live site troubleshooting, escalations, automated daily status reports (DSR), incident management, and measuring/keeping site performance & business OK 24x7.
  • Ensuring monitoring coverage (Application SLA, Functional, Business) using a monitoring pipeline (Syslog-ng -> Logstash -> ELK cluster -> Celery/Beat/RMQ -> OpenTSDB -> Grafana & Zabbix) and system monitoring via Nagios Engine (FAN) & Centreon.
  • Constant tracking of payment metrics: individual payment gateway success rates, payment attempt ratios, scheduled/unscheduled PG/bank downtimes & escalations.
  • Creation and continuous update of consolidated SLA & Biz dashboards in Grafana to track all live site components (frontend/backend) in one place for faster detection, repair & RCA.
  • Managing, grooming, and setting up a world-class operations team of 20 people through better planning, execution, and tools/media to minimize manual effort and maximize efficiency.
Syslog-ng · Logstash · ELK · Nagios · Grafana · Site Reliability Engineering

Akamai Technologies

Manager, System Operations

Jul 2015 - May 2016 · 10 mos · Bengaluru Area, India

Wize Commerce (formerly NexTag)

4 roles

Technology Operations Manager (Middleware & Infrastructure)

Promoted

Jun 2013 - Jun 2015 · 2 yrs

  • Managing operations of most live site components: maintenance, monitoring, live site troubleshooting (RCA), fixing code at Level 1, reporting (front-end & back-end), and measuring/keeping site performance OK 24x7.
  • Responsible for configuration, identifying KPIs, setting up monitoring, creating dashboards, and automating report creation for next-generation technologies integrated in the company, such as Search Indexing (Hadoop, Storm), Distributed Search (Lucene), Apache Solr, Biz Object storage (MySQL, Cassandra & HBase), access and caching (Memcached) layers & Request Management (Crawler + User).
  • Development/engineering ownership of in-house monitoring & deployment tools like M1/RA in Java.
  • Managing, grooming, and setting up a world-class technology operations team of 20-25 people through better planning, execution, and tools/media to minimize manual effort and maximize efficiency.

Team Leader (Technology Operations)

Promoted

Dec 2011 - Jun 2013 · 1 yr 6 mos

  • Responsible for setup, configuration, monitoring, testing & operations of backend infrastructure, including various new technologies like Search Indexing (Hadoop, Storm), Distributed Search (Lucene), Apache Solr, Biz Object storage (MySQL, Cassandra & HBase), access and caching (Memcached) layers & Request Management (Crawler + User).
  • Troubleshooting, including root cause analysis, taking corrective actions & exploring future preventions for operational live site issues.
  • Finding, researching & integrating better tools for swift live site operations and reduced manual effort.
  • Performance testing of releases on both server side and client side, using Apache JMeter for server side and Keynote, WebPageTest, Dynatrace, Gomez, and SiteScope for client side.

Senior Quality Engineer

Promoted

May 2010 - Nov 2011 · 1 yr 6 mos

  • Complete ownership of testing backend infrastructure across technology areas including Search Indexing (Hadoop), Distributed Search (Lucene), Biz Object storage (MySQL & Cassandra), access and caching (Memcached) layers & Request Management (Crawler + User).
  • Performance testing of releases on both server side and client side, using Apache JMeter for server side and Keynote, WebPageTest, Gomez, and SiteScope for client side.
  • Troubleshooting, including root cause analysis, taking corrective actions & exploring future preventions for operational live site issues.

Quality Engineer

Nov 2007 - Apr 2010 · 2 yrs 5 mos

  • Joined NexTag as a fresher in the Quality Engineer role and worked extensively on web application testing & operations. Gained strong exposure to Linux server environments & administration and shell scripting. Worked on various modules such as Product Shopping, Travel, Request Management, SEO, SEM, ESXi virtualization, etc.

Education

Birla Institute of Technology and Science, Pilani

Master of Technology - MTech — Computer Systems & Infrastructure

Jan 2017 - Jan 2019

Maharshi Dayanand University

Bachelor of Technology - BTech — Electronics & Communication

Jan 2003 - Jan 2007
