Dipto Chakrabarty

DevOps Engineer

United States · 5 yrs experience

Key Highlights

  • Led automation efforts for global-scale MySQL clusters at Cisco.
  • Achieved 99.99% SLA during datacenter shutdown drills.
  • Specialized in cloud infrastructure and distributed systems.

Skills

Core Skills

Cloud Computing · Site Reliability Engineering · Database Management · DevOps

Other Skills

Amazon Web Services (AWS) · Ansible · Ansible Tower · Python (Programming Language) · Kubernetes · Databases · MySQL · Software Infrastructure · Golang · Reliability Engineering · Internet Protocol Suite (TCP/IP) · Python · Airflow · Oracle ERP Implementations · Red Hat Linux

About

I'm a Site Reliability Engineer and Software Engineer with close to 3 years of work experience, currently pursuing my Master's in Software Engineering at Carnegie Mellon University, where I specialize in distributed systems, cloud infrastructure, and software architecture.

At Cisco, I led automation efforts across global-scale MySQL clusters, streamlining backup, migration, and credential rotation workflows. These systems supported terabytes of production data and critical services with zero downtime, giving me hands-on experience with reliability engineering at scale.

I enjoy solving systems problems that sit at the intersection of software and infrastructure. Whether it's building a stateless microservice to handle 10M+ daily events on AWS, optimizing ETL pipelines with Spark, or containerizing services with Kubernetes and deploying them with Helm, I bring a strong execution mindset to shipping reliable solutions.

My technical toolkit includes Golang, Python, C++, Django, Kubernetes (CKA certified), Terraform, Jenkins, Ansible, MySQL, MongoDB, Kafka, Redis, AWS, and Azure. I've worked with SQS, CloudWatch, EKS, and Aurora for production-grade pipelines, and have also explored multi-cloud setups and hybrid architectures.

I thrive in high-ownership roles where I can build scalable systems, automate complexity away, and collaborate with cross-functional teams to improve availability, performance, and observability. I'm currently looking for full-time roles in Site Reliability Engineering or Software Engineering where I can apply my background in systems, automation, and cloud-native technologies to make an impact. Let's connect if you're building resilient platforms or hiring for backend/SRE roles; I would love to chat!

Experience

Total Experience: 5 yrs
Average Tenure: 1 yr 8 mos
Current Experience: 1 yr 9 mos

Surefront

Senior Software Architect (University 1-Year Capstone Project)

Jan 2025 - Dec 2025 · 11 mos · Pittsburgh, Pennsylvania, United States · On-site

  • Led a 6-person team to ship a Process Analytics feature by building an end-to-end event logging and process mining pipeline that converts over 60 million daily events into interactive flowcharts for business stakeholders, and improved the architecture to withstand 40% more load.
  • Made root-cause analysis of B2B e-commerce workflows and operational processes 200% faster, enabling teams to rapidly identify bottlenecks and drive continuous process improvements.
Amazon Web Services (AWS) · Cloud Computing

Carnegie Mellon University

Student

Sep 2024 - Present · 1 yr 9 mos · Pittsburgh, Pennsylvania, United States

Cisco

2 roles

Site Reliability Engineer

Aug 2022 - Aug 2024 · 2 yrs · On-site

  • Ensured MySQL cluster reliability during a Cisco-wide datacenter shutdown drill impacting 10k+ distributed services, designing failover tooling and recovery playbooks that sustained a 99.99% SLA and reduced alerting time by 45%.
  • Scaled backup automation for 30 production MySQL clusters generating 150 TB of data per month using Python and Airflow, reducing runtime by 30%, eliminating 50% of manual steps, and improving the resilience of distributed storage environments.
  • Recognized as a MySQL SME, diagnosing high-severity incidents with AppDynamics, lowering MTTR by 30% and restoring service reliability under peak load conditions.
  • Developed a multi-threaded API service in Python to parallelize operations, improving execution throughput by 35%, and implemented a robust retry mechanism with event logging to ensure fault tolerance under load.
  • Engineered a Django and Ansible framework to migrate 150 clusters from Oracle to Percona, cutting per-cluster migration time from 12 to 8 minutes, enabling seamless service continuity and delivering quarterly savings.
  • Architected a feature-flagged automation framework for database clusters using Python and MongoDB, boosting upgrade velocity by 40% and enabling safe, incremental feature adoption with no customer disruption.
  • Contributed to a GitOps workflow for Redis-as-a-Service on Kubernetes, enabling clusters to be deployed in seconds, reducing manual effort by 90%, and accelerating velocity across product teams.
  • Built a self-service Ansible Tower provisioning platform for Oracle databases that replaced bash scripts and reduced automation time to 30 minutes.
Ansible · Ansible Tower · Python (Programming Language) · Kubernetes · Databases · MySQL

SRE Intern

Jan 2022 - Jul 2022 · 6 mos · On-site

  • Modernized SSO authentication for internal ERP, trimming infra footprint by 45% while improving scalability and security.
Python (Programming Language) · Oracle ERP Implementations · Red Hat Linux · Bash · DevOps

Summer of Bitcoin

Open Source Developer

Jul 2021 - Sep 2021 · 2 mos

  • I contributed to the Utreexo project, a hash-based accumulator written in Go. Utreexo is an ongoing project that improves how the Bitcoin network verifies transactions against the UTXO set.
  • Some of the issues I worked on:
  • Wrote tests for the serialization and deserialization of pollards, the sparse representation of a forest of binary trees used in Utreexo.
  • Added a previously missing feature to write undoblock data to disk, handling chain reorganizations (reorgs) in the blockchain. Developed a function that writes the undoblock data produced when modifying the chain, passing the data over a channel to a goroutine.
  • Separated the accumulator's proof generation from proof writing to avoid buffering overhead, using separate goroutines and channels to write different proof data to different directories.

Kaloory

Cloud Administrator Intern

Jul 2020 - Aug 2020 · 1 mo · India

  • At Kaloory, my work revolved around managing and operating the company's cloud services.
  • I was responsible for configuring Jitsi Meet in Docker containers with the JWT authentication the company required for its video streaming service.
  • I also deployed and managed updates to the backend Node.js server, which was hosted on an EC2 instance behind an Nginx reverse proxy.
  • To deploy updates from the codebase on GitHub, I set up Jenkins and wrote a Jenkinsfile supporting multibranch builds.

Machaao Inc.

DevOps Intern

May 2020 - May 2021 · 1 yr · India

  • As a DevOps intern, I handled infrastructure and wrote applications, deploying them to our Kubernetes cluster using CI/CD pipeline tools such as Jenkins.
  • My work revolved around managing the development and production Kubernetes clusters and debugging errors in any of the microservices or servers.
  • I was also responsible for simplifying the deployment of changes from our codebase to production, writing Jenkinsfiles for the repositories we managed on GitHub.
  • My work also included writing various Python applications, including AWS Lambda functions, a MongoDB backup and restoration script, and APIs for deploying applications to the cluster in minimal steps.
  • I also integrated cloud-native tools such as Flux CD for continuous deployment of cluster services and HashiCorp Vault for storing important infrastructure files.

LinuxWorld Informatics Pvt Ltd

Summer Intern

May 2019 - Jun 2019 · 1 mo · Jaipur, Rajasthan, India

CodeChef-VIT

Technical Team Member

Mar 2019 - Jun 2020 · 1 yr 3 mos · Vellore, Tamil Nadu, India

  • CodeChef-VIT is a technical chapter at VIT University, Vellore, that aims to instill a spirit of coding in students and equip them to contribute significantly to the field of Computer Science and Engineering.
  • I worked on deploying the chapter's projects and integrating pipelines for them.

Education

Carnegie Mellon University

Master's degree — Software Engineering

Sep 2024 - Dec 2025

Vellore Institute of Technology, Vellore

BTech - Bachelor of Technology — Computer Science

Jan 2018 - Jan 2022
