Varad Ghodake

Software Engineer

Chicago, Illinois, United States · 4 yrs 7 mos experience

Most Likely To Switch

Key Highlights

Improved fault tolerance of the job submission platform by 50%.
Redesigned the event processing engine to cover up to ~2 million events/day.
Developed a web app that reduced MTTR by 19%.

Stackforce AI infers this person is a Backend Engineer with strong expertise in Site Reliability Engineering and event-driven architectures.

Skills

Core Skills

Site Reliability Engineering, PostgreSQL, React.js

Other Skills

Apache Kafka, Apache ZooKeeper, C++, Customer Service, Django, Django REST Framework, ElasticSearch, Go (Programming Language), HTML Scripting, JavaScript, Linux, PHP, Python, Red Hat Linux

About

Software Engineering, Backend and System Design. More about me: https://varadghodake.github.io

Experience

Total Experience: 4 yrs 7 mos

Average Tenure: 2 yrs 3 mos

Current Experience: 2 yrs 9 mos

IMC Trading

Software Engineer

Aug 2023 – Present · 2 yrs 9 mos · Chicago, Illinois, United States · On-site

Georgia Institute of Technology

Graduate Research Assistant

Jan 2022 – Aug 2023 · 1 yr 7 mos · Atlanta, Georgia, United States

Georgia Tech Research Network Operations Center (RNOC), under the Institute for People and Technology (IPaT). Maintained and improved the on-premise Red Hat OpenShift cluster.

The D. E. Shaw Group

2 roles

Software Engineer

Jul 2019 – May 2021 · 1 yr 10 mos · Hyderabad, Telangana, India · On-site

As a member of the Quant Systems SRE team, improved the fault tolerance of the job submission platform by 50% with streaming replicas, using the Patroni framework for PostgreSQL in a newly created hypercluster, and saved 19% of the monthly error budget.
As a monitoring Subject Matter Expert, redesigned the event processing engine to support event sourcing with Kafka, raising the maximum event coverage capacity (SLO) to ~2 million events/day.
Conducted patching and releases for enterprise and DMZ Linux servers that run the ElasticSearch cluster, the Vault secret store, and core infrastructure services such as DNS, Puppet, and Kerberos.
Set up a multi-site, highly available Artifact Repository to store ML models and CI/CD outputs, with Anycast routing for low-latency uploads and downloads. Served as a Directly Responsible Individual (DRI) for this system, which served more than 600K artifacts under a 99.9% availability SLO.

Site Reliability Engineering, Apache Kafka, PostgreSQL, ElasticSearch, Linux

Software Engineer Intern

May 2018 – Jul 2018 · 2 mos · Hyderabad, Telangana, India · On-site

Analyzed reported tickets and system-wide issues to improve operational efficiency. To mitigate frequently occurring issues, designed a web app that provides insights into them.
Developed a visualization console with a React front end, a Python backend, and D. E. Shaw proprietary JavaScript libraries.
Besides reducing MTTD (Mean Time To Detect), the project reduced by 19% the MTTR (Mean Time To Repair) for SREs fixing users’ NFS home directories, grid job submissions, active sessions, and group memberships.

React.js, Python, JavaScript