Arun Kumar Jain K

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 12 yrs 2 mos experience

Key Highlights

  • Expert in cloud engineering and DevOps practices.
  • Proven track record in automation and infrastructure optimization.
  • Strong background in site reliability engineering.


Skills

Core Skills

Monitoring, Site Reliability Engineering, DevOps, Cloud Engineering, Infrastructure Automation, Cloud Migration, Security

Other Skills

AWS, AWS Auto Scaling, AWS CloudWatch, AWS SDK, AWS Systems Manager, Amazon CloudWatch, Amazon Web Services (AWS), Ansible, Azure, Bash, C, C#, C++, CDN, CSS

About

Experienced software engineer with a demonstrated history of working in the computer software industry. Skilled in automation, DevOps, performance, capacity planning, Terraform, Ruby, Bash, Python, and more.

Experience

Total Experience: 12 yrs 2 mos
Average Tenure: 1 yr 8 mos
Current Experience: 1 yr 10 mos

Coupang

Staff SRE/OE

Aug 2024 - Present · 1 yr 10 mos · Bengaluru, Karnataka, India · Hybrid

Loki, Mimir, Prometheus.io, Grafana, Monitoring, Site Reliability Engineering

Abacus.ai

SRE

Sep 2022 - Aug 2024 · 1 yr 11 mos · Bengaluru, Karnataka, India · Remote

  • Built an end-to-end Jenkins pipeline (in Groovy) for multi-cloud ML workflows (AWS, GCP, and Azure) that allows a PR to merge only if all test cases pass on every cloud.
  • Built a custom Kubernetes autoscaler in Python for all of our clusters in AWS, GCP, and Azure. It sped up pod scheduling and made debugging easier, and some of its insights also fed into cost optimisation.
  • Set up multiple clusters with kOps and managed ASGs across clouds for AI/ML workflows.
Cost Engineering, Microsoft Azure, Python, Google Cloud Platform (GCP), Kubernetes, Amazon Web Services (AWS)

Goldman Sachs

Vice President

Nov 2020 - Sep 2022 · 1 yr 10 mos · Bengaluru, Karnataka, India

  • Intent Deployments: Automated the entire deployment process using PowerShell scripts and AWS Systems Manager (SSM).
  • Metrics Collector/Monitoring Dashboard: Developed a PowerShell automation script that pushes various application and process metrics to AWS CloudWatch Logs at frequent intervals. Used AWS SSM to deploy and run the script on all EC2 instances, and built a CloudWatch dashboard that gives an overall picture of the application.
  • Automated the deployment of AWS infrastructure resources using Terraform.
Ansible, Terraform, Infrastructure as code (IaC), Git, Amazon CloudWatch, Amazon Web Services (AWS)

Myntra Jabong

Senior Software Engineer

May 2019 - Oct 2020 · 1 yr 5 mos · Bengaluru Area, India

  • Created machine images with Packer so that every VM meets a baseline standard (prerequisite packages and frameworks preinstalled), which also sped up provisioning.
  • Supported all migration-related activity, from provisioning VMs to configuring DNS, HAP, etc.
  • Rolled out a plan for the hot migration of hundreds of TBs of data from AWS S3 to Azure Blob Storage.
  • Performed a hot migration of mail services from AWS SES to SendGrid.
  • Developed a script in both Bash (az CLI) and Go that generates a resource utilisation report (matrix) for all VMs, later extended to cover other components. This helped identify under-utilised resources and reduce infrastructure costs.
Shell Scripting, Ansible, Cost Engineering, Terraform, Infrastructure as code (IaC), Microsoft Azure

HackerEarth

Senior Site Reliability Engineer

Mar 2018 - May 2019 · 1 yr 2 mos · Bengaluru, Karnataka, India

  • Developed an automation script using the AWS CLI to analyse the resource utilisation of all running EC2 instances. The resulting metrics guide scaling AWS resources in and out and help with cost optimisation.
  • Handled infrastructure automation, code performance improvements, and scalability of HackerEarth Recruit.
  • Analysed JMeter load tests for resource capacity planning and to eliminate bottlenecks, ensuring high availability.
  • Migrated the CDN from Fastly to Akamai to reduce network latency for user traffic.
  • Set up an uptime tracker for the site to improve visibility into downtime, analyse root causes, and apply fixes where applicable.
Shell Scripting, Cost Engineering, Bash, CDN, Cost Reduction, Python

Freshdesk

DevOps Engineer

Oct 2015 - Nov 2017 · 2 yrs 1 mo · Chennai Area, India

  • Built a Go service that prevents DoS attacks by blocking IPs and domains at the HAProxy layer (via Unix sockets) without a reload or restart. Reduced the response time of each request by 10 ms.
  • One-click deployment: Built a robust tool with the AWS SDK for Ruby that automates the entire blue-green deployment life cycle of our application, eliminating manual effort and preventing human error.
  • Developed automation tools that simulate production traffic in test environments, analyse database performance, and compare the results with previous deployment(s).
  • Developed a SQL/code console (internal tool) that lets developers view production data while maintaining an audit trail.
  • Handled deployments: writing Chef recipes, running DB migrations, and following the blue-green deployment model.
  • Debugged and resolved staging/production issues.
  • Built internal tools (eliminating manual data entry) and wrote scripts, Chef recipes, etc.
Redis, Shell Scripting, Ruby, Ruby on Rails, HAProxy, Chef

Bally Technologies

Associate Software Analyst

Oct 2013 - Sep 2015 · 1 yr 11 mos · Chennai Area, India

  • Undergraduate software developer for a casino gaming company that primarily focuses on player management and slot games. Specialized in C# and SQL Server.

Education

Madras Institute of Technology

Engineer's Degree — Computer Software Engineering

Jan 2009 - Jan 2013
