Vikrant Yadav

DevOps Engineer

Bengaluru, Karnataka, India · 10 yrs 10 mos experience

Key Highlights

Expert in cloud migration and infrastructure optimization.
Proficient in Site Reliability Engineering and DevOps practices.
Strong background in data warehousing and automation.

Stackforce AI infers this person is a Fintech-focused Site Reliability Engineer with strong DevOps capabilities.

Skills

Core Skills

Site Reliability Engineering, Cloud Migration, DevOps, Infrastructure Capacity Planning, Continuous Integration and Continuous Delivery (CI/CD), Data Warehousing, Informatica

Other Skills

APM, AWS, Airflow, Amazon Web Services (AWS), Ansible, Bash scripting, Confluence, Elastic Stack (ELK), Grafana, Jenkins, Kafka, Linux, Nagios, Python, SQL

About

Experienced SRE/DevOps Engineer with a demonstrated history of working in the information technology and services industry. I bring a can-do approach to every assignment. I enjoy challenging work and always put forth my best effort to achieve my goals.

Experience

Total Experience: 10 yrs 10 mos

Average Tenure: 1 yr 8 mos

Current Experience: 10 mos

Degreed

Release Engineer

Jul 2025 – Present · 10 mos · On-site

Rapyd Cloud

SRE Lead

Dec 2022 – Jul 2024 · 1 yr 7 mos

Morgan Stanley

Principal DevOps/SRE Engineer

Aug 2021 – Aug 2022 · 1 yr

Migrated legacy on-prem applications to the cloud using AWS services (EC2, ECS, ELB, S3, Lambda, Route 53, Auto Scaling, IAM, EKS, CloudWatch, SNS, SES, VPC, etc.).
Managed the complete infrastructure setup using Terraform, Ansible, Jenkins, Docker, and Bash scripting.
Supported and maintained the firm's trading Linux infrastructure, contributing to tools and systems that fully automate the provisioning, configuration, and monitoring of thousands of Linux servers.
Rebuilt the FXEoptions Aquilon personality and simplified the existing environment (one personality per environment; ~40 servers rebuilt in production).
Completed Kerberos EOL hygiene analysis; ~500 boxes remediated.
Managed capacity across plants and regions, promptly setting up new VM servers and memory upgrades to mitigate capacity constraints.
Reviewed the system performance lifecycle (including NFRs) and identified key metrics for performance improvements. Participated in capacity review discussions covering service usability and cost optimisation.
Expanded DataRobot tooling, built custom usage reports, and delivered patch improvements that are more compatible with existing releases.
Built, maintained, and enhanced the monitoring framework (log/metric collection, alert aggregation, dashboarding) and implemented and enhanced its alerting logic.
Migrated existing prod/QA aurora to Aquilon across all existing plants. Created multiple Splunk/Grafana dashboards for proactive alerting and reporting. Created a monthly ION/Etrade shares report.

Shell Scripting, Server Monitoring, Site Reliability Engineering, DevOps, APM, Cloud Migration, +3 more

Flipkart

Sr. Product Solution Engineer

Aug 2019 – Aug 2021 · 2 yrs · Bengaluru, Karnataka, India

Created dozens of clusters for Griffin core, Griffin SVC, Spark, Storm, Kafka, Aerospike, Airflow, ZooKeeper, and RabbitMQ across different zones.
Replicated all the cosmos dashboards available in Chennai to the Hyderabad DC for Reco manifestation components.
Responsible for the maintenance, configuration, and reliable operation of computer systems and virtualization.
Experienced in installing, configuring, and troubleshooting Linux-based servers, products, and open-source applications, including remote support and monitoring.
Proactively worked on issues and executed planned changes, providing solutions to enhance quality of service and prevent future problems.
Created all the necessary Nagios application alerts for Reco manifestation components in Hyderabad DC.
Managed essential core services such as DHCP, LDAP, DNS, and NFS for on-prem and hosted data centers as well as public clouds.
Created custom Debian packages for all tech stacks per team requirements.
Created and implemented Ansible-based deployment from scratch for all tech stacks.
Implemented a one-click deployment service for griffin_core, griffin_svc, and the relevance side in the Chennai/Hyderabad DCs.
Migrated Jenkins jobs from the old Jenkins to m3-Jenkins and validated the deployment pipeline.
Configured all system-related Nagios alerts and application alerts (Griffin core, SVC, relevance storm, Spark, Kafka, etc.) on recommendation to ensure the environment is stable.
Created multiple cosmos dashboards to visualise application and system metrics.
Developed and implemented performance improvement strategies and plans to promote continuous improvement.

Ansible, Splunk, Elastic Stack (ELK), Terraform, Site Reliability Engineering, Amazon Web Services (AWS), +5 more

Wells Fargo

Sr. Associate Technology

Aug 2018 – May 2019 · 9 mos · Bengaluru, Karnataka, India

Strong experience in supporting/reporting applications and DataControl alerts.
Debugged and analysed reported alerts and performed root cause analysis.
Automated tasks via shell scripting, SQL, and Python as part of process improvement.
Proactively monitored the availability and performance of production servers.
Worked closely with DevOps/development teams to freeze configurations and playbooks for various teams and internal applications. Deployed and maintained standard tools such as Ansible and Terraform for the same.
Performed sanity checks on all servers before the start of business hours to verify that they were functioning correctly.
Handled server monitoring and health checks, user and group management, software installation (RPM packages) and configuration, and cron scheduling.
Worked both in groups and independently on side projects.
Drove outage calls, handled on-call support, and provided knowledge transfer to peers.

Ansible, Unix, Site Reliability Engineering, Confluence, DevOps, Grafana

Wipro

Sr. Project Engineer

Jan 2017 – Jul 2018 · 1 yr 6 mos · Pune, Maharashtra, India

Strong experience with scheduling tools and with monitoring computer and peripheral equipment, including expertise in a scheduling application.
Worked with IAM roles/policies, IAM features, S3, EC2, and Lambda on the AWS platform.
Hands-on experience in CI/CD and Jenkins Pipeline creation.
Set up CloudWatch alerts and notifications.
Owned SLI and SLO configuration in line with the error budget.
Good understanding of networking (TCP/IP, IP addresses, HTTP, DNS, VPN), especially cloud networking.
Performed performance analysis, proactive monitoring, continual improvement, and capacity planning for production virtualised environments.
Involved in setting up network VPCs.
Developed and modified core Informatica jobs/mappings.
Debugged issues and performed root cause analysis. Automated tasks via Ansible, shell scripting, and Python as part of process improvement.
Solely managed UNIX/Python/TWS scripts and jobs, delivering them within the defined SLA.
Handled multiple releases to production.
Created technical documents and Confluence updates on process implementation, and delivered sessions within the team.
Created CRs (change requests) and runbooks for new and alteration requests.

Shell Scripting, Ansible, Site Reliability Engineering, Jenkins, Amazon Web Services (AWS), DevOps, +1 more

Amdocs

DWH/BI Developer

Nov 2013 – Jan 2017 · 3 yrs 2 mos · Pune/Pimpri-Chinchwad Area

This data warehouse is among the biggest data warehouses in the world.
The project ranges from the retrieval of OLTP data from various sources such as Telegence (billing), Uverse (billing), CIM (address locations), and various others, to the final loading of the OLAP data into a single large database called eCDW (enterprise consolidated data warehouse).
The OLTP-to-OLAP conversion comprises capturing the delta data from the Oracle reporting DB using the Golden Gate mechanism, loading it into the Oracle staging area after transforming it with Informatica, and then loading it into the common DB (Teradata) using load utilities such as tpt, mload, and bteq.
Migrated eCDW core applications from UNIX to a Linux grid platform.
The scripts are written on UNIX servers to make the system more robust and more efficient performance-wise.