Chaitanya Kirty

Data Engineer

Bengaluru, Karnataka, India · 12 yrs 2 mos experience

Key Highlights

  • Over three years of experience in data engineering.
  • Expertise in Azure and AWS data tools.
  • Certified Databricks Spark Developer.
Stackforce AI infers this person is a Data Engineer specializing in scalable ETL solutions within the SaaS industry.

Skills

Core Skills

Data Engineering · ETL

Other Skills

Azure Data Lake · Amazon S3 · Azure Databricks · PySpark · Hive · Azkaban · Celery · RabbitMQ · Amazon Redshift · Core Java · Extract, Transform, Load (ETL) · Git · Jenkins · YARN · Hadoop

About

Currently serving as Data Engineer Lead at Vista, with over three years of experience in designing and implementing scalable ETL platforms. Expertise in tools like Azure Data Factory, Azure Databricks, and AWS services enables the organization to streamline data workflows and enhance integration capabilities. Certified as a Databricks Spark Developer, demonstrating advanced proficiency in Python and SQL to optimize data transformations and processing efficiencies. Previously contributed to big data platform enhancements at Fractal Analytics and Tavant by building horizontally scalable ETL pipelines and optimizing batch workflows with tools like RabbitMQ, Celery, and Azkaban. Dedicated to empowering teams with robust data solutions and fostering innovation in the big data and analytics domain.

Experience

Vista

Data Engineer Lead

Mar 2022 – Present · 4 yrs · Bengaluru, Karnataka, India · Remote

Azure Data Lake · Data Engineering · ETL

Fractal Analytics

Senior Engineer

Jun 2019 – Mar 2022 · 2 yrs 9 mos · Bengaluru, Karnataka, India

Amazon S3 · Azure Databricks · Data Engineering · ETL

Tavant

2 roles

Senior Data Engineer

Promoted

Jun 2018 – May 2019 · 11 mos

  • Project Description:
  • The Grubhub Data Platform (GDP) project migrated Grubhub's existing home-grown ETL tool to the Hadoop platform.
  • The old ETL tool was built on AWS Redshift and a Python-based system that handled more than 300 daily jobs to generate different reports.
  • To improve scalability and processing speed, the system was migrated to the Hadoop ecosystem using PySpark, Hive, Azkaban, and PrestoDB.
  • Designed and developed the horizontally scalable ETL platform using Celery and RabbitMQ (see the Celery sketch below).
  • Data is fetched through different APIs into a raw bucket (data lake) in AWS S3; later, it is transformed as required using Spark DataFrames and Spark SQL, stored into an asset bucket, and Hive tables are created on top of it (see the PySpark sketch after this list).
  • Jobs run on AWS EMR clusters.
  • Reports are generated with HiveQL queries, and the data is fed to Redash for ad hoc querying.
  • The team uses Azkaban for job execution and scheduling.
  • Roles & Responsibilities:
  • Analysed the pre-existing ETL tool's Python code to understand the jobs.
  • Used existing APIs, or wrote new ones, to connect to data sources and fetch data into AWS S3.
  • Designed Spark code for data transformation requirements using Spark DataFrames and Spark SQL.
  • Migrated legacy ETL jobs to GDP (Grubhub Data Platform) using PySpark.
  • Wrote Hive queries for report generation and data processing.
  • Designed job flows and schedules in Azkaban for automated processing.
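For illustration, here is a minimal PySpark sketch of the raw-to-asset flow described above. The bucket names, schema, and job name are assumptions for the example, not the actual Grubhub Data Platform code.

```python
# Minimal sketch of the raw-to-asset flow (bucket names and schema are
# illustrative assumptions, not the real GDP configuration).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("gdp-orders-etl")   # hypothetical job name
    .enableHiveSupport()         # needed to create Hive tables
    .getOrCreate()
)

# 1. Read API dumps landed in the raw bucket (data lake).
raw = spark.read.json("s3://example-raw-bucket/orders/dt=2018-06-01/")

# 2. Transform with the DataFrame API and Spark SQL.
raw.createOrReplaceTempView("orders_raw")
orders = spark.sql("""
    SELECT order_id,
           CAST(amount AS DECIMAL(10, 2)) AS amount,
           to_date(created_at)            AS order_date
    FROM orders_raw
    WHERE order_id IS NOT NULL
""").withColumn("ingest_dt", F.lit("2018-06-01"))

# 3. Write to the asset bucket and expose a Hive table on top of it.
orders.write.mode("overwrite").parquet("s3://example-asset-bucket/orders/")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS assets.orders (
        order_id STRING, amount DECIMAL(10, 2),
        order_date DATE, ingest_dt STRING
    )
    STORED AS PARQUET
    LOCATION 's3://example-asset-bucket/orders/'
""")
```

Writing the asset data as Parquet with an external Hive table layered on top is what lets the downstream HiveQL reports and Redash ad hoc queries read the same data.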
Amazon S3 · Azure Databricks · PySpark · Hive · Azkaban · Celery +3
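The bullets above also mention a horizontally scalable ETL platform built on Celery and RabbitMQ. A minimal sketch of that pattern follows; the broker URL, queue, and task name are illustrative assumptions.

```python
# Sketch of a horizontally scalable job runner on Celery + RabbitMQ
# (broker URL and task names are assumptions for illustration).
from celery import Celery

app = Celery(
    "etl",
    broker="amqp://guest:guest@rabbitmq-host:5672//",  # RabbitMQ broker
)

@app.task(bind=True, max_retries=3)
def run_etl_job(self, job_name: str) -> str:
    """Run one ETL job; workers on many machines share the same queue."""
    try:
        # e.g. submit the corresponding Spark step to an EMR cluster here
        return f"{job_name} finished"
    except Exception as exc:
        # Back off and retry transient failures up to max_retries times.
        raise self.retry(exc=exc, countdown=60)

# A scheduler (Azkaban in the project above) can enqueue jobs like:
#   run_etl_job.delay("daily_orders_report")
```

Because every worker consumes from the same RabbitMQ queue, throughput scales horizontally by simply starting more worker processes on more machines.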

Big Data Engineer

Dec 2016 – May 2018 · 1 yr 5 mos

  • Same project (Grubhub Data Platform) as the Senior Data Engineer role above; the project description and responsibilities are identical to those listed there.
Amazon S3 · Data Engineering · ETL

Infosys

2 roles

Senior Systems Engineer

Promoted

Jan 2015 – Dec 2016 · 1 yr 11 mos · Bengaluru Area, India

  • Role Description:
  • Involved in developing Hive scripts.
  • Exported processed output from Hive external tables into MySQL using Sqoop.
  • Developed Sqoop scripts to enable interaction between Hive and the MySQL database.
  • Developed Hive queries to import data from the local environment into the HDFS environment.
  • Involved in schema design.
  • Worked with CSV, TSV, XML, and JSON files in Hive.
  • Performed Hive data sampling using bucketing (see the sketch after this list).
  • Wrote Hive queries and UDFs on different datasets.
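As a rough illustration of the bucketing-based sampling mentioned above, here is a sketch run through a Hive-enabled Spark session; the table and column names are assumptions, and the original work may have used Hive scripts directly.

```python
# Sketch of Hive bucketing + sampling (table/column names are illustrative).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-bucketing-demo")
    .enableHiveSupport()
    .getOrCreate()
)

# Bucketed table: rows are hash-distributed on user_id into 8 buckets.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events_bucketed (
        user_id STRING, event STRING, ts TIMESTAMP
    )
    CLUSTERED BY (user_id) INTO 8 BUCKETS
    STORED AS ORC
""")

# Sampling: scan roughly 1/8 of the data instead of the full table.
sample = spark.sql(
    "SELECT * FROM events_bucketed TABLESAMPLE (BUCKET 1 OUT OF 8)"
)
sample.show()

# Export of processed output to MySQL was done with Sqoop, e.g.:
#   sqoop export --connect jdbc:mysql://host/db --table events \
#                --export-dir /user/hive/warehouse/events_bucketed
```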

Systems Engineer

Oct 2013 – Dec 2014 · 1 yr 2 mos · Bengaluru Area, India

Education

Dayananda Sagar College of Engineering, BANGALORE

Bachelor of Engineering - BE — Electrical and Electronics Engineering

Jan 2009 – Jan 2013

D.A.V. Public School, Hehal

10th and 12th — Science

Jan 2002 – Jan 2009
