Ajay Kadiyala

AI Researcher

Bengaluru, Karnataka, India · 8 yrs 2 mos experience

Key Highlights

  • 7+ years of experience in data engineering
  • Expert in building scalable data solutions
  • Certified Azure Data Engineer with diverse tech stack

Skills

Core Skills

Data Engineering · Big Data

Other Skills

Apache Airflow · Apache Flink · Apache Kafka · Apache Spark · Apache Sqoop · Data Analytics · Data Visualization · Data Warehousing · Extract, Transform, Load (ETL) · GitHub · HBase · HDFS · Hadoop · Hive · Kubernetes

About

As a Lead Data Engineer with 7+ years of experience, I tackle the complexities of real-time e-commerce data, focusing on competitor insights and pricing intelligence, with an approach centered on robust data storage and sophisticated processing through upstream services. I build scalable distributed data solutions using Spark, Sqoop, Hive, and Cassandra for various clients. I am responsible for ingesting, processing, and analyzing large volumes of structured and unstructured data from multiple sources, as well as developing frameworks, building code, and handling analysis, optimization, and data quality on HDFS data lakes. I have worked on a range of projects involving large-scale data in various forms. I am a certified Azure Data Engineer experienced in Blob Storage, Databricks, ADF, and Synapse Analytics, with strong experience in PySpark and SQL and intermediate experience in Scala Spark. Previously, at PwC and Accenture, I played a pivotal role in constructing data lakes and streaming applications, contributing to a comprehensive customer view that empowers sales teams. I have a strong background in Hadoop, MapReduce, HBase, Linux, Python, Azure, Snowflake, SQL, Kafka, Airflow, and ETL, with basic knowledge of AWS and GCP. I hold a B.Tech degree in Electrical and Electronics Engineering and am passionate about learning new technologies and creating impactful solutions for complex business problems. I am also a content creator and writer, sharing my knowledge and insights with over 100K followers.

Experience

Topmate.io

Mentor - Data Engineering

Jan 2024 – Present · 2 yrs 2 mos · Remote

  • Questions in data engineering? Don't worry, I've got you covered.
  • Want to move into data engineering? I am here to help you.
  • Any technical doubts? Feel free to connect with me.
  • I am a Data Engineer with 7 years of experience, rated 4.8 stars on Topmate.
Data Engineering · Data Analytics

Brillio

Lead Data Engineer

Dec 2023 – Jun 2024 · 6 mos · Bengaluru, Karnataka, India · Hybrid

  • Working on one of the largest e-commerce projects, collecting competitor data, pricing information, and seller details in real time, storing them in databases, and processing them using upstream services.
Apache Kafka · Apache Flink · RedisDB · PySpark · Hadoop · Apache Airflow · +8

PwC

2 roles

Big Data Consultant

Promoted

Jun 2022 – Apr 2024 · 1 yr 10 mos · Remote

  • Responsible for building scalable distributed data solutions using Spark.
  • Ingested log files from source servers into HDFS data lakes using Sqoop.
  • Developed Sqoop jobs to ingest customer and product data into HDFS data lakes.
  • Developed Spark Streaming applications to ingest transactional data from Kafka topics into Cassandra tables in near real time.
  • Developed a Spark application to flatten incoming transactional data using various dimensional tables and persist it in Cassandra tables.
  • Involved in developing a framework for metadata management on HDFS data lakes.
  • Worked on various Hive optimizations such as partitioning, bucketing, vectorization, and indexing, and used the right type of Hive joins, such as Bucket Map Join and Sort-Merge-Bucket (SMB) join.
  • Worked with various file formats, including CSV, JSON, ORC, Avro, and Parquet.
  • Developed HQL scripts to create external tables and analyze incoming and intermediate data for analytics applications in Hive.
  • Optimized Spark jobs using techniques such as broadcasting, executor tuning, and persisting.
  • Responsible for developing custom UDFs, UDAFs, and UDTFs in Hive.
  • Analyzed tweet JSON data using the Hive SerDe API to deserialize it into a readable format.
  • Orchestrated Hadoop and Spark jobs using Oozie workflows to define job dependencies and run multiple jobs in sequence for processing data.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Apache Kafka · Scala Spark · Spark SQL · Hadoop · Microsoft SQL Server · Python (Programming Language) · +7
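The Hive optimizations mentioned above (external tables, ORC storage, partitioning, bucketing, SMB joins, vectorization) could be sketched roughly as follows; the table, column names, and bucket count are hypothetical, not taken from the actual project:

```sql
-- Hypothetical external table for ingested transactions, stored as ORC,
-- partitioned by date and bucketed/sorted by customer_id so that
-- Bucket Map Join and Sort-Merge-Bucket (SMB) join can be used.
CREATE EXTERNAL TABLE IF NOT EXISTS transactions (
  txn_id      STRING,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (txn_date STRING)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/lake/transactions';

-- Session settings commonly enabled for bucketed joins and vectorized execution.
SET hive.optimize.bucketmapjoin = true;
SET hive.auto.convert.sortmerge.join = true;
SET hive.vectorized.execution.enabled = true;
```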

Big Data Associate Consultant

Oct 2021 – May 2022 · 7 mos · Remote

  • Project description: Provides a 360-degree view of the customer so that a salesperson is aware of all the facts when talking to the customer, giving a much better chance to close the deal. This involved building a data lake: data sources used Hadoop tools to transfer data to and from HDFS, some of the sources were imported using Sqoop, and the raw data was then stored in Hive tables in ORC format so that data scientists could perform analytics using Hive. New use cases were developed and loaded into a NoSQL database (HBase) for further analytics.
  • Environment: Cloudera CDH 5.4.4
  • Roles and responsibilities:
  • Developed Sqoop scripts to import source data from an Oracle database into HDFS for further processing.
  • Developed Hive scripts to store raw data in ORC format.
  • Involved in gathering requirements, design, development, and testing.
  • Generated reports using Hive for business requirements received on an ad-hoc basis.
Hadoop · Microsoft Azure · Python (Programming Language) · SQL · Hive · Apache Spark · +2

Accenture

Big Data Engineer

Feb 2019 – Oct 2021 · 2 yrs 8 mos · Bangalore Urban, Karnataka, India

  • The project supported the bank's risk management team, which needed to store, process, and manage the huge volume of data collected from various sources in day-to-day operations. The system primarily checks the credibility of the customer and looks for credit risks.
  • Roles and responsibilities:
  • Ingested data from multiple sources, such as MySQL.
  • Created and worked on Sqoop jobs with incremental load.
  • Designed both managed and external tables in Hive.
  • Developed Spark code in Scala using Spark SQL and DataFrames for optimization.
  • Created an HBase layer for faster reporting.
Hadoop · HDFS · Microsoft SQL Server · Apache Sqoop · Python (Programming Language) · SQL · +5
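An incremental-load Sqoop job of the kind described above might look roughly like this; the connection string, credentials path, table, and check column are all hypothetical placeholders, not details from the actual project:

```shell
# Hypothetical incremental import: pull only rows appended since the last run,
# tracked by the auto-increment id column, into an HDFS data-lake directory.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table customers \
  --target-dir /data/lake/customers \
  --incremental append \
  --check-column id \
  --last-value 0
```

Saving this as a Sqoop job (`sqoop job --create`) lets Sqoop persist and update `--last-value` automatically between runs.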

D-Vois Communications Private Limited (formerly D-Vois Broadband Pvt. Ltd.)

Big Data Engineer

Nov 2017 – Jan 2019 · 1 yr 2 mos · Bengaluru Area, India

  • Responsibilities:
  • Analyzed data using the Hadoop components Hive, Pig, and HBase queries.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Involved in loading data from the UNIX file system into HDFS.
  • Responsible for creating Hive tables, loading data, and writing Hive queries.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce/Apache Spark, and loaded data into HDFS.
  • Extracted data from an Oracle database into HDFS using Sqoop.
  • Exported the analyzed patterns back to Teradata using Sqoop.
  • Loaded data from web servers and Teradata using Sqoop and the Spark Streaming API.
  • Utilized the Spark Streaming API to stream data from various sources; optimized existing Scala code and improved cluster performance.
  • Experienced in tuning Spark applications (batch interval time, level of parallelism, memory tuning) to improve processing time and efficiency.
Apache Kafka · HDFS · Microsoft SQL Server · Linux · Apache Sqoop · SQL · +4

Education

Siddhartha Institute of Engineering & Technology.

Bachelor of Technology

Jan 2014 – Jan 2017

Govt. Polytechnic College

Diploma of Education

Jan 2011 – Jan 2014
