Abhishek Guleria

Senior Software Engineer

Bangalore Urban, Karnataka, India · 7 yrs 5 mos experience

Most Likely To Switch

Key Highlights

Expert in data engineering and cloud migration.
Developed audience segmentation for 100 million users.
Strong background in big data and machine learning.

Stackforce AI infers this person is a Data Engineer specializing in Big Data and Cloud Computing.

Skills

Core Skills

Data Engineering, Cloud Migration, Audience Segmentation, Data Analysis, ETL, Cloud Engineering, Big Data

Other Skills

AWS, AWS Glue, Airflow, Amazon DynamoDB, Amazon Redshift, Amazon S3, Amazon Web Services (AWS), Apache Airflow, Apache Cassandra, Apache Impala, Apache Kafka, Apache Oozie, Apache Spark Streaming, Apache Tez, Azure Databricks

Experience

Total Experience: 7 yrs 5 mos

Average Tenure: 2 yrs 5 mos

Current Experience: 3 yrs 1 mo

Paytm

Senior Software Engineer

May 2023 – Present · 3 yrs 1 mo · Bengaluru, Karnataka, India

Zee5

Software Development Engineer-2 (Data Science & Engineering)

Nov 2021 – May 2023 · 1 yr 6 mos · Bengaluru, Karnataka, India

Internal Audience Builder for Zee Platform.
Built on top of the Data Lake, the internal audience builder serves AdTech, MarTech, and CLM and caters to over 100 million users. It offers audience generation through segmentation and a rule-based engine focused on recency and frequency criteria, and it delivers fresh data to end systems (ad server and CLM tool) on an hourly basis. The platform processes frontend-defined JSON rules, allowing dynamic and flexible targeting based on user behavior, demographics, and engagement criteria.
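A recency and frequency rule of this kind can be illustrated with a minimal sketch. The JSON schema, the field names (`event`, `recency_days`, `min_frequency`), and the `build_audience` helper are all assumptions made for illustration; the profile does not describe the actual rule format or implementation.

```python
import json
from datetime import datetime, timedelta

# Hypothetical rule as a frontend might send it; the real schema is not documented here.
RULE_JSON = """
{
  "event": "video_play",
  "recency_days": 7,
  "min_frequency": 3
}
"""


def build_audience(events, rule, now):
    """Return the sorted user IDs whose events satisfy the recency and frequency rule."""
    cutoff = now - timedelta(days=rule["recency_days"])
    counts = {}
    for e in events:
        # Count only events of the requested type that fall inside the recency window.
        if e["event"] == rule["event"] and e["ts"] >= cutoff:
            counts[e["user_id"]] = counts.get(e["user_id"], 0) + 1
    return sorted(u for u, n in counts.items() if n >= rule["min_frequency"])


now = datetime(2023, 1, 31)
events = [
    # u1: three recent plays, so both criteria are met.
    {"user_id": "u1", "event": "video_play", "ts": datetime(2023, 1, 30)},
    {"user_id": "u1", "event": "video_play", "ts": datetime(2023, 1, 29)},
    {"user_id": "u1", "event": "video_play", "ts": datetime(2023, 1, 28)},
    # u2: recent but not frequent enough.
    {"user_id": "u2", "event": "video_play", "ts": datetime(2023, 1, 30)},
    # u3: frequent but outside the recency window.
    {"user_id": "u3", "event": "video_play", "ts": datetime(2023, 1, 1)},
    {"user_id": "u3", "event": "video_play", "ts": datetime(2023, 1, 2)},
    {"user_id": "u3", "event": "video_play", "ts": datetime(2023, 1, 3)},
]

rule = json.loads(RULE_JSON)
audience = build_audience(events, rule, now)
print(audience)
```

At the scale described (over 100 million users), the same filter-and-count logic would more plausibly run as a distributed job over the Data Lake than as an in-memory loop; the sketch only shows the shape of the rule evaluation.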
User Behavior and Content Analysis Pipeline.
The objective is to build a pipeline that analyzes user behavior within the ZEE5 application. The pipeline aims to determine how long a user stays in the application, which content and genre types users are most interested in, and which cast members are associated with the content they engage with. Analyzing this data yields insights into user preferences, viewing habits, and popular content on the ZEE5 platform.
Developed a Pipeline to Handle the Subscription Mart.
The Pipeline's objective is to extract and handle incremental subscription data before loading it into staging and master tables.
Used the AWS Glue Data Catalog to store table metadata, with Hive as the warehouse, and queried it with Amazon Athena.
Used Airflow to schedule the daily execution of the ETL procedure.
The resulting dataset supports numerous Data Science use cases.
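The incremental load from staging into master tables can be sketched as an upsert keyed on a subscription identifier. This is only an illustration under assumptions: the column names `subscription_id` and `updated_at`, the in-memory dictionaries, and the `apply_increment` helper are hypothetical and stand in for whatever tables and merge logic the actual pipeline uses.

```python
def apply_increment(master, staging):
    """Upsert staged rows into master, keeping the newest version of each key.

    master:  dict mapping subscription_id -> row (assumed layout)
    staging: list of rows extracted since the last run (assumed layout)
    """
    merged = dict(master)  # Copy so the previous master snapshot stays untouched.
    for row in staging:
        key = row["subscription_id"]
        current = merged.get(key)
        # Insert new keys; overwrite existing ones only with a strictly newer row.
        if current is None or row["updated_at"] > current["updated_at"]:
            merged[key] = row
    return merged


# ISO-8601 date strings compare correctly as plain strings.
master = {
    "s1": {"subscription_id": "s1", "status": "active", "updated_at": "2023-01-01"},
    "s2": {"subscription_id": "s2", "status": "active", "updated_at": "2023-01-05"},
}
staging = [
    {"subscription_id": "s1", "status": "cancelled", "updated_at": "2023-01-10"},  # update
    {"subscription_id": "s2", "status": "paused", "updated_at": "2023-01-02"},     # stale, ignored
    {"subscription_id": "s3", "status": "active", "updated_at": "2023-01-09"},     # new
]

new_master = apply_increment(master, staging)
for key in sorted(new_master):
    print(key, new_master[key]["status"])
```

Comparing on `updated_at` makes the merge safe to re-run on the same staging batch, which suits a job scheduled to run daily.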
Contributed to the complete implementation of the cloud migration from AWS to GCP.
The following steps were taken during the process:
1. Assessment and Planning
2. Network and Security Setup
3. Data Migration
4. Application Migration
5. Testing and Validation
6. Cut-Over and Go-Live
7. Optimization and Post-Migration Tasks

AWS Glue, Airflow, GCP, ETL, Data Lake, Audience Segmentation, +3

Wipro

Data Engineer - Machine Learning & Big Data

Jan 2019 – Nov 2021 · 2 yrs 10 mos

As a Data Engineer, my role has revolved around extracting data from various sources and performing crucial tasks such as loading, cleansing, and transforming the data. These processes are vital to ensure that the data is accurate, consistent, and in a suitable format for analysis. Throughout my experience, I have worked with diverse data sources, including databases, APIs, and file systems, utilizing various techniques to efficiently extract and integrate data into the data processing pipeline.
One of the key areas I have focused on is building out the data in the Data Lake. This involves ingesting data from multiple sources and preparing it for analysis. The data is cleansed to remove inconsistencies, errors, and duplicates, ensuring high data quality. Additionally, I have implemented transformations to create canonical models, which provide a standardized representation of the data. This standardization ensures consistency and facilitates downstream analysis.
Performance tuning of Spark jobs has been a critical aspect of my role. Processing large datasets necessitates optimizing query execution speed and efficiency. To achieve this, I have applied various optimization techniques, taking advantage of Spark's capabilities.
Implementing Data Quality checks has been another integral part of my work. Ensuring data quality is essential for reliable analysis and decision-making. To achieve this, I have performed data cleaning, manipulation, modification, and combination using a variety of steps and functions. These steps include handling missing values, detecting outliers, and validating data against predefined rules or constraints. By ensuring data integrity and reliability, I have contributed to generating accurate and trustworthy insights for analysis and decision-making purposes.
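The three kinds of checks named above (missing values, outliers, and rule validation) can be illustrated with a small sketch. It is written in plain Python rather than Spark to stay self-contained, and the `check_quality` helper, the `amount` field, the minimum-value rule, and the median-absolute-deviation outlier test are all assumptions chosen for illustration, not the checks actually used.

```python
from statistics import median


def check_quality(rows, field, min_value, k=10.0):
    """Return row indices that are missing, violate the rule, or look like outliers.

    Outliers are values more than k median absolute deviations from the median
    (an assumed heuristic; other tests such as IQR would work as well).
    """
    missing = [i for i, r in enumerate(rows) if r.get(field) is None]
    present = [(i, r[field]) for i, r in enumerate(rows) if r.get(field) is not None]

    # Rule validation: the value must not fall below the configured minimum.
    invalid = [i for i, v in present if v < min_value]

    outliers = []
    values = [v for _, v in present]
    if values:
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad > 0:  # With zero spread, no value can be flagged this way.
            outliers = [i for i, v in present if abs(v - med) > k * mad]

    return {"missing": missing, "invalid": invalid, "outliers": outliers}


rows = [
    {"amount": 100},
    {"amount": 102},
    {"amount": 98},
    {"amount": 101},
    {"amount": None},   # missing value
    {"amount": 5000},   # far from the rest of the data
    {"amount": -5},     # breaks the non-negative rule and is also far from the median
]

report = check_quality(rows, "amount", min_value=0)
print(report)
```

Returning row indices rather than dropping rows keeps the decision about how to handle each failure (reject, repair, or quarantine) separate from the detection step.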

Data Extraction, Data Cleansing, Data Transformation, Spark, Data Quality Checks, Data Engineering, +1