Riya Chakraborty

AI Researcher

Bengaluru, Karnataka, India · 7 yrs 11 mos experience
Key Highlights

  • 7 years of experience in end-to-end ETL data pipelines.
  • Expert in AWS and data engineering technologies.
  • Proven track record in building scalable data solutions.

Skills

Core Skills

Data Engineering · AWS · Data Pipeline Development · ETL · API Development

Other Skills

AWS Athena · AWS Glue · AWS Redshift · AWS S3 · Airflow · Apache Airflow · Apache Spark · Athena · Databricks DataFlow · Delta Lake · EC2 · EMR · Extract, Transform, Load (ETL) · Flask · Glue

About

Worked end to end with ETL data pipelines for 7 years, using technologies such as SQL, PySpark, Kafka, Airflow, AWS, Delta Lake, Python, Flask, Superset, and data warehousing.

Experience

Grab

Senior Data Engineer

Apr 2024 – Present · 1 yr 11 mos · Bengaluru, Karnataka, India · Hybrid

  • Designed and developed robust data pipelines to facilitate seamless data exchange between Grab and leading digibanks across Malaysia, Singapore, and Indonesia, ensuring compliance with stringent banking regulations.
  • Spearheaded the integration of financial products such as credit lending for Grab drivers, MSMEs, passenger loan products, and marketing analysis, ensuring smooth data flow across various use cases.
  • Optimized data processing workflows using cutting-edge data engineering technologies, including Apache Spark, Presto, AWS S3, Databricks DataFlow, and Delta Lake.
  • Integrated and orchestrated scalable data workflows with Apache Airflow, enhancing efficiency in data processing and management.
  • Managed scalable and cost-effective data storage solutions on AWS S3, ensuring secure and efficient access to critical data assets.
  • Developed impactful dashboards and visualizations using Superset, empowering teams with actionable insights for data-driven decision-making.
  • Implemented Delta Lake architecture to ensure data consistency, reliability, and high-quality data management across systems.
  • Collaborated with cross-functional teams to ensure adherence to regulatory and compliance requirements in the financial sector, maintaining alignment with regional banking standards.
Apache Spark · Presto · AWS S3 · Databricks DataFlow · Delta Lake · Apache Airflow · +3

Cyble Inc.

Data Engineer III

Jun 2023 – Mar 2024 · 9 mos · Bengaluru, Karnataka, India · Hybrid

  • Projects:
  • Designed and created an on-demand data pipeline on AWS cloud infrastructure to process terabytes of ransomware-related data, using AWS S3, Lambda, EC2, and Athena, with Python and PySpark as coding languages.
  • Key responsibilities:
  • Designed and implemented an on-demand data pipeline on AWS cloud infrastructure, optimizing the processing of large volumes of ransomware-related and dark web data.
  • Utilized a range of AWS services, including S3, Lambda, Textract, EC2, and Athena, to ensure high efficiency and scalability in data processing workflows.
  • Leveraged Python and PySpark for data transformation, processing, and automation, ensuring seamless integration of multiple data sources.
  • Developed and executed strategic plans to manage team tasks, ensuring efficient and timely completion of project objectives.
  • Technologies used:
  • AWS Athena for data preparation
  • Redshift Spectrum
  • Lambda for pipeline triggering
  • EC2 instances to run the pipeline
  • S3 as primary storage for data
  • Python and PySpark as coding languages
  • Superset for dashboards
AWS S3 · Lambda · EC2 · Athena · Python · PySpark · +3

Zoomcar

Data Engineer

Dec 2022 – Jun 2023 · 6 mos · Bengaluru, Karnataka, India · Hybrid

  • Project:
  • Designed a hotspot detection system for Zoomcar publicity: built a pipeline that uses GPS and sensor data to detect hotspots around several prime locations.
  • Technologies:
  • SQL
  • PySpark
  • Airflow
  • AWS Athena, EMR cluster, S3, Redshift, Lambda
  • Delta Lake
SQL · PySpark · Airflow · AWS Athena · EMR · S3 · +5

Inviz AI

Senior Data Engineer

Nov 2021 – Dec 2022 · 1 yr 1 mo · Bengaluru, Karnataka, India

  • Built an intelligent search (insearch) platform for an e-commerce client, including better auto-suggestions and appropriate product visibility.
  • Responsibilities:
  • Created data pipelines for features such as ranking, auto-suggestions, and personalisation.
  • Created an API for the personalisation feature.
  • Technologies used:
  • AWS technologies such as S3, Glue, Athena, and Redshift
  • Python as the coding language, with PySpark and SQL for computation and data fetching
  • Airflow for orchestration
  • Flask for API building
AWS S3 · Glue · Athena · Redshift · Python · PySpark · +4

Amazon

Data Engineer

Apr 2020 – Oct 2021 · 1 yr 6 mos · Bengaluru, Karnataka, India

  • Projects:
  • 1. CPEX, Amazon's packaging system, ensures that any shipped item is packaged without wasted material or overweight issues, while adhering to all Amazon guidelines.
  • Key responsibilities:
  • Built data pipelines for proper data flow from AWS Redshift to S3 and vice versa, using an ETL platform, AWS Glue, Python, and SQL.
  • 2. Scheduler, an internal tool to deschedule any ETL job that is no longer needed and to reschedule it, if needed, after it has failed a number of times.
  • Technologies used:
  • AWS Redshift, AWS Glue, Python as the programming language, AWS Athena, and an ETL platform as the orchestrator
AWS Redshift · AWS Glue · Python · AWS Athena · Data Engineering · ETL

Noodle.ai

3 roles

Data Engineer

Promoted

Oct 2019 – Apr 2020 · 6 mos

  • Enterprise Data Platform
  • Built an AI platform from scratch to store client data cost-efficiently, sharding the data effectively and exposing it through several channels, such as an API and a Python package.
  • Key responsibilities:
  • Built data pipelines to migrate data from Kafka to different data stores, such as SQL Server, Hive, and PostgreSQL, and exposed data from these sources to client dashboards via an API.
  • Technologies used:
  • Python, Kafka, SQL Server, PySpark, Flask API, Presto
Python · Kafka · SQL Server · PySpark · Flask · Presto · +2

Associate Data Engineer

Jul 2018 – Oct 2019 · 1 yr 3 mos

  • Built Data Profiler, a tool used by the data science team to visualize data with information such as data types, null-value measurements, and data distribution, using graphs like histograms and pie charts.
  • Technologies:
  • Python, PySpark, SQL Server, Airflow
Python · PySpark · SQL Server · Airflow · Data Engineering

Data Engineering Internship

Jan 2018 – Jun 2018 · 5 mos

PES University

Summer Internship, Bangalore — GUI Development and Version Control (Cultyvate)

Jun 2017 – Jul 2017 · 1 mo · Bangaon, West Bengal, India

Education

PES University

Master of Computer Applications - MCA — Computer Applications

Jan 2015 – Jan 2018

Vidyasagar College

B.Sc Hons — Computer Science

Jan 2012 – Jan 2014

Barasat Kalikrishna Girls' High School

Jan 2003 – Jan 2011
