Gaurav Malhotra

Data Engineer

Delhi, India · 4 yrs experience

Key Highlights

  • Expert in building scalable data pipelines.
  • Proven track record in optimizing data workflows.
  • Strong experience with AWS and Big Data technologies.

Skills

Core Skills

Data Engineering · AWS

Other Skills

AWS Athena · AWS Glue · AWS S3 · Amazon Web Services (AWS) · Apache Spark · Big Data · Data Analysis · Data Warehousing · DynamoDB · Extract, Transform, Load (ETL) · Hadoop · MySQL · NoSQL · PostgreSQL · PySpark

About

Data Engineer with 2 years of experience who loves turning data into stories. Expertise in building high-performance, scalable data pipelines using Airflow and Big Data technologies such as Hadoop, Apache Spark, Python (PySpark), Hive, SQL, NoSQL, and Sqoop, along with AWS services like EMR, Redshift, Athena, and Glue.

What I do:

1. Transform raw data into reliable, consistent datasets, enabling data-driven decisions for analytics and data science teams.
2. Build robust, scalable data workflows that handle massive volumes of data, improving operational efficiency and agility.
3. Contribute to data mart strategy and dimensional modelling.
4. Continuously look for ways to optimize processes, automating tasks to drive greater efficiency and impact.

Experience

Roundcircle

Data Engineer

Aug 2024 – Present · 1 yr 7 mos · Gurugram, Haryana, India · Hybrid

PySpark · SQL · Hadoop · NoSQL · Data Warehousing · Data Analysis +10

Tata Consultancy Services

Data Engineer

Aug 2021 – Jan 2024 · 2 yrs 5 mos · Delhi, India · Hybrid

NTLS ENGINE

  • Streamlined Redshift cluster performance and implemented AWS Athena for Verizon, reducing query execution time by 10% and operational costs by 15%.
  • Developed data ingestion pipelines using AWS Glue to combine data from various sources (S3, RDS, DynamoDB), processing it in real time with PySpark to enhance analytics.
  • Optimized Spark performance using techniques such as broadcast joins, caching, and data persistence, resulting in a 15% reduction in PySpark job run time and a 20% increase in cluster resource utilization.

LIVE SIGNALING SYSTEM

  • Automated ETL operations using Python, AWS Glue, PySpark, and AWS S3 to process terabytes of daily data from 5G fiber cables, enabling real-time monitoring and insights.
  • Reduced network downtime by 15%, supporting Verizon's expansion to serve 20 million 5G subscribers with optimized performance.

Education

Dronacharya College of Engineering

Bachelor of Technology (BTech), Computer Science and Information Technology

Jan 2017 – Jan 2021

Happy Model School

Non Medical

Jan 2003 – Jan 2017
