Pavitra Rai

Data Engineer

Pune, Maharashtra, India · 4 yrs 3 mos experience

Highly Stable

Key Highlights

Designed scalable data pipelines on AWS.
Automated ETL workflows improving data reliability.
Delivered analytics-ready datasets for decision-making.

Stackforce AI infers this person is a Data Engineer specializing in scalable cloud-based data solutions.

Skills

Core Skills

Data Engineering, Big Data, Data Analysis

Other Skills

AWS Glue, AWS Lambda, Amazon EC2, Amazon Elastic MapReduce (EMR), Amazon Kinesis, Amazon Redshift, Amazon S3, Amazon Web Services (AWS), Apache Kafka, Apache Spark, Data Analytics, Data Modeling, Data Visualization, Data Warehousing, Docker

About

I am a Data Engineer with hands-on experience designing and operating scalable, cloud-based data pipelines for large datasets. I have built and automated end-to-end ETL workflows, improved data reliability, and delivered analytics-ready datasets for reporting and decision-making. My work includes developing distributed data processing pipelines using PySpark on AWS, managing Amazon S3-based data lakes, and migrating legacy data into scalable cloud architectures. I have implemented real-time ingestion and streaming systems using Apache Kafka and Amazon Kinesis, ensuring low-latency processing and operational stability. I have contributed to improving data quality through robust data modeling, automated validation, and monitoring, resulting in more accurate reporting and fewer operational issues. I hold a Bachelor of Technology degree in Biomedical Engineering and have strong proficiency in Python, SQL, and Power BI. I am comfortable owning the full data lifecycle, from ingestion and transformation to delivery, and I focus on building reliable, production-grade data systems.

Experience

Total Experience: 4 yrs 3 mos

Average Tenure: 2 yrs 4 mos

Current Experience

Consultadd Inc.

2 roles

Big Data Engineer

Promoted

Jul 2024 – Oct 2025 · 1 yr 3 mos

Designed and optimized large-scale batch data pipelines using PySpark on AWS Glue and EMR, reducing EMR job runtimes by 10–12% through improved joins, partitioning, and execution strategies.
Automated data ingestion and pipeline orchestration using AWS Lambda and Step Functions, while migrating legacy datasets into Amazon S3 to improve pipeline reliability and performance by 10–15%.
Optimized analytical workloads in Amazon Redshift by refactoring complex SQL queries and implementing Python-based data validation checks, reducing heavy query runtimes by ~20%.
Integrated high-volume event data from Amazon MSK and Kinesis into downstream processing workflows to support near real-time analytics and reduce dashboard latency.
Containerized data workloads using Docker and deployed them on Kubernetes (EKS), improving deployment reliability and operational consistency.

PySpark, AWS Glue, EMR, AWS Lambda, Step Functions, Amazon S3, +7

Analyst | Management Engineer

Jan 2024 – Jul 2024 · 6 mos

Delivered data-driven insights by analyzing customer and operational datasets using Python and SQL, and built Power BI dashboards that improved workflow visibility and reduced the time to detect process issues by 15%.

Python, SQL, Power BI, Data Analysis

Club Kshitij

3 roles

Joint Secretary

Aug 2023 – Jul 2024 · 11 mos

Led planning and execution of large-scale cultural events and competitions, coordinating a team of 80+ members across logistics, promotions, and operations.
Drove event marketing and outreach through social media and campus campaigns, increasing inter-college participation and securing external sponsorships.