Vani Athota

Data Engineer

McKinney, Texas, United States

2 yrs 8 mos experience

Highly Stable

Key Highlights

Expert in building and optimizing large-scale data pipelines.
Proficient in cloud transformations and data engineering.
Strong collaboration skills with a focus on real-world problem solving.

Stackforce AI infers this person is a Data Engineering expert in the Fintech and SaaS industries.

Skills

Core Skills

Data Engineering, Big Data, Cloud Transformation

Other Skills

SQL, Hadoop, Apache Spark, Kafka, Spark Streaming, Stonebranch, Hive, SparkSQL, Microsoft Azure, Apache Kafka, Azure Data Lake, Azure Databricks, PySpark, Tableau, Sqoop

About

I’m a Senior Data Engineer with 5+ years of experience building and optimizing large-scale data pipelines. I help turn complex, messy data into reliable insights that support analytics and business decisions.

I work extensively with the Big Data ecosystem, including Spark, Hive, and Kafka, and build solutions on AWS and Azure. I’ve migrated legacy MapReduce jobs to Spark, set up real-time streaming pipelines with Kinesis, and focused on making data faster, more reliable, and easier to manage.

I specialize in cloud transformations, moving enterprise-scale data systems to Azure Data Lake and Databricks. I also focus on automation and Infrastructure as Code (IaC) using tools like Terraform, Stonebranch, and Python to build scalable, manageable, and efficient systems.

Beyond technology, I enjoy collaborating with teams to solve real-world problems and keeping up with the latest in Big Data, cloud, and ETL trends.

If you want to talk Big Data, cloud architecture, or just share insights on the latest in ETL, let’s connect!

📫 Reach out here or at: vaniathota05@gmail.com

Experience

Total Experience: 2 yrs 8 mos

Average Tenure: 2 yrs 8 mos

Current Experience

VNS Health

Data Engineer

May 2023 – Present · 3 yrs · United States · Remote

Performance Engineering: Optimized data processing by migrating legacy MapReduce jobs to Spark, significantly enhancing performance and reducing execution time.
Workflow Automation: Designed and implemented SparkSQL and Hive scripts integrated with Stonebranch for advanced job scheduling and automated workflow management.
Hive Optimization: Improved query speed and resource efficiency by implementing partitioning and bucketing strategies within Hive.
Real-time Ingestion: Built robust data pipelines using Kafka and Spark Streaming, successfully parsing XML/JSON data for immediate business analysis.
Hybrid Ingestion: Developed live data ingestion pipelines by integrating AWS Kinesis with on-prem Kafka clusters to ensure reliable data flow.

SQL, Hadoop, Apache Spark, Kafka, Spark Streaming, Stonebranch, +3 more

DBS Bank

Data Engineer

Jun 2020 – Feb 2023 · 2 yrs 8 mos · Hyderabad · Remote

Cloud Transformation: Spearheaded the migration of large-scale projects from Cloudera Hadoop to Azure Data Lake Store as part of a major digital transformation strategy.
Cloud Architecture: Built and maintained complex environments on Azure (IaaS/PaaS), leveraging Azure Portal, PowerShell, and Storage Accounts for enterprise data management.
Streaming & Automation: Developed Autosys scripts to automate the scheduling of Kafka streaming and batch jobs, ensuring high availability of real-time data.
Data Lake Management: Utilized Azure Databricks and PySpark to perform complex business logic transformations and implement real-time data storage in HBase and Cassandra.
ETL Modernization: Designed and implemented incremental Sqoop jobs to ingest data from DB2 into Hive, enabling interactive reporting via Tableau.

Microsoft Azure, Apache Kafka, Azure Data Lake, Azure Databricks, PySpark, Tableau, +3 more