Vaishnav Kumar

Data Engineer

Pittston, Pennsylvania, United States · 1 yr 5 mos experience

Key Highlights

5+ years of experience in data engineering.
Expert in building efficient data pipelines.
Strong background in both AWS and Azure cloud platforms.

Stackforce AI infers this person is a Data Engineer with expertise in cloud-based data solutions and ETL processes.

Skills

Core Skills

Data Engineering · ETL Processes · Data Analytics · Cloud Engineering

Other Skills

AWS EC2 · AWS Glue · AWS Lambda · Amazon Elastic MapReduce (EMR) · Amazon Redshift · Amazon VPC · Amazon Web Services (AWS) · Apache · Apache Airflow · Apache Kafka · Apache Spark · Apache Spark Streaming · Azure Data Factory · Azure Data Lake · Azure Databricks

About

5+ years of experience designing, developing, and executing data pipelines and data lake requirements at numerous companies using the Big Data technology stack, Python, PL/SQL, SQL, REST APIs, and the Azure cloud platform. Experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure infrastructure on AWS. Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses. Strong experience using AWS and Azure data services to build efficient, optimized data pipelines. Experience using various Hadoop distributions (Cloudera, MapR, Hortonworks, and Azure) to fully implement and leverage new Hadoop features. Expert in creating Kafka producers and consumers for seamless data streaming with AWS services. Hands-on experience with Spark, Databricks, and Delta Lake. Well experienced in deployment, configuration management, and virtualization. Knowledge of job workflow scheduling and locking tools/services such as Oozie, ZooKeeper, Airflow, and Apache NiFi. Extensive experience in IT data analytics projects, including hands-on experience migrating on-premises ETL workloads to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer. Hands-on experience with DevOps tools such as Jenkins, Docker, Kubernetes, GoCD, and the AutoSys scheduler. Hands-on experience interacting with REST APIs built on a microservices architecture to retrieve data from different sources. Hands-on experience implementing, building, and deploying CI/CD pipelines and managing projects that track multiple deployments across pipeline stages (Dev, Test/QA, staging, and production).

Experience

Total Experience: 1 yr 5 mos

Average Tenure: 8 mos

Current Experience

Cisco Systems

Azure Data Engineer

Feb 2025 – Aug 2025 · 6 mos · Pennsylvania, United States · Hybrid

Designed and developed performant ETL pipelines using Apache Spark (PySpark API).
Created Hive tables with dynamic partitions and buckets, loaded data in formats such as Avro and Parquet, and queried them with HiveQL.
Built pipelines to move data from Azure Data Lake into a staging SQL database and then into Azure SQL Database.
Developed Spark Streaming apps to consume Kafka topics and write processed streams to HBase.
Built pipelines, data flows, and complex transformations using Azure Data Factory, PySpark, and Databricks. Tuned Spark jobs by adjusting batch interval, parallelism, and memory usage.
Enhanced browser-based applications using the AngularJS MVC framework.
Created PyQt-based data tables for displaying and managing customer and policy records.
Applied BI and data visualization expertise in Tableau, connecting to various data sources and building interactive dashboards.
Developed T-SQL scripts for managing database objects and improving performance.
Used Elasticsearch and Kibana to index and visualize real-time analytics, delivering fast, actionable insights.
Contributed to all phases of the SDLC, from requirements gathering through design, development, deployment, and analysis.
Designed data integration solutions with Azure Data Factory for moving data between on-premises and cloud systems.
Configured Jenkins for CI/CD, including building and managing Jenkins slaves for parallel execution.
Managed data migration projects with MongoDB, ensuring data integrity during import/export.
Implemented Kubernetes namespaces and RBAC policies for secure access controls in data infrastructure.
Developed Kibana dashboards from Logstash data and integrated varied sources into Elasticsearch for near-real-time log analysis.
Integrated Synapse with Azure Databricks notebooks to reduce development time by ~50%.
Developed and maintained Snowflake data models, including tables, views, and materialized views.
Utilized Docker to create application images and dynamically provision Jenkins CI/CD pipeline slaves.

Apache Spark · HiveQL · Azure Data Factory · PySpark · Tableau · Elasticsearch · +6 more

Lincoln Financial Group Trust Co., Inc.

AWS Data Engineer

Mar 2024 – Feb 2025 · 11 mos · Radnor, Pennsylvania, United States · Hybrid

Designed and developed real-time ETL pipelines using Kubernetes and Docker for data integration and transformation.
Built AWS Lambda functions with IAM roles and CloudWatch triggers to schedule Python scripts that use SQS, SNS, EventBridge, and Step Functions to automate SageMaker tasks (publishing data to S3, model training, and deployment).
Created Hive and HBase tables with ORC format and Snappy compression; integrated HBase with Hive for optimized querying.
Developed Spark (Scala) data pipelines that perform aggregations, format output as JSON, and produce data for visualization.
Wrote Kafka producers to stream data from REST APIs to Kafka topics.
Managed high availability for AWS EC2 instances, migrated legacy systems to AWS, and created Terraform modules for infrastructure automation.
Loaded data into the Cassandra NoSQL database, ensuring data integrity during migration and resolving compatibility issues with T-SQL.
Improved CI/CD workflows using Jenkins and Docker.
Tuned Kubernetes and Docker performance and monitored systems with Nagios, CloudWatch, and the ELK stack (Elasticsearch, Logstash, Kibana).
Deployed a proof of concept on AWS S3 and Snowflake.

AWS Lambda · Kubernetes · Docker · Spark-Scala · Kafka · AWS EC2 · +3 more

Nimble International

GCP Data Engineer

Jan 2022 – Dec 2023 · 1 yr 11 mos · Hyderabad, Telangana, India

Involved in loading data from the UNIX file system into HDFS. Used Azure Logic Apps to build workflows that schedule and automate batch jobs by integrating apps, ADF pipelines, and other services such as HTTP requests and email triggers.
Experienced with GCP services including Google Compute Engine, Google Cloud Storage, VPC, Cloud Load Balancing, and IAM.
Created Databricks job workflows that extract data from SQL Server and upload the files to SFTP using PySpark and Python.
Experienced in working with Azure services such as Data Lake, Data Lake Analytics, SQL Database, Synapse, Databricks, Data Factory, Logic Apps, and SQL Data Warehouse, and GCP services such as BigQuery, Dataproc, and Pub/Sub.
Developed and maintained a real-time data ingestion pipeline using Java/Scala and Apache Kafka, enabling near-real-time analysis of critical business metrics. Worked with GCP Cloud Storage, Dataproc, Dataflow, and BigQuery, as well as AWS EMR, S3, Glacier, and EC2 with EMR clusters. Involved in setting up the Apache Airflow service in GCP.
Developed PySpark modules for data ingestion and analytics, loading data from Parquet, Avro, and JSON files and from database tables.
Experienced with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.
Well versed in the ETL processes used to load and update an Oracle data warehouse.
Implemented Synapse integration with Azure Databricks notebooks, which cut development work by about half, and improved Synapse load performance by implementing dynamic partition switching.
Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.

GCP · Apache Kafka · Data Lake · Data Factory · BigQuery · Dataproc · +2 more

Hitachi Vantara

Data Engineer

Jun 2020 – Dec 2021 · 1 yr 6 mos · Hyderabad, Telangana, India

Designed and developed a Java API (Commerce API) that connects to Cassandra through Java services.
Developed data pipelines and workflows in Azure Databricks to process and transform large volumes of data using languages such as Python, Scala, or SQL.
Performed ETL operations using Python, Spark SQL, S3, and Redshift on terabytes of data to obtain customer insights.
Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for dashboard reporting.
Provided high availability for IaaS VMs and PaaS role instances accessed from other services in the VNet using Azure Internal Load Balancer. Working knowledge of Kubernetes to deploy, scale, load-balance, and manage Docker containers, and of OpenShift with multiple namespace versions.
Imported real-time web logs using Kafka as a messaging system, ingested the data into Spark Streaming, performed data quality checks, and flagged records as bad or passable. Developed business logic using Kafka and Spark Streaming and implemented business transformations. Supported continuous storage in ADLS, configured snapshots, and wrote entities in Spark along with named queries to interact with the database.
Developed Spark applications using PySpark and Spark SQL to extract, transform, and aggregate data from multiple file formats, uncovering insights into customer usage patterns.
Actively participated in all phases of the Software Development Life Cycle (SDLC), from implementation to deployment.
Designed and configured databases, back-end applications, and programs. Managed large datasets using pandas DataFrames and SQL.
Used Azure Data Factory to ingest data from log files and custom business applications, processed the data on Databricks per day-to-day requirements, and loaded it into Azure Data Lake.