AnilKumar Patel

Software Engineer

Norwalk, Connecticut, United States · 8 yrs 6 mos experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

75% reduction in ETL code development with DBT Cloud.
40x performance boost from migrating data to Cloud SQL for PostgreSQL and BigQuery.
3-day cut in analytics processing time at Seagate Technology.

Stackforce AI infers this person is a Data Engineering expert with extensive experience in cloud migration and big data solutions.


Skills

Core Skills

Artificial Intelligence (AI) · Python · Azure Data Factory · Power BI · Google Cloud Platform (GCP) · ETL · Big Data

Other Skills

Tableau · PySpark · Machine Learning · SQL · Databricks · Azure SQL · Microsoft Azure · Delta tables · Azure VM · Databricks Workflow · Azure Functions · Microsoft Power BI · Compute · Blob Storage · REST APIs

About

Data Engineer with 7+ years of hands-on experience designing and implementing distributed systems and real-time/batch big data pipelines, and orchestrating cloud migrations. My expertise lies in architecting cloud solutions, providing architectural guidance, and deploying applications with a focus on high availability, cost efficiency, and disaster recovery. Proficient in a diverse set of tools and technologies, including Git, Docker, AWS, GCP, data warehousing, Python, ETL, and more. My notable achievements include:
► Spearheading the implementation of DBT Cloud, resulting in a 75% reduction in ETL code development.
► Leading the migration of data to Cloud SQL for PostgreSQL and BigQuery at Priceline.com, delivering a 40x performance boost.
► Initiating a data migration strategy at Seagate Technology that cut analytics processing time by 3 days.

Experience

Total Experience: 8 yrs 6 mos

Average Tenure: 1 yr 5 mos

Current Experience: 1 yr 10 mos

Lowe's Companies, Inc.

Senior Data Engineer

Aug 2024 – Present · 1 yr 10 mos · Charlotte, North Carolina, United States · Remote

Arizona State University

Graduate Teaching Assistant | IFT 511 Analyzing Big Data | Responsible for grading 100+ students

Aug 2023 – May 2024 · 9 mos · Tempe, Arizona, United States · On-site

I served as an on-site educator, grading and giving constructive feedback to more than 100 students and ensuring accurate, on-time assignment completion with a 100% completion rate.
I used quantitative metrics and Tableau to evaluate student progress, resulting in a 20% improvement in overall student performance.
My active involvement in student projects and discussions helped foster an engaging learning environment and produced a 30% rise in student engagement, collaboration, and participation.

Artificial Intelligence (AI) · Python · Tableau · PySpark · Machine Learning · SQL · +1

Shamrock Foods Company

Data Engineering Intern

May 2023 – Aug 2023 · 3 mos · Arizona, United States · On-site

Project: Integration of Microsoft API with Power BI for Data Extraction, Modeling, and Visualization
Designed ETL pipelines in Azure Data Factory to extract data from Shamrock stores across multiple states, ingest and transform it into Databricks Delta tables, and replace Shamrock's legacy ETL.
Used Spark and Delta in Databricks to process and clean the extracted data, ensuring its accuracy and reliability.
Used Databricks to extract governance data from the Microsoft REST API and the Microsoft Azure Active Directory API, ensuring access to pertinent information for analysis.
Constructed a comprehensive Power BI dashboard that visually presented the extracted data, empowering stakeholders to derive valuable insights and make informed decisions.
Project: Customer Acquisition for Shamrock Foods Company
Contributed to the customer acquisition project at Shamrock Foods Company as an intern.
Conducted market research to support the identification of target customer segments.

Azure Data Factory · Azure SQL · Microsoft Azure · Delta tables · Azure VM · Python · +8

Priceline

2 roles

Senior Data Engineer

Aug 2021 – Jul 2022 · 11 mos · Mumbai, Maharashtra, India

Worked as a Senior Data Engineer. Day-to-day work involved creating ETL jobs with GCP data services and architecting ETL tooling and GCP products from design to deployment. Focused mainly on migrating databases from on-premises to the cloud, working with geospatial data, and leading the ArcGIS geospatial product.
Worked on a generic framework for compacting data in the GCS data lake by merging small files into appropriately sized files.
Migrated Airflow ETL jobs from on-premises to GCP Cloud Composer.
Migrated the on-premises Java application to the cloud.
Supported code changes to legacy Java applications.
Advised and assisted the Data DevOps team on cloud services whenever scalability issues or failures arose.
Provided guidance to various teams on cloud services and on how applications would behave in the cloud when migrating existing ones or building new ones.

Extract, Transform, Load (ETL) · Google Cloud Platform (GCP) · Apache Kafka · Pub/Sub · Python · Cloud Functions · +13

Data Engineer

May 2020 – Jul 2021 · 1 yr 2 mos · Mumbai, Maharashtra, India

As a Data Engineer, I discussed and finalized data architecture with stakeholders; planned, wrote, and delivered code; and conducted code reviews.
Responsible for designing a data lakehouse on GCP, including ingress and egress patterns, data security, governance, cataloging, and retention policies.
Designed and developed a framework to migrate data from on-premises databases to BigQuery, Cloud SQL, and Bigtable.
Developed a booking propensity model using the booking and search data.
Worked with the network team on the initial cloud setup of data engineering services for the data team (Dataproc, Dataflow, Cloud SQL, etc.).
Worked with the DevOps team to create CI/CD pipelines and resolve scalability issues in the cloud.

Apache Kafka · Docker · Big Data · Python · PySpark · Kubernetes · +1

Mactores

Data Engineer

Dec 2018 – May 2020 · 1 yr 5 mos · Mumbai Area, India

Seagate Technology (17 months): Orchestrated the migration of entire databases and factory file data from on-premises systems to an AWS S3 data lake, using Airflow as the scheduler and Python/PySpark for backend processing. Implemented Presto as the query engine for long-running queries and Spark SQL for short-running queries, reducing analytics processing time by 3 days.
Godrej Consumer Products (8 months): Engineered an end-to-end ETL solution that moved on-premises data to AWS infrastructure (S3, Aurora, RDS, and Athena) for Tableau business reports via a JDBC connector. The solution improved data accessibility and reduced data ingestion time by 50%.
Veritas (1 month): Designed and implemented a data lake on GCP that consolidated harmonized data from diverse sources, improving data accessibility and insights.
Katerra (2 months): Collaborated with the DevOps team to deploy pipelines using automation scripts in AWS, streamlining processes and improving deployment efficiency by 20%.
Blackhawk Network Holdings (2 months): Orchestrated the migration of an on-premises Java application to Amazon Elastic Kubernetes Service (EKS), improving application scalability and performance.

Extract, Transform, Load (ETL) · Google Cloud Platform (GCP) · Apache Kafka · Amazon Web Services (AWS) · Big Data · Tableau · +2

Formcept

Big Data Developer

Jul 2017 – Dec 2018 · 1 yr 5 mos · Bengaluru Area, India

Spearheaded the development of real-time Apache Kafka data pipelines focused on data ingestion and pre-processing, using AWS services (S3) to improve data processing efficiency by 25%.
Developed MapReduce programs to parse raw data, populate staging tables, and store refined data in partitioned tables in the EDW.
Used AWS Glue and PySpark to load data into S3 buckets, implemented data filtering with Elasticsearch, and managed data storage in Hive external tables.
Collaborated with front-end engineers, contributing ideas that led to a 15% improvement in the overall product.
Played a key role in Agile development and used Apache Kafka to consume and publish data with Avro schemas. Implemented AWS services that reduced infrastructure costs by 20%.
Used the ELK and EFK stacks for log analysis and visualization, contributing to a 30% increase in actionable insights from log data.

Extract, Transform, Load (ETL) · Apache Kafka · Amazon Web Services (AWS) · Hadoop · Elasticsearch · Big Data · +3

Infrasoft Technologies Ltd.

Data Engineer

Jun 2016 – Jun 2017 · 1 yr · Mumbai Area, India · On-site

Installed, configured, and used components of the Hadoop ecosystem for data processing and storage, and developed MapReduce programs to parse raw data and store refined data in partitioned tables in the EDW.
Revised a batch data pipeline built with Airflow/StreamSets, Python, and Apache Kafka that processes more than 1 million events daily, reducing processing time by 30%.
Participated in data acquisition with the data engineering team, extracting historical and real-time data using Sqoop, Pig, Flume, Hive, MapReduce, and HDFS.
Engineered a solution that reduced data ingestion latency by 40%, optimizing ETL processes for improved reporting accuracy and faster decision-making.
Designed and implemented a star schema for a retail analytics platform, increasing query performance by 40%, and managed ETL pipelines handling 5+ TB of daily data from source systems.

Extract, Transform, Load (ETL) · Apache Kafka · Amazon Web Services (AWS) · Big Data · Tableau · Apache Airflow · +1