Kumar Abhishek (Pandey)

Software Engineer

Bengaluru, Karnataka, India · 9 yrs 2 mos experience

Key Highlights

  • 8.5+ years of experience in data engineering.
  • Expertise in AWS and big data technologies.
  • Proven ability to enhance organizational efficiency.

Skills

Core Skills

Data Engineering · Amazon Web Services (AWS) · Hadoop

Other Skills

AWS · AWS Lambda · Airflow · Amazon CloudWatch · Amazon EC2 · Amazon EKS · Amazon Elastic MapReduce (EMR) · Amazon Relational Database Service (RDS) · Amazon S3 · Analytics · Apache Airflow · Apache Impala · Apache Kafka · Apache Oozie · Apache Spark

About

Results-driven Senior Data Engineer with 8.5+ years of experience designing and implementing data solutions that enhance organizational efficiency and decision-making. Proficient in data engineering, Python, AWS, Databricks, Spark, SQL, Airflow, Hadoop, Kafka, and Jenkins CI/CD pipelines, with a proven ability to develop reliable frameworks for high-quality data management. Expertise in designing scalable data architectures, optimizing data workflows, and improving data accessibility. Recognized for effective collaboration with cross-functional teams to streamline system onboarding and facilitate seamless data integration, leveraging analytics to drive business growth through innovative data strategies.

  • Data Engineering
  • Amazon AWS (PaaS, SaaS, IaaS)
  • Python, Java, Shell/Bash
  • Databricks, Delta Lake, Delta Tables
  • Spark, PySpark, Hive, SQL, Impala, Solr, Kafka, Sqoop, HDFS, Hadoop
  • MySQL, PostgreSQL, Oracle, PL/SQL
  • Airflow, Databricks, Netflix Genie, Oozie
  • Docker, Kubernetes, Terraform, Jenkins
  • Ranger, Kerberos, CrowdStrike, JWT, OAuth
  • Development, DevOps, GitOps, Support
  • Monitoring and logging: Kibana, SignalFx, Splunk, CloudWatch
  • Git, Bitbucket, SVN, TortoiseHg
  • VS Code, PyCharm, Eclipse, IntelliJ, Jupyter, Kubeflow, Hyperflow, Hue, Presto, Zeppelin, Athena
  • Apache, Tomcat, uWSGI/Nginx
  • AWS: EMR, Redshift, Glue, EC2, EBS, S3, IAM, Lambda, ECR, EKS, DynamoDB, RDS, CloudFormation, CloudWatch, ELB, API Gateway, SQS, SNS, SSM, SG, VPC, etc.

Experience

Nike

Senior Data Engineer - Enterprise Data Analytics & Artificial Intelligence

Aug 2022 – Present · 3 yrs 7 mos · Bengaluru, Karnataka, India · Hybrid

  • Data Quality:
  • Customized the great-expectations library and integrated it with Nike's managed-spark framework to enable dynamic data quality rule ingestion via Excel.
  • Contributed to the development of Nike's proprietary Spark-expectations framework, which performs in-flight data quality checks during pipeline execution.
  • Infrastructure Cost & Utilization:
  • Designed and developed TechSolution ID, a unique identifier for data products, enabling scalable measurement of engineering metrics across tools and processes.
  • Telemetry:
  • Implemented telemetry to monitor and analyze data usage and performance metrics, providing insights into access patterns and frequency.
  • Nike Direct Clickstream:
  • Processed clickstream data generated by users navigating Nike.com, the Nike App, and the SNKRS app to build a foundational data layer, deriving actionable insights into user behavior and enabling business teams to optimize digital experiences, improve marketing strategies, and enhance product development.
  • Third-Party Partners (3PP):
  • Onboarded eCommerce platform data from third-party partners (3PP) operating outside the Nike ecosystem onto the Nike Data Foundation.
  • Ensured the availability of complete, trusted, and high-quality data to support global and APLA demand forecasting, enabling data-driven decision-making for teams across Operations.
  • Databricks Workflow Dependency Sensor:
  • Developed a custom operator to bridge Airflow DAG dependencies with Databricks Workflows, addressing challenges faced during the migration of pipelines to the new Databricks platform. This solution ensured seamless integration with downstream teams still operating on Airflow/EMR.
  • Nike.net Product Recommendations:
  • Built a foundational data layer using Adobe Clickstream data for product offerings and add-to-cart activity from Nike.net partners, enabling AI/ML teams to train models for recommendation engines, driving personalized user experiences & business growth.
Data Engineering · Software Engineering · Amazon Web Services (AWS) · Databricks · Data Analysis
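The Databricks Workflow Dependency Sensor described above bridges Airflow DAGs with Databricks Workflow runs. As a minimal sketch, the core of such a sensor is a polling loop over a run-state lookup; the function name, state strings, and injected `get_state` callable below are illustrative assumptions, not Nike's or Databricks' actual API:

```python
import time

# Assumed terminal states for a workflow run; real Databricks Jobs API
# states differ in detail, so treat these as placeholders.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELED"}

def wait_for_run(get_state, poll_interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout elapses.

    get_state is injected (e.g. a closure over a Databricks Jobs API call)
    so the waiting logic stays testable without any network access.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"run still in state {state!r} after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval
```

An Airflow custom sensor would typically wrap logic like this in its `poke` method, returning control to the scheduler between polls rather than sleeping in-process.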

Change Healthcare

Senior Data Engineer

Dec 2021 – Jul 2022 · 7 mos · Bangalore Urban, Karnataka, India · Remote

  • Engaged in Healthcare Interoperability, enabling seamless data exchange and accurate interpretation across systems and organizational boundaries. This facilitates the efficient sharing of healthcare information with providers and individuals across various devices, such as computers, tablets, and smartphones. Additionally, it supports third-party application developers in creating medical applications that can be easily integrated into existing systems.
  • Contributed to the design and implementation of use cases, including external clients accessing internal data, internal users accessing external client data, and internal clients accessing internal data. Played a key role in implementing industry-standard methods for accessing integrated records, providing a standardized data access layer for client applications to read, update, and delete records, while eliminating the need for proprietary interfaces.
  • Tech Stack: Data Engineering, Apache Spark, Scala, Python, PySpark, AWS (Lambda, Glue, Redshift, DynamoDB, Athena), FHIR, SMILE, HL7, REST APIs, Kafka, MySQL, Hive, PostgreSQL, Jenkins, Airflow, Linux, Shell Scripting.
Data Engineering · Data Analysis · Amazon Web Services (AWS) · SQL · REST APIs
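The interoperability work above centers on exchanging records in the FHIR standard. As a rough illustration of the data shape involved (not this team's actual code), a minimal FHIR R4 Patient resource can be built as a plain dict; the helper name and field values are hypothetical:

```python
def make_patient(patient_id, family, given, birth_date):
    """Build a minimal FHIR R4 Patient resource as a plain dict.

    Only a handful of Patient fields are shown; real resources carry
    identifiers, extensions, and metadata beyond this sketch.
    """
    return {
        "resourceType": "Patient",
        "id": patient_id,
        "name": [{"family": family, "given": [given]}],
        "birthDate": birth_date,  # FHIR date format: YYYY-MM-DD
    }
```

A standardized data access layer like the one described would serve such resources over REST endpoints (e.g. `GET /Patient/{id}`), which is what lets client applications read and update records without proprietary interfaces.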

Cloudwick Technologies

Senior Software Engineer - Data & Analytics Platform

Jan 2020 – Dec 2021 · 1 yr 11 mos · Bangalore Urban, Karnataka, India · On-site

  • Project Description:
  • Worked on developing an in-house Next-Generation Data Analytics Platform (NGAP), a strategic initiative for Big Data solutions implemented using EMR and Databricks Spark Compute on the AWS Cloud. NGAP is a user-friendly, security- and privacy-compliant platform designed to support a wide range of use cases, including data storage, ingestion, and curation. It provides dedicated compute resources for diverse workloads and offers advanced features such as self-service capabilities, ad-hoc query tools, job scheduling, and more, enabling seamless and efficient data processing and analytics.
  • Responsibilities:
  • Designed and implemented a Flask-based API enabling NGAP users to debug and monitor logs through a UI.
  • Developed a Submission Service to manage the submission and state of applications on AWS EMR clusters and EC2 instances.
  • Contributed to the Spark Best Practices initiative (Managed-Spark) to help users write optimized, runtime-agnostic Spark code.
  • Built an AWS resource cost analysis dashboard to provide insights into cloud resource usage and expenses.
  • Automated IAM roles and policy management, reducing access provisioning time from one day to one hour.
  • Developed custom Airflow operators to enhance pipeline automation, manage access, and ensure security compliance.
  • Provided technical guidance to cross-functional teams on implementing both streaming and batch data processing services.
  • Technology Stack:
  • Data Engineering, Python, Spark/PySpark, SQL, Airflow, Docker, Kubernetes, Terraform, Jenkins, Linux, DevOps, GitOps, Netflix Genie
Data Engineering · Python · Spark · SQL · Airflow · Docker +4
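The IAM automation mentioned above typically reduces to generating and attaching policy documents programmatically. As a hedged sketch of the kind of artifact such automation emits (the helper, Sid, and bucket names are illustrative, not Cloudwick's actual tooling), here is a least-privilege read-only S3 policy builder:

```python
def s3_read_policy(buckets):
    """Return an IAM policy document (as a dict) granting read-only
    access to the given S3 buckets.

    Each bucket needs two ARNs: the bucket itself for s3:ListBucket,
    and the object wildcard for s3:GetObject.
    """
    arns = []
    for bucket in buckets:
        arns.append(f"arn:aws:s3:::{bucket}")      # bucket-level (ListBucket)
        arns.append(f"arn:aws:s3:::{bucket}/*")    # object-level (GetObject)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "ReadOnlyAccess",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": arns,
        }],
    }
```

In practice the generated document would be serialized to JSON and passed to IAM (e.g. via boto3 or Terraform), which is what collapses a day of manual provisioning into an automated step.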

CenturyLink

Software Engineer - Big Data

Nov 2016 – Dec 2019 · 3 yrs 1 mo · Bengaluru Area, India · On-site

  • CenturyLink Data Lake:
  • Built the CenturyLink data lake to enable the Data Science team to work on multiple loyalty-management use cases. Data was pulled from diverse sources (traditional RDBMS, SFTP, S3, and Hive tables) and landed in a Redshift staging area; business logic was then applied to the staged data with Spark and the results delivered to the data science team. Handled various file formats such as Avro, JSON, Parquet, and CSV.
  • PySpark Template - Fixed width File processing:
  • Designed and built a template to process arbitrary fixed-width flat files, writing shell, Python, and PySpark code reused across the team to ingest files into HDFS and create Hive tables on top of them.
  • Tools: PySpark, DataFrames, AWS, HDFS, Python scripting, Linux, SQL, shell scripting, Hive.
  • Apache SOLR:
  • Indexed data from Impala using Apache Solr and exposed it to end users and the consumer-facing UI over a Java Jersey (JAX-RS) API.
  • Roles & Responsibilities:
  • Contributed to building data pipelines, including data ingestion using SQOOP into HDFS, and transforming and analyzing data using Hive and Impala for reporting purposes.
  • Extracted data from various sources such as RDBMS, remote servers, HDFS, Amazon S3 buckets, and internal SFTP, and loaded it into a data lake and S3 bucket. Developed Spark jobs to process the ingested data efficiently.
  • Performed monitoring and root cause analysis of scheduled jobs using tools like Oozie, Agile Scheduler, and Crontab to ensure smooth operations and timely issue resolution.
  • Collaborated effectively with team members, leveraging strong verbal communication skills to ensure the successful execution of projects.
Data Engineering · Hadoop · Apache Sqoop · Hive · Java · Linux +2
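The fixed-width file template described above hinges on slicing each record at known column offsets. A minimal sketch of that core logic, assuming a simple `(name, start, width)` schema format (the schema shape is an assumption, not the team's actual spec), looks like this; a PySpark version would apply the same offsets via `substring()` on the raw line column:

```python
def parse_fixed_width(line, schema):
    """Split one fixed-width record into a dict.

    schema is a list of (field_name, start_offset, width) tuples;
    padding spaces are stripped from each extracted field.
    """
    record = {}
    for name, start, width in schema:
        record[name] = line[start:start + width].strip()
    return record
```

Parameterizing the schema (e.g. loading it from a config file per feed) is what lets one template ingest any fixed-width source into HDFS and back a Hive table without new code.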

Education

Biju Patnaik University of Technology, Odisha

Bachelor's degree

Jan 2012 – Jan 2016

Sree Ayyappa Public School

Higher Secondary Education — Mathematics and Science

Jan 2010 – Jan 2012

Jawahar Navodaya Vidyalaya - JNV

Secondary Education — Mathematics and Science

Jan 2005 – Jan 2010
