Alaukik Harsh

AI Researcher

Bengaluru, Karnataka, India · 5 yrs 7 mos experience

Key Highlights

  • Architected scalable data solutions at LinkedIn.
  • Reduced data latency by 90% in pipeline migrations.
  • Engineered AI tools to automate data pipeline creation.
Stackforce AI infers this person is a Data Engineering expert in SaaS with strong capabilities in real-time analytics and ETL processes.

Skills

Core Skills

Data Engineering · Apache Spark · Real-time Analytics · ETL

Other Skills

Scala · Data Build Tool (DBT) · Trino · Programming · Extract, Transform, Load (ETL) · HDFS · SQL · Apache Pinot · Azkaban · Python · Scrum · Databricks · Apache Kafka

About

Senior Data Engineer with 5+ years of experience designing large-scale data platforms, optimizing distributed pipelines, and enabling analytics, experimentation, and ML workflows. Strong expertise in Spark-based ETL, real-time analytics, data modeling, and data governance. Proven track record of reducing data latency, improving data quality, and partnering cross-functionally with analysts, data scientists, and product teams.

Experience

Total Experience: 5 yrs 7 mos
Average Tenure: 1 yr 4 mos
Current Experience: 2 yrs 7 mos

LinkedIn

Data Science Engineer

Oct 2023 – Present · 2 yrs 7 mos · Bengaluru · Hybrid

  • Led the architectural redesign of LinkedIn’s legacy Merlin sales analytics platform powering 4,000+ datasets & 450+ dashboards, reducing inventory size by 85% and improving reliability & maintainability.
  • Designed a scalable metric computation framework using Trino + Apache Pinot, enabling 90% faster real-time OLAP queries across 25B+ records.
  • Engineered an LLM-powered AI code generator for DBT that automates creation of production-ready data pipelines by translating natural-language business logic into SQL models, configuration files, and documentation for LinkedIn's enterprise data platform, reducing development time from 1 week to 4 hours (a ~90% reduction); see the sketch below.
  • Migrated fragmented Trino-based pipelines to Apache Spark, cutting pipeline runtime by 50% and reducing data latency by 90%.
  • Consolidated datasets and enforced data governance & standard metric definitions, improving data quality and eliminating duplication across business domains.
  • Partnered with analysts and data consumers to define requirements, streamline data sourcing, and productionize analytical workflows.
  • Contributed to team growth by conducting technical interviews, mentoring new hires, and supporting onboarding and knowledge transfer on large-scale data infrastructure.
  • Tech stack: Scala, Python, Apache Spark, SQL, Azkaban, DBT, Hive, ETL, Trino, Apache Pinot, HDFS, etc.
Scala · Data Build Tool (DBT) · Trino · Apache Spark · Programming · Extract, Transform, Load (ETL) · +5
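
The LLM-to-DBT generator above is described only at a high level; below is a minimal sketch of how such a translation step could work, assuming the OpenAI Python SDK as the LLM client. The model name, prompt, output directory, and the `generate_dbt_model` helper are illustrative assumptions, not LinkedIn's internal tooling.

```python
# Hypothetical sketch: turn a natural-language business rule into a dbt SQL model file.
# The client, model name, prompt, and file layout are assumptions, not the original system.
from pathlib import Path
from openai import OpenAI  # any chat-completion API could stand in here

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You translate business metric definitions into dbt models. "
    "Return only a single SQL SELECT statement that references existing staging models."
)

def generate_dbt_model(business_logic: str, model_name: str, out_dir: str = "models/marts") -> Path:
    """Ask the LLM for SQL implementing `business_logic` and write it as a dbt model file."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": business_logic},
        ],
    )
    sql = response.choices[0].message.content
    path = Path(out_dir) / f"{model_name}.sql"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(sql)  # dbt compiles any .sql file placed under models/
    return path

# Example: generate_dbt_model("Weekly active sellers per region, trailing 7 days", "weekly_active_sellers")
```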

StockX

Data Engineer 2

May 2022 – Sep 2023 · 1 yr 4 mos · Bengaluru, Karnataka, India

  • Led the data design of an end-to-end automated A/B testing reporting system that consolidated three industry-leading clickstream datasets into one centralized data hub, and designed multiple data models to improve search quality at StockX, resulting in a 15% improvement in total conversion ratio on the StockX platform (roughly 15% more revenue).
  • Wrote near-real-time streaming data pipelines for different use cases on AWS using Spark Streaming, Kafka, and Python on Databricks clusters (see the sketch below).
  • Boosted streaming Delta table query performance by 40% by implementing multiple optimization techniques.
  • Built, developed, and optimized a data lake in S3 for various use cases.
  • Designed and implemented ETL pipelines from multiple sources to a centralized Datahub via Airflow.
  • Migrated more than 50 SQL procedures from Redshift to Databricks, resulting in a 70-75% improvement in overall run performance.
  • Optimized Airflow DAGs to handle multiple ETLs running in parallel, reducing running time by 75%.
Scrum · Databricks · Scala · Apache Kafka · Python (Programming Language) · Apache Spark · +10
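
The streaming bullet above names Spark Streaming, Kafka, and Databricks but not the pipeline itself; the sketch below shows one plausible shape for such a job, using Spark Structured Streaming to append parsed events to a Delta table. The broker, topic, schema, and paths are placeholders, not the actual StockX pipeline.

```python
# Illustrative near-real-time pipeline: Kafka -> parse JSON -> append to a Delta table.
# Requires the spark-sql-kafka connector on the cluster (available by default on Databricks).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "clickstream-events")         # placeholder topic
       .option("startingOffsets", "latest")
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

query = (parsed.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/mnt/checkpoints/clickstream")  # placeholder path
         .start("/mnt/delta/clickstream"))                              # placeholder path
```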

Walmart

Software Engineer II - Data

Sep 2021 – May 2022 · 8 mos · Bengaluru, Karnataka, India

  • Implemented time- and cost-saving optimization techniques on several ETLs on a UDP-based data platform.
Scrum · Scala · Apache Spark · Programming · Extract, Transform, Load (ETL) · SQL · +2

Poshmark

2 roles

Software Engineer II - Data

Apr 2021 – Sep 2021 · 5 mos · Chennai, Tamil Nadu, India

Scrum · Scala · Apache Spark · Amazon Redshift · Programming · Extract, Transform, Load (ETL) · +3

Software Engineer I - Data

Sep 2020 – Apr 2021 · 7 mos · Chennai, Tamil Nadu, India

  • Implemented several Spark-based ETLs using AWS Data Pipeline and EMR, scheduled with Airflow, and pushed the results into Redshift.
  • Tech stack: Redshift, AWS, Python, Spark, Airflow, EMR, Scala, Hive, ETL, etc.
  • Built an end-to-end, fast, time-saving data migration tool in Spark that moves data from AWS S3 to Redshift (see the sketch below).
  • Reduced execution time by 50% and improved alerting system for different edge cases.
  • Developed and followed best practices for design, implementation, and testing.
Scrum · Scala · Apache Spark · Amazon Redshift · Programming · Extract, Transform, Load (ETL) · +3
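
The S3-to-Redshift migration tool above is only named, not described; a minimal sketch of the core move is below, using Spark's generic JDBC writer. The bucket, cluster endpoint, table, and credentials are placeholders, and the real tool may well have used the spark-redshift connector with an S3 staging directory instead.

```python
# Minimal sketch of an S3 -> Redshift copy in Spark via JDBC (placeholder names throughout).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-to-redshift-migration").getOrCreate()

# Read the source dataset from S3 (placeholder bucket/prefix; format assumed to be Parquet).
df = spark.read.parquet("s3://example-bucket/exports/orders/")

# Append into Redshift over JDBC (the Redshift JDBC driver must be on the classpath).
(df.write
   .format("jdbc")
   .option("url", "jdbc:redshift://example-cluster.redshift.amazonaws.com:5439/analytics")
   .option("dbtable", "public.orders")
   .option("user", "etl_user")        # in practice, pulled from a secrets manager
   .option("password", "change-me")   # placeholder
   .mode("append")
   .save())
```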

Stealth mode startup

Data Engineering Intern

Jan 2020 – Jun 2020 · 5 mos · Pune, Maharashtra, India

  • Extracted historical agri-commodities data using Selenium WebDriver in Python and pushed it into PostgreSQL (see the sketch below).
  • Cleansed and merged the scraped data, then aggregated it to make it suitable for model building.
  • Applied predictive algorithms such as multivariate time-series forecasting to predict forward-looking agri-commodities data using historical and geospatial data.
Programming · Extract, Transform, Load (ETL) · SQL
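
The scraping bullet above gives only the tools (Selenium, Python, PostgreSQL); the sketch below shows one way that flow could look. The URL, CSS selectors, table schema, and connection details are hypothetical.

```python
# Illustrative scrape-and-load: Selenium pulls a price table, psycopg2 inserts it into PostgreSQL.
import psycopg2
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/agri-commodity-prices")  # placeholder URL

rows = []
for tr in driver.find_elements(By.CSS_SELECTOR, "table#prices tbody tr"):  # placeholder selector
    cells = [td.text.strip() for td in tr.find_elements(By.TAG_NAME, "td")]
    if len(cells) == 3:  # expected columns: date, commodity, price
        rows.append((cells[0], cells[1], float(cells[2])))
driver.quit()

conn = psycopg2.connect(dbname="agri", user="etl", password="change-me", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS commodity_prices (
            price_date date, commodity text, price numeric
        )
    """)
    cur.executemany(
        "INSERT INTO commodity_prices (price_date, commodity, price) VALUES (%s, %s, %s)",
        rows,
    )
conn.close()
```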

Data Sutram

Intern

Aug 2019 – Sep 2019 · 1 mo · Kolkata, West Bengal, India

Programming · Extract, Transform, Load (ETL) · SQL

Education

Maulana Abul Kalam Azad University of Technology, West Bengal (formerly WBUT)

Bachelor of Technology - BTech — Computer Science

Jan 2016 – Jan 2020

Trident Calyx

Higher Secondary School — Maths

Jan 2013 – Jan 2015
