Harsh Tripathi

Data Engineer

Bengaluru, Karnataka, India · 4 yrs 9 mos experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

Architected scalable ETL systems for high data loads.
Achieved 40% load reduction through Big Data optimization.
Automated defect resolution, reducing manual work by 60%.

Stackforce AI infers this person is a SaaS Data Engineer with expertise in Big Data and AI-driven solutions.

Skills

Core Skills

Data Engineering · Big Data

Other Skills

Databricks Products · AI · PostgreSQL · Azure Kubernetes Service · Apache Parquet · OpenSearch · Spark · Azure Functions · Apache Spark · Azure Data Factory · Docker · Python · Airflow · PL/SQL · Data Pipelines

About

Data Scientist/Engineer with 3+ years of experience in building scalable ETL systems and deploying ML models. Proven expertise in Big Data optimization (40% load reduction) and automating defect resolution (60% manual work reduction). Let's connect!

Experience

Total Experience: 4 yrs 9 mos

Average Tenure: 1 yr 7 mos

Current Experience: 1 yr 8 mos

Mercedes-Benz Research and Development India

3 roles

Senior Data Scientist

Promoted

Oct 2025 – Present · 8 mos

Architected and created Databricks ConnectorHT, a VS Code extension for remote cluster execution, using AI-assisted development tools to rapidly build and deploy the internal open-source solution.
Authored two technical patents: a fully filed AI-integrated Big Data architecture for massive-scale processing (Patent Pending), and an automated testing framework for Data Engineering pipelines (filing in process).
Expanded DIANA’s capabilities by integrating specialized AI Insights Agents (Summarization, Clustering) to process massive datasets, transitioning the business from static, predefined dashboards to dynamic, real-time data discovery that automatically uncovers hidden correlations.
Led the creation of DIANA (Data Intelligence and Analytics Natural Assistant), empowering non-technical stakeholders to execute natural language database queries and on-the-fly KPI transformations, effectively eliminating manual data engineering bottlenecks and slashing cycle times from weeks to hours.
Led the end-to-end architecture and development of a standardized, scalable streaming data solution for vehicle EEL (Executable and Error Logging) logs, processing 10M+ events daily and translating complex business logic into real-time deliverables for key stakeholders.
Architected a high-throughput data extraction pipeline on fully distributed Databricks clusters, managing massive I/O loads to orchestrate bulk data writes to PostgreSQL without performance bottlenecks.

Databricks Products · AI · Big Data · Data Engineering · PostgreSQL

Data Engineer/Scientist

Oct 2024 – Oct 2025 · 1 yr

Deployed automated defect triage code on Azure Kubernetes Service, optimizing resource allocation and reducing server load by 40%.
Engineered a Big Data pipeline converting 2M+ daily DLT binary logs to human-readable format using Apache Parquet optimizations.
Developed custom metric visualizations for DLT logs in OpenSearch/ELK stack, reducing mean-time-to-diagnose by 25%.
Built an end-to-end ETL framework that integrates multiple authenticated data sources, achieving 5x faster ingestion via Spark optimizations and delivering unified insights through curated data layers.

Azure Kubernetes Service · Apache Parquet · OpenSearch · Spark · Data Engineering · Big Data

Data Scientist Trainee

Feb 2024 – Sep 2024 · 7 mos

Developed and implemented a defect triage model, automating the manual ticket tagging process to route issues directly to the appropriate developer.
Analyzed large DLT log files, achieving a 99% reduction in log line volume through advanced filtering and aggregation techniques.
Applied AI/ML techniques to intelligently tag tickets by analyzing parameters such as descriptions and DLT log lines, improving issue categorization accuracy.
Reduced ticket reassignment ("hip hopping") by approximately 90%, leading to a significant decrease in estimated resolution time.
Deployed an intelligent, description-based issue tracking system using Azure Functions for efficient issue management and routing.
Automated the entire process of triggering Azure Functions for description summarization and DLT log line reduction/analysis using a CI/CD pipeline.

AI · Azure Functions · Data Engineering

Capgemini

Data Consultant

Jul 2021 – Jul 2022 · 1 yr · Bengaluru South, Karnataka, India · On-site

Successfully migrated jobs and tasks from Azure Data Factory to Databricks and wrote the code for all ETL tasks.
Led a B2B team in completing a migration with thorough testing.
Achieved seamless transition and improved efficiency in data processing for Capgemini.

Databricks Products · Apache Spark · Data Engineering

Infosys

2 roles

System Engineer

Jun 2019 – Jul 2021 · 2 yrs 1 mo

Leveraged expertise in Docker, Python, Airflow, Databricks, and PL/SQL to optimize data workflows within the Media Team at Infosys.
Developed a Docker-based ETL pipeline in Python for automated data ingestion from AWS, streamlining data processing.
Designed custom Airflow tasks to enhance workflow orchestration in Databricks, improving efficiency.
Automated data loading processes with a PL/SQL program, reducing manual effort and increasing accuracy.

Docker · Python · Airflow · PL/SQL · Data Engineering