Vivek Kulkarni

CEO

Bengaluru, Karnataka, India · 4 yrs 8 mos experience

Key Highlights

  • Expert in designing data-intensive architectures.
  • Proficient in end-to-end machine learning solutions.
  • Strong background in cloud technologies and data engineering.

Skills

Core Skills

AI Agents · Microsoft Azure · Databricks · Amazon Web Services (AWS) · Machine Learning

Other Skills

Azure AI Foundry · Python (Programming Language) · Lean Six Sigma · Continuous Integration and Continuous Delivery (CI/CD) · AI Literacy · AI for Business · Business Process Improvement · Root Cause Analysis · Business Analysis · Microsoft Power BI

About

Data Scientist | Machine Learning Engineer | Data Engineer | Designing Data-Intensive Architectures | Spark, AWS & Azure | Lean Six Sigma Yellow Belt

🚀 Data Scientist and Architect with a Master's in Data Science (USA), specializing in designing data-intensive application architectures that drive business intelligence. My expertise lies in bridging the gap between statistical theory and production-grade engineering. I focus on building end-to-end machine learning solutions, from exploratory data analysis and feature engineering to deploying scalable models in the cloud. Unlike traditional data scientists, I also manage the underlying infrastructure, ensuring that high-performance ETL pipelines and CI/CD workflows support robust ML operations. I love solving data challenges, from designing ETL/ELT pipelines to integrating DevSecOps and CI/CD automation, turning raw data into business insights and innovation.

💼 What I Do Best
  • Develop predictive models and algorithms to solve complex business problems.
  • Build and maintain robust ETL/ELT pipelines using Apache Spark, Databricks, and Python.
  • Design data models, perform schema design, and manage versioned migrations.
  • Clean, transform, and optimize massive datasets for analytics and machine learning.
  • Integrate solutions with the AWS and Azure cloud ecosystems.
  • Implement CI/CD pipelines using GitHub Actions and Infrastructure-as-Code frameworks.
  • Automate data workflows via REST APIs and event-driven jobs.

🧰 Tech Toolbox
Python | Machine Learning & Statistics | SQL | NoSQL | Apache Spark / PySpark | Databricks | AWS | Azure Data Services | GitHub CI/CD | GitHub Actions | REST API

🔍 Known for being:
  • An autodidact with endless curiosity
  • A problem-solver who loves clean architecture
  • Detail-oriented and results-driven
  • Focused on designing resilient, production-grade data backbones that power enterprise AI solutions

🎓 Certifications
  • AWS Certified Cloud Practitioner
  • Microsoft Certified: Azure Data Fundamentals (DP-900)
  • Microsoft Certified: Azure Data Engineer Associate (DP-203)
  • Databricks Certified Data Engineer Associate
  • GitHub Foundations
  • GitHub Actions

🌱 Always eager to take on challenging roles that blend data engineering, ML-readiness, and cloud innovation.

Experience

Microsoft

Technical Solution Manager

Oct 2025 – Present · 5 mos · Bengaluru, Karnataka, India · Hybrid

  • Architecting data-driven solutions for the Cloud Controls and Procurement Execution (CSPE) and CMOF domains, leveraging Python, PySpark, SQL, Power BI, and Microsoft Azure to optimize Data Centre hardware and software capabilities.
  • Developing and deploying autonomous AI Agents using Microsoft Copilot Studio and Azure AI Foundry to automate complex supply chain workflows and enhance decision-making speed.
  • Integrating Lean Six Sigma principles and Kaizen methodologies into engineering workflows, resulting in measurable process improvements and reduced operational latency.
  • Acting as the Centre of Excellence lead for emerging AI technologies, responsible for rapid prototyping (POC), testing, and deploying scalable AI models into production Azure environments.
  • Designing robust ETL pipelines to synthesize multi-vertical telemetry data, creating a unified data ecosystem for advanced analytics and reporting.
AI Agents · Azure AI Foundry · Microsoft Azure · Python (Programming Language) · Lean Six Sigma
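
The telemetry-unification step described above can be sketched in a few lines. This is a minimal illustration only: the vertical names, column names, and pandas-based approach are assumptions for demonstration, not details from the actual project.

```python
import pandas as pd

def unify_telemetry(frames: dict) -> pd.DataFrame:
    """Stack per-vertical telemetry extracts into one table,
    tagging each row with its source vertical."""
    tagged = [df.assign(vertical=name) for name, df in frames.items()]
    unified = pd.concat(tagged, ignore_index=True)
    # Normalize timestamps so downstream reports can group by day.
    unified["event_time"] = pd.to_datetime(unified["event_time"], utc=True)
    return unified

# Hypothetical extracts from two verticals.
hardware = pd.DataFrame({"event_time": ["2025-10-01T08:00:00Z"], "metric": [0.93]})
software = pd.DataFrame({"event_time": ["2025-10-01T09:30:00Z"], "metric": [0.88]})
combined = unify_telemetry({"hardware": hardware, "software": software})
```

Tagging each row with its source before concatenation keeps lineage visible in the unified dataset, which simplifies later auditing and per-vertical reporting.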

Enterprise Minds, Inc.

Senior Data Engineer

Sep 2025 – Oct 2025 · 1 mo · Pune District, Maharashtra, India · Hybrid

Cloudaeon

Senior Data Engineer

Feb 2023 – Jan 2025 · 1 yr 11 mos · Pune District, Maharashtra, India · Hybrid

  • Spearheaded the design and development of end-to-end ETL pipelines using Azure Data Factory (ADF) and Databricks, enabling seamless data ingestion, transformation, and integration across diverse enterprise data sources.
  • Developed and maintained Python-based backend automation scripts to streamline release management workflows between SharePoint, JIRA, and Azure SQL, reducing manual interventions and improving release turnaround time.
  • Extensively utilized Azure Logic Apps to orchestrate event-driven data flows and automate data validation checks, integrated with email notification systems for real-time success and failure alerts.
  • Implemented DevSecOps practices by integrating Snyk, TruffleHog, and Fortify into GitHub Actions CI/CD pipelines, ensuring early detection and remediation of code vulnerabilities and enabling automatic JIRA ticket creation for any issues identified.
  • Built and deployed Data Audit and Governance frameworks leveraging GitHub REST APIs and GitHub Actions, automating metadata tracking, repository compliance, and documentation validation processes.
  • Collaborated closely with cloud architects, security engineers, and product teams to enhance data quality, access control, and compliance standards across the Azure ecosystem.
  • Optimized Databricks workflows by tuning cluster configurations, improving notebook runtime efficiency, and enhancing integration with ADF pipelines for better orchestration and scalability.
  • Played a key role in knowledge sharing by documenting best practices and conducting internal workshops on Azure Data Engineering, CI/CD automation, and cloud security integration.
Python (Programming Language) · Continuous Integration and Continuous Delivery (CI/CD) · Databricks · Microsoft Azure
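
A repository-compliance check like the governance framework mentioned above could be sketched as below. The required-files policy and the report shape are illustrative assumptions; in practice the file listing would come from GitHub's GET /repos/{owner}/{repo}/contents REST endpoint rather than a hard-coded list.

```python
# Illustrative policy: files every audited repository must contain.
REQUIRED_FILES = {"README.md", "LICENSE", "CODEOWNERS"}

def compliance_report(repo_files):
    """Compare a repository's top-level file listing against the
    required-files policy and report anything missing."""
    present = set(repo_files)
    missing = sorted(REQUIRED_FILES - present)
    return {"compliant": not missing, "missing": missing}

# Example: a repo listing fetched (hypothetically) from the GitHub API.
report = compliance_report(["README.md", "src", "pyproject.toml"])
```

Running such a check inside a scheduled GitHub Actions workflow lets non-compliant repositories be flagged automatically, for example by opening a JIRA ticket with the `missing` list.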

Dynamon

Senior Data Engineer

Feb 2022 – Feb 2023 · 1 yr · Pune District, Maharashtra, India · Remote

  • Designed and implemented an AWS Lambda-based event notification system to alert engineering and operations teams on Slack upon successful completion of ECS task executions, leveraging Python (boto3) and Slack APIs to streamline real-time communication.
  • Developed and maintained containerized applications using Docker, building and deploying optimized images through GitLab CI/CD pipelines, ensuring reliability, security, and minimal downtime across multiple AWS environments.
  • Architected a comprehensive infrastructure automation framework using AWS CloudFormation, writing modular JSON and YAML templates to provision and manage EC2, ECS, ECR, Lambda, and IAM resources efficiently.
  • Collaborated closely with DevOps and backend engineering teams to integrate CI/CD workflows into a microservices architecture, reducing manual deployments and accelerating feature delivery cycles.
  • Improved infrastructure scalability and resilience by implementing auto-scaling policies, environment-based parameterization, and IAM least-privilege roles within CloudFormation stacks.
  • Set up centralized logging and monitoring using CloudWatch and SNS, enabling proactive issue detection and alerting across critical data pipelines and workloads.
  • Contributed to cross-functional discussions on cost optimization, introducing lifecycle rules and on-demand resource scheduling that reduced AWS costs by an estimated 20%.
  • Championed documentation and knowledge-sharing initiatives to standardize infrastructure setup practices across the data engineering and DevOps teams.
Python (Programming Language) · Amazon Web Services (AWS) · Continuous Integration and Continuous Delivery (CI/CD) · Databricks
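
A Lambda-to-Slack notifier of the kind described above might look like the sketch below. The event fields, webhook URL, and message format are hypothetical; a real deployment would read the webhook from the Lambda environment and subscribe the function to EventBridge "ECS Task State Change" events.

```python
import json
import urllib.request

def build_slack_payload(detail):
    """Turn an ECS task state-change event detail into a Slack message body."""
    task = detail.get("taskArn", "unknown-task").split("/")[-1]
    status = detail.get("lastStatus", "UNKNOWN")
    return {"text": f"ECS task `{task}` finished with status *{status}*."}

def lambda_handler(event, context):
    # In practice the webhook URL comes from an environment variable or
    # Secrets Manager; this placeholder is illustrative only.
    payload = build_slack_payload(event.get("detail", {}))
    req = urllib.request.Request(
        "https://hooks.slack.com/services/T000/B000/XXXX",  # placeholder
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # network call
        return {"status": resp.status}

payload = build_slack_payload(
    {"taskArn": "arn:aws:ecs:task/abc123", "lastStatus": "STOPPED"}
)
```

Keeping the message-building logic in a pure function separate from the network call makes the formatting easy to unit-test without hitting Slack.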

CES

Lead Data Engineer

Jun 2021 – Feb 2022 · 8 mos · Chennai, Tamil Nadu, India · Remote

  • Led the design and implementation of robust ETL solutions for the client Benson Hill Biosystems, leveraging the AWS ecosystem to enable scalable, secure, and cost-efficient data processing workflows.
  • Architected end-to-end data pipelines integrating AWS S3, Lambda, Glue, RDS, and Redshift, ensuring seamless data ingestion, transformation, and loading for analytics and reporting use cases.
  • Utilized Terraform as an Infrastructure-as-Code (IaC) framework to provision, configure, and manage AWS resources, ensuring environment consistency, reusability, and faster deployment cycles.
  • Developed, optimized, and orchestrated AWS Lambda and AWS Step Functions to automate ETL workflows, reduce manual intervention, and achieve near real-time data processing.
  • Implemented centralized monitoring and alerting using CloudWatch, setting up custom metrics, dashboards, and alarms to proactively detect performance bottlenecks and operational issues.
  • Collaborated with cross-functional data, analytics, and DevOps teams to define best practices for data governance, security, and cost optimization within AWS.
  • Led peer code reviews, mentored junior engineers, and contributed to internal documentation and reusable Terraform modules to standardize infrastructure deployments across projects.
  • Played a key role in migrating legacy ETL scripts to serverless architecture, improving scalability, reducing maintenance overhead, and cutting operational costs by over 30%.
  • Delivered well-documented, production-grade data workflows that improved reporting accuracy and supported strategic decision-making for client stakeholders.
Python (Programming Language) · Amazon Web Services (AWS) · Continuous Integration and Continuous Delivery (CI/CD) · Databricks
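
Orchestrating an ETL workflow with Step Functions, as in the bullets above, amounts to authoring an Amazon States Language (ASL) definition. The sketch below builds one in Python; the job name, Lambda ARN, and two-state shape are invented for illustration and are not the project's actual workflow.

```python
import json

def etl_state_machine(glue_job, notify_lambda_arn):
    """Return an ASL definition (as a dict) that runs a Glue job
    and then invokes a notification Lambda."""
    return {
        "Comment": "ETL orchestration sketch",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # .sync makes Step Functions wait for the Glue job to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job},
                "Next": "Notify",
            },
            "Notify": {
                "Type": "Task",
                "Resource": notify_lambda_arn,
                "End": True,
            },
        },
    }

# Hypothetical job and Lambda names.
definition = etl_state_machine(
    "nightly-transform",
    "arn:aws:lambda:us-east-1:123456789012:function:notify",
)
asl_json = json.dumps(definition)  # what you would pass to CreateStateMachine
```

Defining the machine programmatically keeps it version-controlled alongside the Terraform that provisions it.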

Rx Savings Solutions

Senior Data Engineer

Dec 2020 – Apr 2021 · 4 mos · Overland Park, Kansas, United States

  • Expertly executed OCR pipeline operations on documents received from various pharmaceutical and healthcare service provider companies.
  • Collaborated closely with the ETL team to ensure smooth and accurate data migration across teams.
  • Stayed current with the latest AWS technologies and applied them to enhance capabilities and improve efficiency in ETL pipelines.
  • Demonstrated expertise in data management, including integration, modeling, analytics, and reporting, by utilizing Python, the AWS ecosystem, and big data tools.
Python (Programming Language) · Amazon Web Services (AWS) · Continuous Integration and Continuous Delivery (CI/CD) · Databricks

United Solutions, LLC

Senior Data Scientist

Aug 2020 – Dec 2020 · 4 mos · Rockville, Maryland, United States

  • Worked with a Federal Government client to solve large-scale contract document classification and information extraction problems using both supervised and unsupervised NLP algorithms.
  • Developed a Flask-based API web application that ingests client documents and accurately classifies them into predefined categories, streamlining document management and review workflows.
  • Built multiple independent Proofs of Concept (POCs) for diverse document types to automatically detect and extract key information using PyTesseract and Tesseract OCR in Python.
  • Leveraged NLP libraries such as NLTK, spaCy, and FastText to perform text preprocessing, cleaning, stemming, lemmatization, and word encoding for model training and inference.
  • Experimented with TF-IDF, Word2Vec, and FastText embeddings to improve text feature representation and classification accuracy.
  • Collaborated with data engineers and business analysts to validate model outputs against domain rules and continuously refine accuracy through iterative feedback cycles.
  • Delivered insights that helped automate manual contract review processes, reducing processing time and improving document handling efficiency for the client.
Python (Programming Language) · Amazon Web Services (AWS) · Databricks · Machine Learning
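
A TF-IDF document classifier of the kind described above can be sketched with scikit-learn. The toy corpus, labels, and choice of Logistic Regression are assumptions for illustration, not the client's actual categories or model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus for contract documents, with invented categories.
docs = [
    "payment terms and invoice schedule",
    "invoice due within thirty days",
    "scope of work and deliverables",
    "deliverables milestones and acceptance criteria",
]
labels = ["billing", "billing", "statement_of_work", "statement_of_work"]

# TF-IDF features feeding a linear classifier, wrapped in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)
prediction = model.predict(["invoice and payment schedule"])[0]
```

Wrapping vectorizer and classifier in a single pipeline ensures the same vocabulary and IDF weights are applied at inference time, which is what a Flask endpoint would call on each uploaded document.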

Toyota North America

Data Scientist

Feb 2020 – Apr 2020 · 2 mos · Dallas-Fort Worth Metroplex

  • Worked with the marketing team to deliver actionable insights by leveraging historical data and applying various statistical and machine learning algorithms.
  • Understood business requirements, tested various hypotheses about the underlying mechanics of the business processes, and wrote clean, efficient code in R and Python.
  • Prepared presentations and reports on findings and recommended next steps to ensure successful project completion.
Python (Programming Language) · Databricks · Microsoft Azure · Machine Learning

Cisco

Data Scientist

Jun 2018 – Jan 2020 · 1 yr 7 mos · San Francisco Bay Area · On-site

  • Collaborated with onshore (US) and offshore (Bangalore) analytics teams to design and deliver scalable machine learning solutions addressing Cisco's enterprise-level quality and reliability challenges, ensuring alignment with global business KPIs.
  • Developed and iteratively improved multiple supervised learning models, both classification (Logistic Regression, Decision Tree, Random Forest, SVM, XGBoost) and regression (Linear, Ridge, Lasso, PLS, Random Forest Regressor), driving measurable accuracy and stability improvements across sprints.
  • Performed detailed validation and statistical benchmarking using Accuracy, Cohen's Kappa, RMSE, ROC-AUC, Precision, Recall, and F1-score to ensure both analytical rigor and business relevance.
  • Productionized prototype models built in Python by implementing end-to-end machine learning pipelines on distributed MapR Hadoop clusters, integrating automated data ingestion, feature engineering, model training, testing, and deployment for real-time and scheduled batch scoring.
  • Created robust data preprocessing workflows including cleansing, missing-value handling, outlier detection, class-imbalance correction using SMOTE, and feature ranking with permutation importance and SHAP for interpretability.
  • Designed model monitoring scripts and performance dashboards using Python, NumPy, and Matplotlib to track drift, stability, and retraining needs.
  • Collaborated closely with DevOps teams to integrate ML jobs within enterprise data pipelines using Airflow and Jenkins for scheduling and CI/CD automation.
  • Streamlined interoperability between Jupyter Notebooks and MapR HDFS through PyArrow and Hadoop client APIs, enabling efficient reading/writing of large datasets directly from the distributed storage layer.
  • Contributed to documentation and internal knowledge-sharing sessions focused on best practices for model deployment, versioning, and reproducibility, helping accelerate adoption of data-driven decision-making across quality teams.
Python (Programming Language) · Databricks · Machine Learning
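
The statistical benchmarking step above (Accuracy, Cohen's Kappa, F1-score) can be reproduced with scikit-learn's metrics module. The label vectors below are invented for illustration, not results from the actual models.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

# Illustrative predictions from a binary quality-classification model.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)   # fraction of correct labels
kappa = cohen_kappa_score(y_true, y_pred)   # agreement corrected for chance
f1 = f1_score(y_true, y_pred)               # harmonic mean of precision/recall
```

Reporting Kappa alongside Accuracy matters on imbalanced quality data: with two mislabels out of eight here, accuracy is 0.75 while Kappa drops to 0.5, because half of the raw agreement would be expected by chance alone.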

Education

Mercyhurst University

Master's Degree – Data Science

Jan 2016 – Jun 2018

Dr. Babasaheb Ambedkar Marathwada University, Aurangabad

Bachelor's Degree – Information Technology

Aug 2010 – Mar 2015

M.G.M's Jawaharlal Nehru College of Engineering

Bachelor's Degree – Information Technology

Aug 2010 – Mar 2015
