Suraj Mishra — Operations Associate

I am a Data Engineer with over 6 years of experience and certifications in Databricks (Professional and Associate) and Microsoft Fabric. I specialize in developing and implementing end-to-end data solutions that drive business insights and optimize operations. My expertise lies in building robust data pipelines, processing large datasets, and applying current technologies to provide actionable insights. I am well versed in programming languages, algorithms, and a variety of data engineering tools, and I am always eager to learn and apply new concepts.

Core Competencies & Experience:

Data Pipeline Development: Expertise in creating end-to-end data pipelines that give stakeholders seamless access to visualized, actionable data for better decision-making.
Cloud & Data Engineering: Proficient in using Azure Data Factory, Databricks, Azure SQL, Azure Data Lake Storage, Spark Declarative Pipelines, and other Azure services to preprocess and transform raw data and deliver high-quality data marts and data hubs for multiple clients.
Data Warehouse Architecture: Architecting scalable and efficient data warehouses for historical data visualization, using Azure Databricks to store and manage large datasets.
Product Development & Integration: Led the development of a data staging product that supports 35+ clients in managing their data. Integrated this product with CRM platforms such as Salesforce, Salesforce Service Cloud, and Veeva CRM, as well as Snowflake, file-based systems, and other data warehouses.
Pharma Data Expertise: Managed and processed complex pharma datasets, including patient transactions and claims, sales data, CRM data, and ad hoc data, for 20+ clients, ensuring seamless data flow and delivery of insights.
Advanced Analytics & AI/ML: Strong background in data analytics and AI/ML techniques for both structured and unstructured data, driving actionable insights that contribute to the business.
Batch Pipeline Orchestration: Skilled in creating batch pipelines using Apache Airflow, ensuring smooth, automated workflows for large-scale data processing.
CI/CD for Data Pipelines & Models: Expertise in building CI/CD pipelines to automate the deployment of data pipelines and data models, ensuring reliable and efficient data operations.
Testing & Data Quality: Experienced in creating test datasets for Azure Data Factory pipelines and Azure data models, and in using pytest for function-level tests, to ensure data integrity and high-quality outputs.

Stackforce AI infers this person is a Data Engineering expert in Healthcare analytics with a focus on scalable data solutions.

Location: Gurugram, Haryana, India

Experience: 6 yrs 5 mos

Skills

Data Pipeline Development
Cloud & Data Engineering

Career Highlights

Led a team to deliver projects ahead of schedule.
Refactored legacy systems to improve efficiency.
Expert in building scalable data pipelines.

Work Experience

EXL

Analytics Manager (1 yr 6 mos)

Syneos Health

Sr. IT Data Analyst (1 yr 3 mos)

IT Data Analyst (2 yrs)

Tata Consultancy Services

System Engineer (2 mos)

Assistant System Engineer (2 yrs)

Education

Bachelor of Engineering at Army Institute of Technology

Skills

Core Skills

Data Pipeline Development
Cloud & Data Engineering

Other Skills

Azure Databricks
PySpark
SQL Server
Microsoft SQL Server
Master Data Management
SQL
Data Modeling
Apache Spark Streaming
Azure Data Factory
Azure Data Lake
Azure SQL
Veeva CRM
Salesforce.com
Data Architects
Data Architecture

Experience

Total Experience: 6 yrs 5 mos
Average Tenure: 2 yrs 5 mos
Current Experience: 1 yr 6 mos

EXL

Analytics Manager

Oct 2024 – Present · 1 yr 6 mos · Noida · Hybrid

Led end-to-end stakeholder communication as Manager across client technical, QA, and business teams to gather requirements, define success metrics, and finalize a unified EMR pipeline design, reducing clarification cycles by 60%.
Authored comprehensive technical and business requirement documents from scratch in the absence of a data dictionary, helping accelerate schema discovery for 200+ tables across 2 EMRs.
Refactored a 10-year-old legacy SQL Server and SSIS-based pipeline (taking 4–6 hours daily) into a scalable metadata-driven PySpark pipeline in Azure Databricks, cutting daily run time to under 90 minutes.
Built a medallion architecture (bronze, silver, gold) with schema evolution to ingest data from Snowflake and ODBC into ADLS, achieving 99.9% schema consistency across EMRs using automated metadata control tables.
Designed and implemented direct upserts from Databricks to SQL Server using custom APIs, an uncommon and highly efficient approach that eliminated intermediate staging steps and reduced data delivery lag by 50%.
Led a team of 5 data engineers through the full SDLC (architecture design, sprint planning, metadata modeling, development, and go-live), delivering the project 2 weeks ahead of schedule.
Trained and mentored the team in PySpark, Delta Lake, and ADF orchestration, resulting in a 40% improvement in story completion velocity across sprints.
Established a reusable, client-agnostic pipeline template supporting future EMR integrations, positioning the solution for enterprise-wide adoption with minimal rework.
Optimized data union and transformation logic in the silver layer to standardize data across EMRs, reducing downstream complexity for reporting teams and enabling 2x faster onboarding of new EMRs.
Minimized pipeline failures by implementing robust logging, error-handling, and retry mechanisms, increasing overall pipeline reliability to 98.5%.
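
The profile does not say how the logging, error handling, and retry mechanisms in the last item were built. As an illustration only, here is a minimal Python sketch of one common pattern: a wrapper that logs every attempt of a pipeline step and retries failures with exponential backoff. The names `run_with_retry`, `step_name`, `max_attempts`, and `base_delay_s` are hypothetical and not taken from the actual project.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("pipeline")  # hypothetical logger name


def run_with_retry(
    step: Callable[[], T],
    step_name: str,
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run one pipeline step, retrying with exponential backoff on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
            logger.info("step=%s attempt=%d status=success", step_name, attempt)
            return result
        except Exception as exc:
            logger.warning(
                "step=%s attempt=%d status=failed error=%s", step_name, attempt, exc
            )
            if attempt == max_attempts:
                # Out of attempts: log once more and re-raise the original error.
                logger.error("step=%s giving up after %d attempts", step_name, attempt)
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A step would be wrapped as `run_with_retry(lambda: load_table("patients"), "load_patients")`, where `load_table` stands in for whatever function performs the work. Passing `sleep` as a parameter lets tests run without real waiting.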