Suraj T.

Software Engineer

Gurugram, Haryana, India · 9 yrs 6 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Engineered solutions processing over 10 million events daily.
  • Achieved a 40% boost in data processing efficiency.
  • Spearheaded migration of 8,000 databases to Snowflake.
Stackforce AI infers this person is a Data Engineer specializing in Cloud Computing and Data Analytics across various industries.

Skills

Core Skills

Data Engineering · Cloud Computing · Data Analytics · Machine Learning · Data Analysis

Other Skills

Amazon Web Services (AWS) · Analytics · Apache Airflow · Apache Dagster · Apache Kafka · Apache Spark · Apache Spark Streaming · Azure Cosmos DB · Azure Data Factory · Azure Data Lake · Azure Databricks · Azure DevOps · Azure Event Hub · Azure Functions

About

Suraj Tripathi - Unleashing Data Magic

Email: suraj.mnitjaipur@gmail.com

Education: Malaviya National Institute of Technology, Bachelor of Technology (Electronics and Communication Engineering), Class of 2016 | CGPA: 7.11

Skills: Python | SQL | PySpark | Microsoft Fabric | ETL | Apache Spark Streaming | Apache Kafka | Azure DevOps | Large Language Models | Apache Airflow | Machine Learning | AWS | Dataiku | LangChain | Apache Dagster | Azure | CI/CD | Synapse | Event Hub

Professional Data Sorcerer

At Microsoft, I'm more than a Data Engineer; I'm a sorcerer, conjuring insights from the data realm. My wand of choice? Python, PySpark, and Azure Synapse. Here's a glimpse of my magic:

  • Data Streaming Maven: I engineered an event streaming solution, wielding PySpark on Synapse, and processed over 10 million events daily from Azure Event Hub to Delta Lake. The result? A phenomenal 40% boost in data processing efficiency.
  • Code Wizard: I coaxed LangChain and OpenAI to craft PySpark spells from English prompts, cutting data validation time by 60%.
  • Azure Architect: With ARM templates, I summoned Synapse workspaces, Key Vaults, and ADLS, speeding up project setup by 60%.
  • Logging Enchanter: My spells with Azure Log Analytics made issue tracking faster than a blink, slashing incident response time by 25%.
  • Data Alchemist: In a cross-functional alliance, we optimized data architecture, and query execution time bowed before us, decreasing by 30%.

Freelance Cloud Voyager

As a Cloud Data Engineer consultant, I've roamed the digital landscapes, specializing in Amazon S3 migrations, data partitioning, ETL enhancements, and time-bending project completions.

Former Genpact Innovator

At Genpact, I orchestrated data migrations, automated ETL with Dataiku's secret potions, migrated thousands of databases to Snowflake, and conjured processing speed enhancements with PySpark.

Amdocs Time Bender

In my time at Amdocs, I mastered Telecom data, reduced runtimes by 90%, and managed the complexities of Ordering, Billing, and Migration.

Personal Projects - Data Artisan

In my creative laboratory, I crafted a Breast Cancer Classification masterpiece on Udemy. I achieved a remarkable 98% accuracy using SVM models, adding data normalization and grid search for a touch of genius.

Let's connect and weave some data magic together. Whether you seek insights or share a passion for data wizardry, I'm always up for an enchanting conversation.

Experience

Total Experience: 9 yrs 6 mos
Average Tenure: 2 yrs 1 mo
Current Experience: 1 yr

Emids

Lead Data Engineer

Jun 2025 – Present · 1 yr · Noida, Uttar Pradesh, India · Hybrid

Python · PySpark · Azure Databricks · SQL · Apache Kafka · Data Engineering · +1

Yieldmo

Data Engineer

Jan 2024 – Mar 2025 · 1 yr 2 mos · India · Remote

  • Designed and developed scalable data pipelines using Airflow, Snowflake, and AWS, ensuring efficient data processing for ad exchange analytics.
  • Collaborated with multiple third-party vendors to generate and deliver customized datasets via S3 and email, streamlining data sharing for external partners.
  • Led the integration of LiveRamp data into internal pipelines, enriching customer insights and improving decision-making for targeted ad placements.
  • Optimized SQL queries and Snowflake workloads, reducing query times by 15% and improving data accessibility for analytics and reporting.
  • Successfully backfilled a month's worth of critical client data within a day, ensuring zero downtime and data consistency.
  • Implemented complex transformations in data pipelines, adding key fields to enhance reporting accuracy and business intelligence.
  • Participated in a hackathon to optimize click-through rates (CTR) using Spark ML, demonstrating a potential uplift in CTR with predictive modeling.
  • Provided production support, resolving 95% of critical data pipeline incidents promptly to ensure uninterrupted operations.
Python (Programming Language) · MySQL · Snowflake · Apache Airflow · PySpark · Data Engineering · +3

Upwork

Cloud Data Engineer Consultant

May 2023 – Nov 2023 · 6 mos · Remote

  • Led the migration of disparate agricultural data from multiple sources into Trino, ensuring seamless integration and accessibility. Designed and implemented an optimal partitioning strategy, improving query performance and reducing processing time by 30% for large tables.
  • Developed a robust Dagster-based orchestration pipeline to automate ETL workflows, improving data reliability and reducing manual intervention by 40%. Implemented monitoring and alerting mechanisms to enhance pipeline stability.
  • Leveraged Python, SQL, and dbt to optimize data transformations and schema design, reducing data processing costs by 25%.
Amazon Web Services (AWS) · Apache Dagster · data build tool (dbt) · SQL · Python · Data Engineering · +1

Microsoft

Data Engineer

Apr 2022 – Dec 2023 · 1 yr 8 mos · Noida, Uttar Pradesh, India

  • Engineered an event streaming solution using PySpark on Synapse, processing 10 million+ events per day from Azure Event Hub to Delta Lake, achieving a 40% increase in data processing efficiency.
  • Built a data quality solution with LangChain and OpenAI that generates PySpark code from English prompts, resulting in a 60% reduction in data validation time.
  • Designed and managed data pipelines in Azure Data Factory (ADF) to orchestrate data movement and transformations across cloud storage, Synapse, and Delta Lake.
  • Orchestrated 15+ CI/CD pipelines via Azure DevOps, ensuring seamless deployment across development, testing, and production environments.
  • Drove the creation and provisioning of essential Azure resources, including Synapse workspaces, Key Vaults, and ADLS, using ARM templates, contributing to a 60% faster project setup.
  • Led the implementation of comprehensive logging practices using Azure Log Analytics, resulting in a 25% decrease in incident response time.
Continuous Integration and Continuous Delivery (CI/CD) · DevOps · Azure Cosmos DB · PySpark · Cloud Supply Chain and Provisioning · Databases · +19

Genpact

Data Engineer

Apr 2021 – Apr 2022 · 1 yr · Bengaluru, Karnataka, India

  • Spearheaded the migration of semi-structured JSON data to Snowflake, leveraging Python, SQL, and AWS S3, ensuring efficient storage and retrieval.
  • Automated ETL processes and email notifications with Dataiku triggers and AWS services, resulting in a 30% reduction in processing time.
  • Managed the successful migration of 8,000 databases from SQL Server to Snowflake, optimizing schema design and query performance.
  • Improved processing speed by 80% using PySpark and Snowflake.
Dataiku DSS · Amazon Web Services (AWS) · PySpark · Databases · Analytics · Data Ingestion · +10

Ineuron.ai

Machine Learning Intern

Jan 2021 – Mar 2021 · 2 mos

  • Anomaly Detection: anomaly detection in wireless transmission logs and identification of the KPIs.
  • Technology: Python, Scikit Learn, MongoDB, Flask, Beautiful Soup
Python · Scikit Learn · MongoDB · Flask · Beautiful Soup · Machine Learning · +1

Amdocs

Data Engineer

Aug 2016 – Apr 2021 · 4 yrs 8 mos · Gurgaon, India

  • Developed Telecom business offerings, processing and analyzing data using Python, SQL, and Pandas.
  • Reduced the daily mapping process runtime from 20 hours to 2 hours (a 90% reduction) through ETL optimization and automation.
  • Managed Ordering, Billing, and Migration activities for a user base of 100K+, leveraging SQL, Data Warehousing, and CRM systems.
  • Handled software defects using Agile methodologies and Jira, resolving an average of 15 issues per sprint.
Data Analysis · Databases · Analytics · Customer Relationship Management (CRM) · Data Ingestion · SQL · +4

Education

Malaviya National Institute of Technology Jaipur

Bachelor of Technology (B.Tech.)

Jan 2012 – Jan 2016

Swami Harsewanand Public School

Associate's Degree, Mathematics

Jan 2009 – Jan 2011

Happy Home English School

High School, High School/Secondary Certificate Programs

Jan 2007 – Jan 2009
