Tanuj Sharma

Product Manager

Noida, Uttar Pradesh, India · 7 yrs 3 mos experience
AI/ML Practitioner

Key Highlights

  • 8+ years of expertise in Python and data engineering.
  • Led successful anti-bot web scraping projects for major retailers.
  • Developed scalable ETL pipelines for real-time data processing.

Skills

Core Skills

Web Scraping · Data Engineering · Backend Development · Data Migration · Big Data Engineering · Data Quality Engineering · Automation · Web Development · Full Stack Development

Other Skills

AWS RDS · AWS S3 · Amazon S3 · Amazon Web Services (AWS) · Analytical Skills · Analytics · AngularJS · Anti Bot · Architecture · Async · Big Data · Browser Extensions · C# · Captcha Bypass · Cascading Style Sheets (CSS)

About

Python Consultant with 8+ years of expertise in high-scale web scraping, anti-bot engineering, and distributed ETL/ELT systems. I architect, optimize, and lead the development of production-grade data pipelines powering mission-critical operations for global retail and analytics teams.

Experience

Masthead technologies pvt. ltd.

Python Consultant

Feb 2025 – Present · 1 yr 1 mo · Noida, Uttar Pradesh, India · On-site

  • Leading a high-performing team of 3–5 Python developers, I architect and execute enterprise-grade anti-bot web scraping solutions for top e-commerce platforms including Walmart, Sam’s Club, Kohl’s, Lowe’s, Costco, Menards, Zoro, Cymax, Lumens, Rona, and 100+ others.
  • I specialize in bypassing advanced CAPTCHA systems such as PerimeterX Press & Hold, Cloudflare Turnstile, DataDome, and GeeTest Slide Puzzles at scale, delivering reliable and real-time data pipelines. My work combines Python, JavaScript, Golang, and Rust, leveraging browser-native automation, Dockerized distributed crawlers, and stealth scraping techniques to maintain high throughput with minimal block rates.
  • I also design and deploy automated monitoring systems with Slack alerts, ensuring operational efficiency and real-time visibility across data pipelines. Additionally, I develop end-to-end browser extensions and scalable scraping engines, handling thousands of SKUs daily while maintaining enterprise-level reliability.
  • Currently focused on building non-blocking, horizontally and vertically scalable scraping infrastructures, integrating JavaScript execution, Python orchestration, and Golang/Rust engines to push the boundaries of anti-bot and data extraction technology.
Python · JavaScript · Golang · Rust · Docker · Web Scraping +3
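The non-blocking, horizontally scalable crawling described above typically rests on an async orchestration core. A minimal sketch using Python's asyncio — `fetch_sku`, the SKU names, the simulated delay, and the concurrency limit are all illustrative stand-ins, not details from this profile:

```python
import asyncio
import random

# Hypothetical fetcher standing in for a real HTTP client; a production
# crawler would issue requests here instead of sleeping.
async def fetch_sku(sku: str, sem: asyncio.Semaphore) -> dict:
    async with sem:  # cap the number of in-flight requests
        await asyncio.sleep(random.uniform(0.01, 0.05))  # simulate network I/O
        return {"sku": sku, "status": "ok"}

async def crawl(skus: list[str], max_concurrency: int = 10) -> list[dict]:
    # One semaphore shared by all tasks throttles overall concurrency,
    # keeping the event loop non-blocking while limiting load on targets.
    sem = asyncio.Semaphore(max_concurrency)
    tasks = [fetch_sku(s, sem) for s in skus]
    return await asyncio.gather(*tasks)

results = asyncio.run(crawl([f"SKU-{i}" for i in range(50)]))
print(len(results))  # 50
```

Scaling horizontally then amounts to running many such workers (e.g. one per Docker container) over a shared queue of SKUs.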

Treint business

SDE2 Backend

Jun 2024 – Jan 2025 · 7 mos · Noida, Uttar Pradesh, India · On-site

  • Responsible for backend development, focusing on building scalable systems that integrate ETL pipelines, Gen AI models, and data storage solutions for real-time data processing and retrieval.
  • Developed ETL pipelines to extract approximately 500 GB of data from leading Indian online medicine retailers, utilizing AWS S3 for storage, Kafka for staging, and Elasticsearch for efficient data indexing and retrieval, enabling seamless backend integrations.
  • Built and maintained REST APIs that exposed transformed data from the ETL pipelines, powering downstream product features and improving user experience.
  • Built RAG-system POCs using LangChain, ChromaDB, and OpenAI GPT-4o-mini for testing and validation.
  • For production use, self-hosted Qwen 7B/14B models and implemented the RAG system to support additional paid features of the end product, enhancing its data-driven capabilities and overall value to users.
  • Developed inference REST APIs for testing open-source LLMs such as Mistral 7B, Gemma 2, and Llama 3.2 8B on GitHub Codespaces (a cluster of 4 VMs with 8/16 GB RAM) in distributed, GPU-free environments, optimizing performance and leveraging cost-effective cloud resources for medical-domain use cases.
ETL · AWS S3 · Kafka · Elasticsearch · REST APIs · LangChain +3
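An extract/transform/load flow like the medicine-retailer pipeline above can be sketched as three composable generator stages. The record fields (`name`, `price_inr`) and the validation rule are illustrative assumptions; the real pipeline would pull from S3/Kafka and index into Elasticsearch rather than materialise a list:

```python
import json
from typing import Iterator

# Illustrative raw records standing in for scraped medicine listings.
RAW = [
    '{"name": "Paracetamol 500mg", "price_inr": "25.0"}',
    '{"name": "  Cetirizine 10mg ", "price_inr": "18.5"}',
    '{"name": "", "price_inr": "9.0"}',  # bad record: empty name
]

def extract(lines: list[str]) -> Iterator[dict]:
    for line in lines:
        yield json.loads(line)

def transform(records: Iterator[dict]) -> Iterator[dict]:
    for rec in records:
        name = rec["name"].strip()
        if not name:  # drop records that fail validation
            continue
        yield {"name": name, "price_inr": float(rec["price_inr"])}

def load(records: Iterator[dict]) -> list[dict]:
    # In the pipeline described above this step would bulk-index into
    # Elasticsearch; here we simply materialise the cleaned batch.
    return list(records)

batch = load(transform(extract(RAW)))
print(len(batch))  # 2
```

Because each stage is a generator, records stream through one at a time, which keeps memory flat even at the ~500 GB scale mentioned above.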

Sequentum

3 roles

Python Lead

Promoted

May 2023 – May 2024 · 1 yr

  • Led the migration of 1 TB+ of data from China Bot legacy systems across multiple MySQL databases to Sequentum’s RDP servers, ensuring seamless integration with MySQL Workbench and MSSQL Server (SSIS) while optimizing for parallelism and performance.
  • Developed Python scripts for the migration process, improving efficiency and reducing migration time by 50% while maintaining data integrity.
  • Spearheaded the integration of Node.js, Python, and C# with the Frida toolkit to enable a critical bot in Sequentum Enterprise; rooted an Android device so Frida could run seamlessly alongside the Sequentum Enterprise bot.
  • Led a team of 3+ data engineers and 2 lead data engineers, enhancing their Python and C# skills, which resulted in faster task delivery, higher code quality, and the successful migration of C#/Python China-website data scrapers (across retail, travel, gaming, and social-media domains) to the Sequentum Enterprise product.
  • Built Docker-based concurrent web scrapers, overcoming CAPTCHA challenges and increasing scraper throughput by 50%, improving data collection reliability.
  • Automated PySpark- and PyArrow-based workflows for Parquet file handling and Snowflake data operations, reducing operational overhead by 50%.
  • Led Big Data transformation projects using PySpark, Hadoop, and S3 to automate deduplication and missing-data retrieval, improving data quality.
  • Monitored web and mobile traffic with tools like Fiddler and Burp Suite, identifying key API requests and cutting data delivery timelines by 40%.
Python · MySQL · Node.js · C# · Docker · PySpark +3
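Migrations like the one above are usually parallelised by chunking the source rows and moving each chunk concurrently. A minimal sketch of that pattern — `migrate_chunk`, the chunk size, and the worker count are illustrative; a real implementation would read and write through DB drivers inside transactions:

```python
from concurrent.futures import ThreadPoolExecutor

def migrate_chunk(chunk: list[tuple]) -> int:
    # A real implementation would INSERT this chunk into the target
    # database inside one transaction; here we just count rows moved.
    return len(chunk)

def migrate(rows: list[tuple], chunk_size: int = 1000, workers: int = 4) -> int:
    # Split the source rows into fixed-size chunks and migrate them
    # in parallel, summing the per-chunk row counts.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(migrate_chunk, chunks))

rows = [(i, f"record-{i}") for i in range(10_500)]
total = migrate(rows)
print(total)  # 10500
```

Because DB drivers release the GIL during network I/O, thread-based parallelism like this is usually enough for bulk copies; chunked transactions also make retries cheap if one batch fails.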

Senior Python Developer

Promoted

Jul 2021 – May 2023 · 1 yr 10 mos

  • Reduced manual data-quality checks by 90% by developing a data-quality checking framework that automated missing-data identification, null-value checks, null-column checks, and Parquet schema validation, cutting 3–4 hours of daily manual work down to a 10–15 minute review of automated morning email reports across multiple client teams and boosting the team’s efficiency.
  • Migrated a key Scrapy project for Company Old Client to Sequentum Enterprise, ensuring seamless transition and functionality, boosting client satisfaction.
  • Developed automated scripts to extract complex tabular data from 25+ online gaming-revenue websites publishing weekly, bi-weekly, monthly, quarterly, half-yearly, and yearly revenue PDFs across US states, leveraging Python, Pandas, Camelot, and Tabula to deliver accurate client data on schedule.
  • Delivered comprehensive Python training for 30+ employees, improving team skills across junior to senior levels and enhancing overall automation efficiency.
  • Integrated the GPT4All LLM into Sequentum Enterprise to classify products (e.g., platinum, gold, silver) from extensive JSON descriptions. Developed a FastAPI-based REST API deployed across ~10 RDP servers, cutting inference time from an initial 1–2 minutes to 10–15 seconds by debugging and tuning the threads parameter in earlier GPT4All versions, and enhanced scalability, concurrency, and performance with multi-threading.
  • Automated big data verification and cleansing using PySpark, SparkSQL, and SnowSQL, ensuring timely and accurate data delivery to clients.
  • Streamlined multiple system process automations (AWS S3, Snowflake Data Warehouse, Windows OS, Browser, and GUI Automation), significantly increasing operational efficiency.
Python · Scrapy · Data Quality Checking · Automation · PySpark · Data Quality Engineering
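The core of a data-quality framework like the one described above is a batch checker that reports missing values, all-null columns, and schema violations. A minimal sketch — the column names, expected schema, and sample rows are illustrative assumptions:

```python
# Hypothetical expected schema; real checks would be driven by per-client config.
EXPECTED_SCHEMA = {"sku": str, "price": float, "title": str}

def check_batch(rows: list[dict]) -> dict:
    report = {"missing_values": 0, "null_columns": [], "schema_errors": 0}
    for row in rows:
        for col, typ in EXPECTED_SCHEMA.items():
            val = row.get(col)
            if val is None:
                report["missing_values"] += 1      # null-value check
            elif not isinstance(val, typ):
                report["schema_errors"] += 1       # schema validation
    for col in EXPECTED_SCHEMA:
        if all(row.get(col) is None for row in rows):
            report["null_columns"].append(col)     # null-column check
    return report

rows = [
    {"sku": "A1", "price": 9.99, "title": None},
    {"sku": "A2", "price": "bad", "title": None},
]
report = check_batch(rows)
print(report)  # {'missing_values': 2, 'null_columns': ['title'], 'schema_errors': 1}
```

Wiring a checker like this into a scheduler that emails the report each morning is what turns hours of manual inspection into a short daily review.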

Python Developer

Dec 2020 – Jul 2021 · 7 mos

  • Developed CAPTCHA-bypass scripts, including Cloudflare, Google reCAPTCHA, and Imperva bypasses, which significantly improved the company’s ability to continue crucial client data-collection projects.
  • Led the automation of multiple business processes, enhancing overall efficiency.
  • Achieved a performance-based salary hike within just 4 months of service due to the successful impact on client projects and revenue growth.
  • Received the Spotlight Award.
Python · Captcha Bypass · Automation · Web Scraping

Scholarsbook

Python Developer

Aug 2019 – Aug 2020 · 1 yr · Noida, Uttar Pradesh, India · On-site

  • Ensured high-quality data processing for learners, managing extraction, transformation, and deployment from diverse resources.
  • Built and maintained ETL pipelines that ingested and processed 100,000+ posts daily, accumulating over 10 million older posts from 200+ news websites, including technical news, social updates, educational content, journal articles, and research papers into AWS RDS.
  • Created robust web crawlers and scrapers using Test-Driven Development (TDD), achieving high efficiency and accuracy in large-scale data collection.
  • Designed and deployed a YouTube Videos Downloading API, utilizing 100+ login accounts to collect and categorize a variety of educational courses, facilitating seamless content delivery.
  • Gained extensive experience with AWS EC2, deploying, debugging, and troubleshooting large-scale data-ingestion workflows, with data stored in JSON format.
  • Designed and optimized ETL pipelines, enabling 30% faster daily ingestion of large datasets into cloud databases.
  • Developed REST APIs using Flask and SQLAlchemy ORM, ensuring seamless integration with internal and external systems.
  • Implemented sophisticated scraping strategies to bypass anti-bot detection systems of platforms like Facebook, LinkedIn, and Instagram, achieving a 60% increase in data collection success rates.
  • Gained hands-on experience with Apache Kafka, Spark, and Airflow, strengthening capabilities in big-data processing and workflow automation.
Python · ETL · AWS RDS · Web Crawlers · Data Engineering · Web Development
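The test-driven crawlers mentioned above hinge on small, easily testable extractors. A minimal sketch using the standard-library HTML parser — the markup and the `article-title` class are illustrative, not from any site named in this profile:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text of <h2 class="article-title"> elements."""

    def __init__(self):
        super().__init__()
        self.titles: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "article-title") in attrs:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

# Illustrative page snippet standing in for a fetched news article listing.
PAGE = """
<html><body>
  <h2 class="article-title">New Kafka release</h2>
  <h2 class="article-title">Airflow 2.0 tips</h2>
  <h2 class="sidebar">Unrelated</h2>
</body></html>
"""

parser = TitleExtractor()
parser.feed(PAGE)
print(parser.titles)  # ['New Kafka release', 'Airflow 2.0 tips']
```

Keeping extraction logic in a pure class like this is what makes TDD practical: unit tests feed in fixture HTML and assert on the parsed output, with no network involved.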

Empirical solutions

Software Engineer

Mar 2018 – May 2019 · 1 yr 2 mos · Noida, Uttar Pradesh, India · On-site

  • Collaborated on small-scale full-stack projects, gaining practical experience in React.js/AngularJS, Python backend development, and MongoDB.
  • Successfully designed and implemented features like CRUD operations, RESTful APIs, and responsive UI components for a Task Management Web app and an E-Commerce platform.
  • Leveraged the opportunity to learn industry best practices, API integration, and database optimization while working with minimal resources and tight timelines.
React.js · Python · MongoDB · Full Stack Development

Education

KRISHNA ENGINEERING COLLEGE, GHAZIABAD

Bachelor of Technology - BTech — Computer Science

Jan 2014 – May 2018
