Syed Jafar Hussain

Software Engineer

Hyderabad, Telangana, India · 10 yrs 5 mos experience

Key Highlights

  • 9+ years of experience in data engineering.
  • Expertise in building scalable data pipelines.
  • Proven track record in optimizing data models.

Skills

Core Skills

Data Engineering · Big Data · Streaming Data · MLOps

Other Skills

Apache Kafka · Apache Spark · Spark Streaming · Snowflake · Python · Apache Airflow · AWS Lambda · S3 · Terraform · Java · Redis · NoSQL · Amazon SQS

About

Data Engineer with 9+ years of experience building data-intensive applications, designing scalable architectures, and solving complex performance challenges. Proven expertise in developing robust data pipelines, optimizing data models, and delivering end-to-end data solutions that give organizations reliable, actionable insights. Skilled in bridging engineering excellence with architectural vision to ensure scalability, efficiency, and resilience across modern data platforms.

  • Data Engineering & Architecture: efficient data models, scalable ETL/ELT pipelines, modern data warehousing solutions
  • Programming & Scripting: Python, Scala, SQL (expert level)
  • Streaming & Real-Time Data: Kafka, Amazon Kinesis
  • Big Data & Analytics Frameworks: Apache Spark, Apache Flink, Hive, Snowflake
  • Lakehouses / Data Lakes: Delta Lake, Apache Iceberg, Hudi
  • Cloud & Infrastructure: AWS
  • Workflow Orchestration: Apache Airflow
  • Containerization & Orchestration: Docker, Kubernetes, Helm
  • CI/CD Pipelines: GitHub Actions, GitLab CI, Jenkins

Experience

Total Experience: 10 yrs 5 mos
Average Tenure: 2 yrs 5 mos
Current Experience: 7 mos

Amazon

Senior Data Engineer

Oct 2025 – Present · 7 mos · Hyderabad, Telangana, India · On-site

Salesforce

2 roles

Software Engineer MTS

Feb 2024 – Oct 2025 · 1 yr 8 mos

Data Engineer

Jul 2021 – Feb 2024 · 2 yrs 7 mos

  • Integrated data from diverse sources (databases, APIs, log files) into the big data ecosystem for comprehensive analysis; built Python utility modules for sourcing and processing data from systems including Salesforce objects (via REST APIs) and various RDBMS platforms.
  • Data platform and pipelines: Built distributed ETL/ELT with Spark and Snowflake (batch + streaming) across Adoption & Engagement domains; delivered schema designs enabling fast analytics and downstream ML use cases.
  • Streaming: Implemented Apache Kafka for real-time data integration; engineered Spark Streaming jobs with exactly-once semantics and efficient checkpointing for low-latency analytics.
  • MLOps & Feature Engineering: Productionized feature pipelines and re-usable transformations, enabling consistent online/offline feature parity. Integrated with model training and scoring workflows orchestrated in Airflow.
  • Airflow orchestration: Automated ingestion, transformation, and quality checks; implemented SLA monitoring, retry/backoff strategies, and DAG-level lineage.
  • Performance & cost: Tuned Spark (partitioning, broadcast joins, caching), reduced shuffles, and improved job reliability; contributed to Oracle→Snowflake migration delivering a 79% performance increase.
  • Automation (Jarvis): Architected a production LLM agent with Retrieval-Augmented Generation (RAG), semantic search over a vectorized knowledge base (Redis/Postgres), and text-to-semantic query execution to auto-generate insights and reports (Python, Airflow, Google APIs, Snowflake), saving $50M annually.
  • Kubernetes: Deployed and operated containerized data/ML services on EKS; used Helm for repeatable deployments, configured autoscaling and resource quotas for cost/perf balance.
  • Terraform (select workloads): Defined reproducible infra modules (networking, EKS, MSK, S3/IAM, Snowflake integrations) with remote state and workspace strategies for multi-env parity.
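The retry/backoff strategy noted in the Airflow bullet above follows a standard exponential-backoff-with-jitter pattern. A minimal stdlib Python sketch of that pattern (illustrative only — function names and defaults are assumptions, not the production code):

```python
import time
import random

def retry_with_backoff(task, max_retries=4, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Run `task` (a zero-arg callable), retrying on failure with
    exponential backoff plus jitter -- the policy Airflow applies
    per-task via its `retries` / `retry_exponential_backoff` settings."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # exhausted retries: surface the failure
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay, with jitter
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay + random.uniform(0, delay / 10))
```

The injectable `sleep` parameter makes the policy unit-testable without real waits.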
Apache Kafka · Spark · Snowflake · Python · Airflow · Data Engineering +1

Quantium

Data Engineer

Feb 2020 – Jul 2021 · 1 yr 5 mos · Hyderabad Area, India

  • Major contributor to the development and deployment of the Strategic Feed pipelines, using Apache Airflow and Apache Spark to automate the ingestion, processing, and transformation of Woolworths' vast volumes of transactional data (sales, customer, inventory, and more) for market-read agencies (IRI & Nielsen).
  • Oversaw the Strategic Feed ETL process, adapting it to evolving external client requirements.
  • Designed and implemented stand-alone data products (Subscription and Sales) using Apache Spark, Snowflake, Airflow, and Python, supporting over 250 FMCG clients and processing 20 GB of data daily. This included customised reports (Product Performance and Ranking, Sales Trends, TY vs LY, Ranged Distribution, Weighted Distribution, Store By Store) built in response to FMCG client requests.
  • Wrote Spark SQL scripts to process large, complex datasets in a business environment, with workflow scheduling via Apache Airflow.
  • Optimized Apache Spark pipelines by tuning configuration parameters, using broadcast variables, and partitioning data effectively.
  • Implemented process improvements and re-engineering methodologies to enhance data quality.
  • Collaborated with stakeholders across the organization to gather requirements, design solutions, and deliver results.
  • Active participation in agile methodology, collaborating closely with the team to gather feedback, propose optimal solutions, and tailor applications to meet business requirements while adhering to standards.
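The TY vs LY report above is, at its core, a grouped year-over-year comparison. A minimal pure-Python sketch of that calculation (the production version ran as Spark SQL; all names here are illustrative):

```python
def ty_vs_ly(sales):
    """Given rows of (product, year, amount), return per-product
    this-year vs last-year totals and percent change.
    The two most recent years present in the data are compared."""
    totals = {}  # (product, year) -> summed amount
    for product, year, amount in sales:
        totals[(product, year)] = totals.get((product, year), 0.0) + amount

    years = sorted({y for _, y in totals})
    ty, ly = years[-1], years[-2]  # this year, last year

    report = {}
    for product in {p for p, _ in totals}:
        ty_amt = totals.get((product, ty), 0.0)
        ly_amt = totals.get((product, ly), 0.0)
        # Guard against division by zero for products new this year
        pct = None if ly_amt == 0 else round((ty_amt - ly_amt) / ly_amt * 100, 1)
        report[product] = {"ty": ty_amt, "ly": ly_amt, "pct_change": pct}
    return report
```

In Spark the same logic would be a `GROUP BY product, year` followed by a self-join or pivot across the two years.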
Apache Airflow · Apache Spark · Snowflake · Python · Data Engineering · Big Data

Bank of America

Data Engineer

May 2017 – Feb 2020 · 2 yrs 9 mos · Hyderabad Area, India

  • Developed cross-platform ETL pipelines using Microsoft SSIS to give the EIT – Analytics team structured, timely access to extensive datasets, primarily for compliance testing.
  • Authored Hive SQL scripts in a high-demand business environment, handling large-scale, complex datasets on Apache Hadoop.
  • Led data collection efforts across various internal systems, with the ultimate goal of facilitating data analytics.
  • Conducted data transformation on massive datasets (ranging from 50 to 100 million observations) using Trifacta Wrangler within Hadoop, with ETL job scheduling via Autosys.
  • Performed Exploratory Data Analysis (EDA) using Python to provide valuable insights for solution development.
  • Employed Python for automation purposes, including ad-hoc reports and specific non-routine requirements.
  • Designed and implemented automated tools for non-technical users, streamlining tasks such as Data Quality Checks and Sample Testing, significantly enhancing data management efficiency.
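The automated data-quality checks mentioned above can be sketched as a simple rule-runner over tabular rows. A hedged, illustrative stdlib version (not the actual internal tooling; names are assumptions):

```python
def run_quality_checks(rows, required, checks):
    """Apply simple data-quality rules to a list of dict rows.
    `required`: column names that must be present and non-null.
    `checks`: mapping of column -> predicate each value must satisfy.
    Returns a list of (row_index, column, problem) tuples."""
    failures = []
    for i, row in enumerate(rows):
        # Completeness: required columns must be present and non-empty
        for col in required:
            if row.get(col) in (None, ""):
                failures.append((i, col, "missing"))
        # Validity: column-level predicates (range checks, formats, ...)
        for col, ok in checks.items():
            val = row.get(col)
            if val is not None and not ok(val):
                failures.append((i, col, "failed check"))
    return failures
```

Wrapping rules as data (a dict of predicates) is what lets non-technical users add checks via configuration rather than code.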

Amazon

Associate

Dec 2015 – May 2017 · 1 yr 5 mos · Hyderabad Area, India

  • Supported Product Category team for their data requirements related to category management.
  • Used Analysis Service Cube to pull data from internal source systems for product attributes and transactional data.
  • Used Advanced Excel to build Sales Drivers Report and Product Ranking Report for stakeholders.
  • Provided appropriate solutions to existing problems by close monitoring and evaluation.

Education

Osmania University

Bachelor of Engineering (B.E.) — Electrical and Electronics Engineering

Jan 2011 – Jan 2015
