Rajaraman B

Software Engineer

Chennai, Tamil Nadu, India · 6 yrs 9 mos experience

Key Highlights

  • Over 6 years of experience in data engineering.
  • Expert in building scalable ETL/ELT pipelines.
  • Proficient in AWS and Big Data technologies.

Skills

Core Skills

Data Engineering · ETL · Data Architecture · Data Migration

Other Skills

PySpark · AWS Glue · Lambda · SNS · S3 · Apache Kafka · Kinesis · Spark · Lake Formation · Step Functions · API Gateway · GitLab · Apache Hive · HDFS · Apache Sqoop

About

I am a results-oriented Data Engineer with over 6 years of experience in architecting and deploying high-performance data ecosystems. My expertise lies in transforming raw, complex datasets into scalable, actionable assets using the Big Data/Hadoop stack and AWS Cloud services. Throughout my career, I have specialized in building robust ETL/ELT pipelines that handle massive throughput. By leveraging Apache Spark (PySpark), Airflow, and the AWS suite (EMR, Redshift, Glue, Lambda, SQS, Athena), I focus on optimizing data workflows to reduce latency and improve reliability. Whether it’s managing NoSQL environments like HBase or fine-tuning complex SQL and PL/SQL procedures, my goal is always the same: building cost-effective, future-proof data foundations.

Core Technical Stack:

🔹 Cloud: AWS (S3, Lambda, EMR, ECS, Athena, Redshift, Glue, SQS)
🔹 Big Data: Apache Spark, Hadoop, Hive, HBase (NoSQL), Python, PySpark, Databricks
🔹 Orchestration & Tools: Apache Airflow, Unix Shell Scripting
🔹 Databases: SQL, PL/SQL, Data Warehousing, Oracle, Postgres

I thrive on solving intricate data challenges and am passionate about staying at the forefront of cloud-native data engineering. Let’s connect to discuss data architecture, cloud optimization, or the future of ETL.
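
As an illustration of the workflow-optimization style described above, here is a minimal PySpark sketch of the pattern: repartition on a join key, broadcast the small dimension table, and write partitioned Parquet. Every path, table, and column name is invented for the example.

```python
# Minimal sketch of the optimization pattern described in the About section.
# All S3 paths and column names are illustrative, not from any real project.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-optimization-sketch").getOrCreate()

events = spark.read.json("s3://example-bucket/raw/events/")         # hypothetical raw input
accounts = spark.read.parquet("s3://example-bucket/dim/accounts/")  # small dimension table

enriched = (
    events
    .repartition("account_id")                  # co-locate rows for the join
    .join(F.broadcast(accounts), "account_id")  # broadcast join avoids a full shuffle
    .withColumn("event_date", F.to_date("event_ts"))
)

(enriched.write
    .mode("overwrite")
    .partitionBy("event_date")                  # enables partition pruning for readers
    .parquet("s3://example-bucket/curated/events/"))
```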

Experience

UST

Senior Data Engineer

Nov 2024 – Present · 1 yr 4 mos

  • Credit Bureau & Campaign Analytics Platform
  • Role:
  • 1. Built and optimized event-driven, distributed data pipelines using PySpark, AWS Glue, Lambda, SNS, S3, and Apache Kafka/Kinesis, orchestrated via MWAA (Airflow), powering high-volume credit risk and campaign analytics for datasets exceeding 1 TB.
  • 2. Boosted ETL performance by 30% via Spark optimizations (partitioning, memory tuning, efficient joins, Parquet storage).
  • 3. Engineered streaming pipelines with Kafka producers/consumers and Kinesis Data Streams/Firehose, transforming raw microservice events into S3 data lake assets.
  • 4. Implemented schema enforcement, data validation, and quality gates using DynamoDB-driven rules for pipeline integrity (see the sketch below).
  • 5. Deployed automated monitoring with Splunk and SQS, triggering ServiceNow incidents for failures to enhance operational reliability.
  • 6. Secured production pipelines with AWS Secrets Manager, IAM roles, and encrypted S3, adhering to enterprise standards.
  • 7. Collaborated with product/business stakeholders to deliver scalable solutions aligned with KPIs.
PySpark · AWS Glue · Lambda · SNS · S3 · Apache Kafka (+3 more)
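
Item 4 above mentions DynamoDB-driven quality gates; the following is a hedged sketch of one way that pattern can work, where validation rules live as items in a DynamoDB table and are applied as Spark filter predicates. The table name, key layout, and rule format are assumptions for illustration only.

```python
# Sketch of a DynamoDB-driven quality gate: rules are stored as items like
# {"rule_id": "r1", "predicate": "credit_score BETWEEN 300 AND 850"} and
# applied as SQL-style filters. Table and bucket names are hypothetical.
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("quality-gate-sketch").getOrCreate()
rules_table = boto3.resource("dynamodb").Table("pipeline_validation_rules")

df = spark.read.parquet("s3://example-bucket/staging/credit_events/")

valid = df
for rule in rules_table.scan()["Items"]:
    valid = valid.filter(rule["predicate"])  # rows failing any predicate are dropped

rejected = df.subtract(valid)                # quarantine failures for inspection
rejected.write.mode("append").parquet("s3://example-bucket/quarantine/credit_events/")
valid.write.mode("overwrite").parquet("s3://example-bucket/validated/credit_events/")
```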

Capgemini Invent

Consultant

Aug 2022 – Oct 2024 · 2 yrs 2 mos

  • Migration of Data Lake Architecture to Data Mesh Architecture
  • Role:
  • 1. Designed distributed data platforms with Spark, AWS Glue, Lake Formation, and Step Functions, enabling Data Mesh architecture across 10+ sources for scalable marketing analytics.
  • 2. Built cross-account Spark pipelines using YAML configs, plus scalable ingestion from SAP OData/REST APIs to S3 (1M+ records daily); see the config-driven sketch below.
  • 3. Applied AWS data engineering services including SNS, IAM, Lambda, Glue, S3, Athena, API Gateway, event triggers, and log management for robust data processing and management.
  • 4. Implemented and optimized CI/CD processes utilizing GitLab, AWS CLI & CDK, and CodePipeline, ensuring efficient automated testing, build, release, and deployment.
  • 5. Designed self-service data management systems, facilitating auto ingestion, data cataloguing, automated access control, and life cycle management, enhancing overall data governance.
  • 6. Optimized Spark jobs via partitioning, caching, and broadcast joins, slashing execution time by 40-50%.
  • 7. Integrated curated datasets with Athena and Redshift to power campaign analytics and reporting.
  • 8. Handled semi-structured (XML, JSON) and unstructured data, performing extraction and ingestion via REST and OData APIs to enrich data ingestion capabilities.
  • 9. Collaborated in Agile environments, delivering solutions while adhering to Agile methodologies and using tools like JIRA and ServiceNow for project management and tracking.
Spark · AWS Glue · Lake Formation · Step Functions · API Gateway · GitLab (+2 more)
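
Item 2 above refers to YAML-configured pipelines; below is a minimal sketch of the config-driven idea, where a generic Spark job reads a declarative YAML file to decide what to ingest and where to land it. The config keys and bucket paths are invented.

```python
# Config-driven ingestion sketch. A pipeline.yaml like the following is assumed:
#   source:
#     format: parquet
#     path: s3://producer-account-bucket/marketing/leads/
#   target:
#     path: s3://consumer-account-bucket/curated/leads/
#     partition_by: [ingest_date]
import yaml
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("yaml-pipeline-sketch").getOrCreate()

with open("pipeline.yaml") as f:
    cfg = yaml.safe_load(f)

df = spark.read.format(cfg["source"]["format"]).load(cfg["source"]["path"])

(df.write
   .mode("overwrite")
   .partitionBy(*cfg["target"]["partition_by"])  # assumes the column exists in df
   .parquet(cfg["target"]["path"]))
```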

Deloitte

Data Engineer

Apr 2021 – Jul 2022 · 1 yr 3 mos

  • Integration of the financial transactional project from Redshift to Oracle ADW for Oracle Cloud Analytics
  • Role:
  • 1. Built scalable big data solutions on the Apache Hadoop ecosystem using Apache Hive and HDFS for storing and querying multi-terabyte structured and semi-structured datasets.
  • 2. Implemented optimized Hive queries with partitioning, bucketing, and Parquet formats, improving query performance for large analytical workloads.
  • 3. Utilized Apache Sqoop to efficiently migrate large datasets between relational databases (Oracle, MySQL) and Hadoop HDFS, enabling seamless integration between traditional and big data platforms.
  • 4. Optimized PySpark jobs by implementing data partitioning, broadcast joins, and caching strategies, reducing job execution time by 30% for datasets over 1 TB.
  • 5. Led a team of 3 for data deliverables.
  • 6. Built a data pipeline to migrate transactional records from Redshift to Oracle ADW, versioned in GitHub and orchestrated with Apache Airflow (see the DAG sketch below).
  • 7. Data modelling and implementation of ETL code using S3 storage, Python, and JSON files.
  • 8. Performance tuning of the DAG and task implementation.
  • 9. Led the migration of an on-premises data system to AWS EMR, using AWS S3 as the data lake and automating job orchestration with AWS Step Functions and Airflow, reducing operational costs by 40%.
Apache Hive · HDFS · Apache Sqoop · PySpark · Data Engineering · ETL
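
Items 6 and 8 above describe an Airflow-orchestrated Redshift-to-Oracle-ADW migration; this is a minimal DAG sketch of that shape. The DAG id, task callables, connections, and schedule are placeholders, not the project's actual code.

```python
# Minimal Airflow 2 DAG sketch: extract from Redshift to S3, then load into
# Oracle ADW. The callables are stubs; real connection handling is omitted.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_redshift(**context):
    """Placeholder: UNLOAD transactional records from Redshift to S3."""

def load_into_oracle_adw(**context):
    """Placeholder: load the staged S3 files into Oracle ADW."""

with DAG(
    dag_id="redshift_to_oracle_adw",  # hypothetical DAG id
    start_date=datetime(2021, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_from_redshift)
    load = PythonOperator(task_id="load_to_adw", python_callable=load_into_oracle_adw)
    extract >> load  # run extract first; tune retries and parallelism per task
```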

Amdocs

2 roles

Software Engineer (Data Platform)

Sep 2020 – Mar 2021 · 6 mos

  • A data migration project requiring migration of data from the customer’s legacy systems to the Amdocs system.
  • Role:
  • 1. Supporting the Development Activities for the transformation of Postpaid Customers.
  • 2. Leading the Migration Related Activities for Enterprise Customers.
  • 3. Developed reconciliation report generation after migration completed from the legacy system to the Amdocs system (see the sketch below).
  • 4. Creation of stored procedures, functions and other DB objects in Oracle
  • 5. Worked on enhancements and performance tuning of SQL queries
  • 6. Created Ad-hoc excel reports based on the client's requirement using Python
Oracle · SQL · Data Migration · ETL
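
Item 3 above covers post-migration reconciliation reporting; here is a small pandas sketch of the count-comparison approach, comparing per-entity row counts between the legacy and migrated extracts and writing an Excel report. File and column names are invented.

```python
# Reconciliation sketch: compare per-entity row counts from two extracts and
# flag mismatches in an Excel report. Requires openpyxl (or another engine).
import pandas as pd

legacy = pd.read_csv("legacy_counts.csv")      # columns: entity, row_count
migrated = pd.read_csv("migrated_counts.csv")  # same layout from the target system

report = legacy.merge(migrated, on="entity", suffixes=("_legacy", "_migrated"))
report["delta"] = report["row_count_migrated"] - report["row_count_legacy"]
report["status"] = report["delta"].apply(lambda d: "OK" if d == 0 else "MISMATCH")

report.to_excel("reconciliation_report.xlsx", index=False)
```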

Software Developer

Mar 2019 – Sep 2020 · 1 yr 6 mos

  • Telecom billing transformation of postpaid and enterprise customers: the project involves migrating all Postpaid Customers of Vodafone across 23 circles and Enterprise Customers from the BSCS system to the Amdocs system.
  • Role:
  • 1. Supporting the Development Activities for the transformation of Postpaid Customers.
  • 2. Leading the Migration Related Activities for Enterprise Customers.
  • 3. Developed mappings and workflows in Informatica ETL based on the data mapping
  • 4. Creation of stored procedures, functions and other DB objects in Oracle
  • 5. Worked on enhancements and performance tuning of SQL queries
  • 6. Created Ad-hoc excel reports based on the client's requirement using Python
Informatica · SQL · Data Migration · ETL

Education

Dhirubhai Ambani University

Master's degree — Computer Science in Networking

Jan 2016 – Jan 2018

Birla Institute of Applied Sciences

Bachelor of Technology - BTech — Computer Engineering

Jan 2010 – Jan 2014

CRPF Public School, Rohini

Jan 1997 – Jan 2010
