Rohit Anurag

CTO

Pune, Maharashtra, India · 10 yrs 3 mos experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

Led multiple Generative AI projects.
Improved processing times by 80% using Apache Spark.
Developed Life Science-compliant proprietary database.

Stackforce AI infers this person is a Generative AI and Data Engineering expert with a focus on scalable solutions.

Skills

Core Skills

Microsoft Azure · Large Language Models (LLM) · Python (Programming Language) · Project Management · Elasticsearch · Apache Spark · C++

Other Skills

Apache Kafka · Azure Kubernetes Service (AKS) · Cassandra · Chef · Django · Elastic Stack (ELK) · GPT-4 · Go (Programming Language) · HTML · JavaScript · Microsoft SQL Server · MongoDB · MySQL · SQL · Scikit-Learn

About

Currently, I lead multiple Generative AI projects at PerpetualBlock, focusing on developing a RAG-based solution for answer generation and file analysis using LLMs/SLMs, with integrated search capabilities using Azure AI Search and Elasticsearch. Previously, as Data Engineering Lead at PerpetualBlock, I led a team in developing automated web scraping systems and data pipelines, while fine-tuning Transformer models for entity extraction and knowledge graph creation. At Innoplexus, I led the development of a Life Science-compliant proprietary database, combining multiple open-source systems into a unified platform. Additionally, I developed Big Data platforms using Apache Spark and Kafka, improving processing times by 80%, and built search platforms with Elasticsearch, including custom ranking and query parsing algorithms.

Experience

Partex

5 roles

Principal Generative AI Engineer

Promoted

Apr 2024 – Present · 1 yr 11 mos

1. Led Generative AI Project: Spearheaded the development of a Generative AI solution using a Retrieval-Augmented Generation (RAG) approach for efficient answer generation and file analysis.
2. Search Functionality Development: Implemented advanced search capabilities using Azure AI Search and Elasticsearch, optimizing data retrieval and enhancing user experiences.
3. LLM Integration: Integrated Large Language Models (LLMs) for comprehensive file analysis and solution generation, improving data processing and insights.
4. Cross-Functional Team Leadership: Managed a multidisciplinary team including backend, frontend, QA, and cloud architects, ensuring seamless collaboration and project execution.
5. Azure Cloud Deployment: Designed and deployed the AI solution on Azure Cloud, leveraging cloud-native tools and services for scalable and secure operations.

Microsoft Azure · Python (Programming Language) · Elasticsearch · Elastic Stack (ELK) · Azure Kubernetes Service (AKS) · Large Language Models (LLM)

Data Engineering Lead

Feb 2022 – Mar 2024 · 2 yrs 1 mo

1. Generated embeddings with the OpenAI Text Ada model on text split using an advanced chunking strategy.
2. Used Elasticsearch/Azure AI Search as a vector DB for similarity search on embeddings.
3. Tuned a hybrid search approach combining the Elasticsearch BM25 algorithm with embeddings for better results.
4. Used a RAG (Retrieval-Augmented Generation) based approach to create Q&A solutions.
5. Ran inference on OpenAI GPT-4 using prompt engineering through the RAG-based approach.
6. Wrote the API/backend in Python Flask + LangChain to enable all of the above functionality.
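
Items 2 and 3 above describe combining BM25 keyword ranking with embedding similarity. One common way to merge two ranked lists is reciprocal rank fusion, sketched below. The profile does not say which fusion method was actually used, so the function name, the constant k=60, and the document IDs are all illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one list (hypothetical helper).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a BM25 query and a vector similarity query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept further down.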

Python (Programming Language) · Project Management · Elasticsearch · GPT-4 · Azure Cognitive Search · Microsoft SQL Server · +2

Senior Member Of Technical Staff

Promoted

Mar 2019 – Mar 2022 · 3 yrs

Led a team of 4 people in developing Innoplexus’s own Life Science-compliant database. (C++ and Golang)
Rewrote the codebases of 3 open-source storage, graph, and search databases into one unified database. (C++)
Wrote the connecting layer between them in Golang to achieve efficient concurrency and good performance. (Golang)
Developed in-house insertion, aggregation, query generation, node communication, fault tolerance, and data replication modules. (C++)
Used Apache Spark to process raw data from the HDFS layer: reading data from HDFS, converting it into RDD format, and distributing it to multiple workers for execution. (Python)
Reduced processing time by 80% by deploying the data pipeline logic to Apache Spark. (Python)
Used Apache Kafka to handle the spikes and volume of crawled data flowing from multiple crawling nodes to the HDFS layer. (Python)
Developed a synchronization layer between HDFS and the Elasticsearch platform. (Python)

C++ · Apache Kafka · Cassandra · Apache Spark · Python (Programming Language) · Go (Programming Language)

Member Of Technical Staff

Jun 2016 – Feb 2019 · 2 yrs 8 mos

Developed tokenization and token filtering algorithms to enable search logic involving the Life Science ontology. (Inbuilt ES Plugin)
Developed a custom ranking function to sort search results by relevance. (Python)
Developed a user query parsing algorithm for query generation, using n-gram segmentation logic. (Python)
Developed a Top Concept Word Cloud generation algorithm using the built-in aggregations of Elasticsearch, with custom scoring logic to classify the top concepts for a search result set. (Inbuilt ES Plugin)
Built a data ingestion pipeline that converts raw crawled data into a searchable format. (Python)
Built a MongoConnector plugin that replicates operations from the MongoDB oplog to insert data into Elasticsearch (ES). (Python)
Built a Transformer module in the data pipeline that converts raw data into the proper format. (Python)
Built a Shield module that quality-checks processed data before it is inserted into Elasticsearch (ES). (Python)
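
The query parsing item above mentions n-gram segmentation. A minimal sketch of that idea follows, assuming a greedy longest-match strategy against a set of known multi-word concepts. The profile does not describe the actual algorithm, so the function name, the `max_n` parameter, and the sample vocabulary are hypothetical.

```python
def segment_query(query, vocabulary, max_n=3):
    """Greedy longest-match n-gram segmentation (illustrative assumption).

    Scans the query left to right and, at each position, takes the longest
    n-gram (up to max_n tokens) found in the vocabulary. Unknown single
    tokens are kept as their own segment.
    """
    tokens = query.lower().split()
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first, falling back to shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in vocabulary:
                segments.append(candidate)
                i += n
                break
    return segments


# Hypothetical ontology terms; a real system would load these from the ontology.
vocabulary = {"breast cancer", "clinical trial"}
segments = segment_query("breast cancer clinical trial results", vocabulary)
```

Each recognized segment can then be mapped to its own clause in the generated search query, rather than treating every word independently.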

MySQL · MongoDB · Python (Programming Language) · Elasticsearch

Data Engineering Internship

May 2015 – Jul 2015 · 2 mos · Pune Area, India

Generic Information Extractor Using Machine Learning Techniques
Segregated attributes of a blog page, such as author name, comments, date, title, and paragraphs, using a generic extractor.
Used PhantomJS to extract text from web pages along with features such as text length, font size, and the text itself.
Used the DBSCAN clustering algorithm to cluster texts with similar features across blogs.
Manually tagged the clusters with attributes and prepared training data for predicting the attributes of new blogs with a RandomForest classifier.

Scikit-Learn · Python (Programming Language)

Nutanix (Calm.io, acquired by Nutanix)

Software Developer Internship

May 2014 – Jul 2014 · 2 mos · Bangalore

Extension of the Python Requests Module for a ZMQ Server
Developed an adapter for communication between the Python Requests module and a ZMQ server.
Created a Python Django application that uses the modified Requests module.
Used native C libraries for the AES and SHA-1 algorithms for encryption and authentication over the transport layer.
Worked with a forked copy of the Python Requests module source code.

Django · Python (Programming Language)

IIT Kanpur

2 roles

Hall Executive Council Member

Aug 2013 – Apr 2014 · 8 mos · Kanpur

Worked with a team of 10 people to ensure the smooth running of various intra-IIT hostel-level festivals throughout the year.
Helped manage a yearly budget of over INR 5,00,000 allocated for Hall development activities.

Fresher's Coordinator 2013

Aug 2013 – Aug 2013 · 0 mo · Kanpur

Ensured the Fresher’s event for newcomers ran flawlessly.
Managed every detail of the program, from sending invitations to dignitaries to preparing the stage with sound and light systems.
Dynamically scheduled club performances and made the necessary arrangements for club activities.