Rohit Anurag

CTO

Pune, Maharashtra, India · 10 yrs 3 mos experience
Most Likely To Switch · AI ML Practitioner

Key Highlights

  • Led multiple Generative AI projects.
  • Improved processing times by 80% using Apache Spark.
  • Developed Life Science-compliant proprietary database.
Stackforce AI infers this person is a Generative AI and Data Engineering expert with a focus on scalable solutions.

Skills

Core Skills

Microsoft Azure · Large Language Models (LLM) · Python (Programming Language) · Project Management · Elasticsearch · Apache Spark · C++

Other Skills

Apache Kafka · Azure Kubernetes Service (AKS) · Cassandra · Chef · Django · Elastic Stack (ELK) · GPT-4 · Go (Programming Language) · HTML · JavaScript · Microsoft SQL Server · MongoDB · MySQL · SQL · Scikit-Learn

About

Currently, I lead multiple Generative AI projects at PerpetualBlock, focusing on developing a RAG-based solution for answer generation and file analysis using LLMs/SLMs, with integrated search capabilities using Azure AI Search and Elasticsearch. Previously, as Data Engineering Lead at PerpetualBlock, I led a team in developing automated web scraping systems and data pipelines, while fine-tuning Transformer models for entity extraction and knowledge graph creation. At Innoplexus, I led the development of a Life Science-compliant proprietary database, combining multiple open-source systems into a unified platform. Additionally, I developed Big Data platforms using Apache Spark and Kafka, improving processing times by 80%, and built search platforms with Elasticsearch, including custom ranking and query parsing algorithms.

Experience

Partex

5 roles

Principal Generative AI Engineer

Promoted

Apr 2024 - Present · 1 yr 11 mos

  • 1. Led Generative AI Project: Spearheaded the development of a Generative AI solution using a Retrieval-Augmented Generation (RAG) approach for efficient answer generation and file analysis.
  • 2. Search Functionality Development: Implemented advanced search capabilities using Azure AI Search and Elasticsearch, optimizing data retrieval and enhancing user experiences.
  • 3. LLM Integration: Integrated Large Language Models (LLMs) for comprehensive file analysis and solution generation, improving data processing and insights.
  • 4. Cross-Functional Team Leadership: Managed a multidisciplinary team including backend, frontend, QA, and cloud architects, ensuring seamless collaboration and project execution.
  • 5. Azure Cloud Deployment: Designed and deployed the AI solution on Azure Cloud, leveraging cloud-native tools and services for scalable and secure operations.
Microsoft Azure · Python (Programming Language) · Elasticsearch · Elastic Stack (ELK) · Azure Kubernetes Service (AKS) · Large Language Models (LLM)

Data Engineering Lead

Feb 2022 - Mar 2024 · 2 yrs 1 mo

  • 1. Generated embeddings with the OpenAI Text Ada model over text split using an advanced chunking strategy.
  • 2. Used Elasticsearch / Azure AI Search as a vector DB for similarity search on embeddings.
  • 3. Tuned a hybrid search approach combining the Elastic BM25 algorithm with embeddings for better results.
  • 4. Used a RAG (Retrieval-Augmented Generation) based approach to create Q&A solutions.
  • 5. Ran inference on OpenAI GPT-4 using prompt engineering through the RAG-based approach.
  • 6. Wrote the API/backend in Python Flask + LangChain to enable all of the above functionality.
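
The hybrid search idea in the list above (BM25 plus embeddings) can be illustrated with a minimal sketch. It assumes the lexical and vector scores have already been fetched per document, and blends them with a weighted sum after min-max normalization. The function names, the `alpha` weight, and the fusion formula are hypothetical choices for illustration; the profile does not say how the two signals were actually combined or tuned.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored the same; treat them as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend lexical and embedding scores (assumed formula, not the author's).

    alpha is the weight on the BM25 side; (1 - alpha) goes to the
    embedding side. A document missing from one result set gets 0 there.
    """
    bm25_n, vec_n = min_max(bm25), min_max(vector)
    docs = set(bm25_n) | set(vec_n)
    fused = {
        d: alpha * bm25_n.get(d, 0.0) + (1 - alpha) * vec_n.get(d, 0.0)
        for d in docs
    }
    # Highest fused score first; ties broken by document id for stability.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `hybrid_rank(bm25_scores, vector_scores, alpha=0.7)` would favor exact keyword matches, while a lower `alpha` leans on semantic similarity; tuning that weight against a labeled query set is one plausible reading of "tuned" here.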
Python (Programming Language) · Project Management · Elasticsearch · GPT-4 · Azure Cognitive Search · Microsoft SQL Server · +2

Senior Member Of Technical Staff

Promoted

Mar 2019 - Mar 2022 · 3 yrs

  • Led a team of 4 in developing Innoplexus’s own Life Science-compliant database. (C++ and Golang)
  • Rewrote the codebases of 3 open-source storage, graph, and search databases into one unified database. (C++)
  • Wrote the connecting layer between them in Golang for efficient concurrency and good performance. (Golang)
  • Developed in-house modules for insertion, aggregation, query generation, node communication, fault tolerance, and data replication. (C++)
  • Used Apache Spark to process raw data from the HDFS layer: reading data from HDFS, converting it into RDD format, and distributing it to multiple workers for execution. (Python)
  • Reduced processing time by 80% by deploying the data pipeline logic to Apache Spark. (Python)
  • Used Apache Kafka to handle spikes in the volume of crawled data flowing from multiple crawling nodes to the HDFS layer. (Python)
  • Developed a synchronization layer between HDFS and the Elasticsearch platform. (Python)
C++ · Apache Kafka · Cassandra · Apache Spark · Python (Programming Language) · Go (Programming Language)

Member Of Technical Staff

Jun 2016 - Feb 2019 · 2 yrs 8 mos

  • Developed a tokenization and token filtering algorithm to enable search logic involving the Life Science ontology. (Inbuilt ES Plugin)
  • Developed a custom ranking function to sort search results by relevance. (Python)
  • Developed a user query parsing algorithm for query generation, based on n-gram segmentation logic. (Python)
  • Developed a Top Concept Word Cloud generation algorithm using Elasticsearch's built-in aggregations, with custom scoring logic to classify the top concepts for a search result set. (Inbuilt ES Plugin)
  • Built a data ingestion pipeline that converts raw crawled data into a searchable format. (Python)
  • Built a MongoConnector plugin that replicates operations from the MongoDB oplog for data insertion into Elasticsearch (ES). (Python)
  • Built a Transformer module in the data pipeline that converts raw data into the proper format. (Python)
  • Built a Shield module that quality-checks the processed data before insertion into Elasticsearch (ES). (Python)
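
The n-gram segmentation mentioned in the list above for query parsing can be sketched as a greedy longest-match over a concept vocabulary. This is only an illustration under assumptions: the `segment_query` name, the `max_n` limit, and the example vocabulary are hypothetical, and the original algorithm may have worked differently.

```python
from typing import List, Set


def segment_query(query: str, vocabulary: Set[str], max_n: int = 3) -> List[str]:
    """Split a query into the longest known n-grams, left to right.

    At each position, try the longest candidate phrase first (up to
    max_n tokens) and accept it if it is in the vocabulary. A single
    token is always accepted, so the loop always advances.
    """
    tokens = query.lower().split()
    segments: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in vocabulary:
                segments.append(candidate)
                i += n
                break
    return segments
```

For example, with a hypothetical vocabulary containing "lung cancer", the query "lung cancer treatment" would be segmented into the phrase "lung cancer" followed by the single token "treatment", so that the phrase can be matched as one ontology concept rather than two unrelated words.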
MySQL · MongoDB · Python (Programming Language) · Elasticsearch

Data Engineering Internship

May 2015 - Jul 2015 · 2 mos · Pune Area, India

  • Generic Information Extractor Using Machine Learning Techniques.
  • Segregated attributes of a blog page, such as author name, comments, date, title, and paragraphs, using a generic extractor.
  • Used PhantomJS to extract text data from webpages along with features such as text length, font size, and text.
  • Used the DBSCAN clustering algorithm to cluster all texts with similar features across the blogs.
  • Manually tagged the clusters with attributes and prepared training data for predicting the attributes of new blogs using a RandomForest classifier.
Scikit-Learn · Python (Programming Language)

Nutanix (Calm.io, acquired by Nutanix)

Software Developer Internship

May 2014 - Jul 2014 · 2 mos · Bangalore

  • Extension of Python Request Module for ZMQ Server.
  • Developed an adapter for communication between the Python Request module and a ZMQ server.
  • Created a Python Django application that uses this modified Request module.
  • Used native C libraries for the AES and SHA-1 algorithms for encryption and authentication over the transport layer.
  • Worked with a forked copy of the Python Request module source code.
Django · Python (Programming Language)

IIT Kanpur

2 roles

Hall Executive Council Member

Aug 2013 - Apr 2014 · 8 mos · Kanpur

  • Worked with a team of 10 people to ensure the smooth running of various intra-IIT hostel-level festivals throughout the year.
  • Helped manage a yearly budget of over INR 5,00,000 allocated for Hall development activities.

Fresher's Coordinator 2013

Aug 2013 - Aug 2013 · 0 mo · Kanpur

  • Ensured the flawless running of the Freshers’ event for newcomers.
  • Micromanaged the whole program, from sending out invites to dignitaries to preparing the stage with the sound and light system.
  • Dynamically scheduled club performances and made the necessary arrangements for club activities.

Education

Indian Institute of Technology, Kanpur

Bachelor of Technology (BTech), Computer Science and Engineering

Jan 2012 - Jan 2016

Chinmaya Vidyalaya, Bokaro Steel City

CBSE Board Passing certificate — Science

Jan 2010 - Jan 2012
