Raj Purohith Arjun

           

Masters in Data Science

Texas A&M University

Department of Computer Science and Engineering

Student

Institute of Data Science




Current Research Activity:
Realtime Audio-Video Translation and Voice Cloning

Research

Objective

  • To develop a robust system capable of real-time audio and video translation across multiple languages
  • To incorporate voice cloning technology for precise replication of speaker characteristics
  • To ensure seamless synchronization between translated audio and video content
  • To apply Natural Language Processing and Deep Learning techniques
  • To deliver high-quality, natural translations with minimal latency
  • To enhance global accessibility through effective multilingual communication solutions

About

This research aims to develop a system for real-time audio and video translation with integrated voice cloning. Using machine learning techniques, the system will not only translate content but also preserve the original speaker’s vocal traits. The ultimate objective is to enable seamless cross-lingual communication and broaden access to information in sectors such as education, media, and customer service, helping to break down language barriers and foster global connectivity.

Projects

Visit my GitHub page to view my work and collaborate.

Prophetic Sentinel: Anticipatory Maintenance System


Built a predictive maintenance system using Python (scikit-learn, TensorFlow), SQL, and Azure that applies machine learning to predict equipment failures in data centers. The system achieved 90% accuracy in predicting equipment failures, which reduced downtime and cut maintenance costs by 30%.


AI Video Summarization


Developed using Mixtral, Whisper, and AWS, integrating language models and tools like GPT-3, BERT, and FFmpeg for efficient video processing. Set up AWS EC2 instances to run large models, reduce latency, and transcribe audio to text with Whisper. Implemented dynamic quiz generation and a feedback system using Flask, HTML, and JavaScript, storing user data in a database. The project highlighted the strengths and limitations of various language models and their practical application in video summarization and interactive quizzes.


AI-driven Financial Fraud Detection Using Deep Learning and XAI


Developed an AI-based system to detect financial fraud in real time, leveraging deep learning models such as Transformers, CNNs, and GANs for fraud identification and simulation of forged transactions. Used EfficientNet for image-based fraud detection and integrated Explainable AI (XAI) techniques to enhance model transparency and support ethical decision-making. Visualized fraud patterns and risks using Tableau or Seaborn, helping stakeholders make informed, data-driven investment decisions. Achieved 95% accuracy in detecting fraudulent transactions while reducing false positives.


End-to-End MLOps Pipeline for Loan Eligibility Prediction


Built an MLOps pipeline for a Loan Eligibility Prediction model using Python, deployed on Google Cloud Platform (GCP). The pipeline involved creating a Flask API, containerizing it with Docker, and managing source code through Cloud Source Repositories and Git. Automated deployment was handled via Cloud Build, and the model was deployed using Cloud Run. This project demonstrated efficient cloud architecture, leveraging GCP services for scalable and automated machine learning operations.


PersonaCraft: Tailored Content Recommender


Engineered a machine learning content recommendation system that tailors suggestions to individual users. Personalized recommendations increased user engagement by 25%.


Ensemble Model for Vehicle Tracking and ANPR


Engineered an ensemble model combining YOLOv5, YOLOv8, DeepSORT, and EasyOCR for automatic number plate recognition (ANPR) and vehicle tracking, with particular strength in low-light conditions. The model achieved an F1 score of 0.97 and is complemented by a user-friendly web interface.




SKILLS

  • Python / SQL / R / C++
  • TensorFlow-Keras / PyTorch / spaCy / NLTK / scikit-learn
  • Seaborn / Matplotlib / NumPy / Pandas / dplyr / OpenCV / ggplot2
  • Deep Learning / Machine Learning Algorithms / Artificial Intelligence
  • Big Data / Git / GitHub / Apache Spark / BigML / Power BI / YOLO
  • Data Structures / LLM / AWS / Microsoft Azure / Computer Vision
  • Project Management / Data Analysis / Data Pipeline / Storytelling / Critical Thinking / Problem Solving







  • Professional Experience

    Full curriculum vitae available as a PDF.

    Education

    • 2024-26: Masters in Data Science, Texas A&M University
    • 2020-24: B.Tech in Computer Science Engineering with specialization
      in AI and ML, SRM University - CGPA: 9.64/10

    Internships

      Code Clause (Data Science Intern) Mumbai, India

      • Developed an AI-based video forensics method using VGG16, Long-term Recurrent Convolutional Networks (LRCN), and LSTM-RNN to detect DeepFake videos by analyzing eye blinking patterns.
      • Achieved 99% accuracy on eye-blinking detection datasets, demonstrating the model’s effectiveness at identifying manipulated videos from their lack of physiological signals such as blinking.
      • Collaborated with the research team to improve digital video forensics, leveraging deep learning techniques for real-world applications of detecting synthetic media.
      • Built a Langchain-integrated Streamlit chatbot for exploratory data analysis (EDA), allowing natural language interactions with a MySQL database for automated SQL query and Python code generation.
      • Streamlined data exploration processes, simplifying query generation and delivering insightful visualizations, improving overall efficiency for users conducting EDA.
      • Implemented prompt engineering, memory management, and LLM performance optimization to enhance real-time data analysis and visualization capabilities.
      • Contributed to AI ethical guidelines, ensuring compliance with data ethics and privacy standards.

      Open Weaver (Generative AI Intern) Chennai, India

      • Leveraged data exploration and preprocessing for sales prediction problems using Amazon Redshift and SQL, performing data cleaning, imputation, and exploratory data analysis on categorical and continuous data.
      • Developed and implemented machine learning models such as Linear Regression, Elastic Net, Random Forest, Extra Trees, Gradient Boosting, and Multi-Layer Perceptron using Python to predict sales outcomes.
      • Applied advanced techniques like Splines, MARS, and Generalized Additive Models (GAMs), while building custom Stacking Regressor and Model Blending solutions to improve sales prediction accuracy.
      • Collaborated with senior data scientists to evaluate models using R-squared metrics, conducting correlation analysis to optimize models for sales prediction and support lead generation strategies.
      • Worked with the team lead to improve the model’s text generation capabilities, expanding its potential applications across various domains.

      Publications and Awards
      • 2024 Won Best Paper Award at the International Conference on Computing Technologies for Sustainable Development-2024 for "Innovation in Vehicle Tracking: Harnessing YOLOV8 and Deep Learning Tools for Automatic Number Plate Detection"
      • 2023 Won Best Paper Award at the National Conference on Technology for the Society’23 for the research paper “Enhancing ANPR using YOLOv8 and Deep Learning Techniques” held at SRMIST, Chennai
      • 2023 Received an Academic Award for Overall Proficiency Rank-1 in the Computer Science Department for the years 2023 and 2024, SRM University
      • 2023 Ranked in the top 10 out of 2500 participants in Proglint’s Alliance University Computer Vision Hackathon 2023.






    Certifications

    My profile on LinkedIn

  • Microsoft Certified: Azure Data Scientist Associate (certificate)
  • HackerRank Certified: SQL Advanced Programmer (verify)
  • Python for Data Science, AI & Development IBM (certificate)
  • Application of Machine Learning in Urban Studies (IIRS & ISRO) (certificate)
  • Neural Networks and Deep Learning by Andrew Ng (verify)
  • Introduction to Machine Learning by Debjani Chakraborty (NPTEL, IIT KGP) (verify)
  • Natural Language Processing by Haimanti Banerji (NPTEL, IIT KGP) (verify)



  • Contact


    Mailing address: Unit 204, The Villas of Cherry Hollow, 503 Cherry Street, College Station, TX 77840

    E-mail: raj2001@tamu.edu or rajpurohitharjun58@gmail.com
    LinkedIn: Reach me on LinkedIn