Masters in Data Science
Texas A&M University
Department of Computer Science and Engineering
Graduate Student
Institute of Data Science
CURRENT PROJECT
Go to my GitHub page to explore my work and collaborate.
Developing an AI-powered Workplace Investigation Tool that helps managers conduct workplace investigations with consistency, compliance, and legal defensibility. The project involves fine-tuning legal-domain LLMs such as SaulLM-7B and BERT, and building a Retrieval-Augmented Generation (RAG) pipeline with FAISS for fast and accurate legal document retrieval. I am implementing end-to-end workflows for ingestion, search, and generation, enabling the tool to provide actionable recommendations and guidance for HR compliance scenarios.
Built an AI-driven mock technical interview platform using Google Gemini, LangGraph, and few-shot learning, simulating realistic, multi-turn coding interviews with function calling, structured outputs, and image/code understanding. Enabled personalized feedback, skill assessment, and scalable preparation through agentic workflows and long-context reasoning.
Benchmarked NLP and GenAI tools (Hugging Face Transformers, OpenAI GPT-4, Cohere APIs, Weaviate) to automate lesson tagging, document summarization, and semantic search. Delivered Python-based prototypes that reduced manual workload by 70%, enhanced resource discoverability, and enabled cross-team usability in a nonprofit, low-infrastructure environment.
Built a hybrid AI system that recommends connections based on shared institutions and social graphs. Used Python, DuckDB, and OpenAI embeddings for structured profile parsing and semantic similarity, while modeling relationships in Neo4j. Orchestrated pipelines with LangChain and deployed an interactive Streamlit UI for real-time recommendations using Qdrant.
Built a predictive maintenance system using Python (scikit-learn, TensorFlow), SQL, and Azure that applies machine learning to predict equipment failures in data centers. The model achieved 90% accuracy in predicting equipment failures, reducing downtime and cutting maintenance costs by 30%.
Developed a video summarization and interactive quiz application using Mixtral, Whisper, and AWS, integrating language models and tools like GPT-3, BERT, and FFmpeg for efficient video processing. Set up AWS EC2 instances to run large models, reduce latency, and transcribe audio to text with Whisper. Implemented dynamic quiz generation and a feedback system using Flask, HTML, and JavaScript, storing user data in a database. The project highlighted the strengths and limitations of various language models and their practical application in video summarization and interactive quizzes.
Developed an AI-based system to detect financial fraud in real time, leveraging deep learning models like Transformers, CNNs, and GANs for fraud identification and simulation of forged transactions. Used EfficientNet for image-based fraud detection, while integrating Explainable AI (XAI) techniques to enhance model transparency and ensure ethical decision-making. Visualized fraud patterns and risks using Tableau or Seaborn, aiding stakeholders in making informed, data-driven investment decisions. Achieved 95% accuracy in detecting fraudulent transactions, reducing false positives.
Built an MLOps pipeline for a Loan Eligibility Prediction model using Python, deployed on Google Cloud Platform (GCP). The pipeline involved creating a Flask API, containerizing it with Docker, and managing source code through Cloud Source Repositories and Git. Automated deployment was handled via Cloud Build, and the model was deployed using Cloud Run. This project demonstrated efficient cloud architecture, leveraging GCP services for scalable and automated machine learning operations.
Conducted statistical and multivariate analysis on customer data to identify key churn drivers. Developed predictive models using logistic regression and Random Forest, improving churn prediction accuracy by 85%. Provided actionable insights to stakeholders, enabling retention strategies that reduced churn rates by 20%.
Engineered a machine learning content recommendation system that tailors content suggestions to individual users. The personalized recommendations increased user engagement by 25%.
Built an ensemble model using YOLOv5, YOLOv8, DeepSORT, and EasyOCR for automatic number plate recognition (ANPR) and vehicle tracking, with a focus on low-light conditions. The model achieved an F1 score of 0.97 and is complemented by a user-friendly web interface.
Built a predictive model to assess credit risk using logistic regression and decision trees, achieving a risk classification accuracy of 92%. Conducted time series analysis to identify trends in loan defaults, improving risk profiling. Developed dashboards in Power BI to visualize risk metrics and trends, enhancing transparency for stakeholders.
Internships
Data & AI Integration Intern - SubjectToClimate, New York (Remote)
Graduate Data Science Assistant (TAMIDS) - Texas A&M University, College Station
Software Engineer (AI & ML) - Entropik, Chennai, India
Data Science Analyst - HighRadius, Chennai, India
Publications and Awards
Mailing address: Unit 204, The Villas of Cherry Hollow, 503 Cherry Street, College Station, TX 77840
E-mail: raj2001@tamu.edu
or rajpurohitharjun58@gmail.com
LinkedIn: Reach me at LinkedIn