AI/ML Engineer | Data Scientist

Hi, I am Raj

I build reliable AI systems from experimentation to production impact.

Machine Learning Engineer with 3+ years of experience across NLP, ranking systems, anomaly detection, MLOps, and applied statistics. Built production models on datasets of 2.3M+ records, improved CTR by 22%, and delivered systems tied to measurable business outcomes, including roughly $500K in revenue.

Location: Bryan, TX (Open to Relocation) | Email: rajpurohitharjun58@gmail.com


Experience

AI Engineer Analyst — JP Morgan Chase & Co.

Jan 2026 – Present

I work on trade-surveillance AI for equity markets, where millions of market events must be monitored in near real time to detect suspicious behavior.

My role focuses on building explainable, production-grade detection systems that improve recall while satisfying regulatory model-risk standards.

Built graph-based anomaly detection workflows on ~3.2M daily trade relationships, engineering structural and temporal features that improved recall on regulatory-flagged scenarios by 34%.
Developed sub-second streaming detection modules on order-book feeds (~200K events/sec), combining event aggregation and low-latency model scoring for faster monitoring of abnormal trading behavior.
Implemented explainability pipelines using SHAP and attention-based attribution to generate audit-ready justifications and support SR 11-7 model-risk review processes.
Partnered with quant research and compliance to incorporate alternative signals (news sentiment and options flow) into feature-store pipelines, improving event-driven manipulation detection by 27% in backtesting.

Applied Machine Learning Engineer — Zeda AI (Remote)

Jul 2025 – Dec 2025

I built AI systems for clinical-trial matching and performance marketing, where ranking quality and conversion efficiency directly impacted customer outcomes.

I owned model development, experimentation, and monitoring workflows across multiple production deployments with measurable business targets.

Built an end-to-end patient-trial ranking pipeline using Sentence-BERT embeddings and LightGBM, improving matching precision by 50% across 10+ deployments.
Designed and analyzed A/B experiments for ranking and ad-generation strategies with funnel metrics and statistical testing, doubling qualified conversions while cutting campaign spend by 30%.
Created monitoring dashboards for CTR, CPL, fairness, drift, and funnel performance, enabling rapid diagnosis of model and campaign changes for engineering and business teams.
Supported KPI tracking and client-facing reporting across 6+ implementations, contributing to roughly $500K in revenue through data-driven optimization.

Data & AI Integration Engineer — SubjectToClimate.org

May 2025 – Jul 2025

I developed a scalable data and AI integration ecosystem to streamline how climate-education resources were ingested, validated, and published.

My role combined NLP modeling, API engineering, ETL automation, and experimentation to improve both operational efficiency and user outcomes.

Architected an NLP classification pipeline using Python and LLM-based models to tag resources automatically, reducing manual curation effort by 70% across 1,000+ resources.
Engineered backend APIs and end-to-end ETL workflows with FastAPI and PostgreSQL, processing and publishing 100+ opportunities per week with automated schema enforcement.
Implemented data validation checkpoints and reliability controls across ingestion pipelines, improving data consistency and reducing downstream publishing errors.
Used behavioral analytics and A/B experimentation to refine content strategies, contributing to a measured 25% increase in user satisfaction metrics.

Machine Learning Engineer — IBM, India

Jun 2022 – Jul 2024

I developed NLP and anomaly-detection systems for enterprise IT operations, where prediction quality directly affected routing speed, SLA risk, and support efficiency.

I focused on scalable modeling, feature engineering, and reliable MLOps practices for high-volume production workflows.

Built a multi-label ticket-classification system on 2.3M ServiceNow records using fine-tuned BERT, achieving 91.4% micro-F1 and reducing MTTA by 68% (~14,000 hours saved annually).
Developed escalation-risk predictors with XGBoost and engineered entity, text, log, and temporal features, reaching 87% accuracy for proactive incident intervention.
Implemented streaming anomaly detection with Isolation Forest for SLA-risk identification, achieving 94% recall across real-time operational workflows.
Designed validation and monitoring workflows for production models, reducing model-related incidents by 43% through tighter quality controls and performance tracking.

Research Experience

Research Assistant — Texas A&M AgriLife

I engineered a physics-informed machine learning framework to automate gap-filling of greenhouse gas (CO₂) flux data for climate research workflows.

This work is in the publication stage and focuses on producing auditable, high-quality imputed signals usable by environmental researchers worldwide.

Engineered 14+ high-dimensional feature groups, including multi-scale lags, STL decomposition, and physics-based transforms (Q10 temperature response and moisture sigmoid), improving non-linear signal detection by 25% versus raw-data baselines.
Architected a multi-regime cascade pipeline with LightGBM and Random Forest that adapts feature selection to information decay, sustaining predictive quality across data gaps ranging from hours to 70+ days.
Built a robust imputation validation framework with ablation studies and missingness indicators so models explicitly distinguish observed vs. imputed states for auditability.
Delivered a publication-grade ML workflow that automates CO₂ flux reconstruction and supports reproducible downstream climate-analysis pipelines.

Undergraduate Researcher — Computer Vision & Deep Learning, SRM University

I architected a real-time computer vision pipeline to automate vehicle tracking and license-plate recognition under challenging conditions.

I focused on model-ensemble design, inference optimization, and robust feature engineering to achieve production-ready throughput and accuracy.

Engineered a multi-stage ensemble pipeline using YOLOv8/YOLOv5, DeepSORT, and CRNN, achieving 0.97 F1-score and 0.99 recall for automated vehicle identification and tracking.
Reduced inference latency by ~30% by converting PyTorch models to ONNX, enabling real-time processing at 20–25 FPS on edge-ready deployment setups.
Developed weighted-averaging logic and advanced augmentation strategies to preserve high detection reliability in low-light, high-speed, and partially obstructed scenarios.
Published this research with Springer.

Projects

Selected AI, ML, and analytics projects focused on measurable outcomes, scalable architectures, and production reliability.

Skills

Programming & Data

Python, SQL, C++, Bash, PySpark, Pandas, NumPy, PostgreSQL, Snowflake

ML & GenAI

Scikit-learn, PyTorch, TensorFlow, Transformers, BERT, LightGBM, XGBoost, RAG, LangChain

MLOps & Systems

Docker, Kubernetes, MLflow, Kubeflow, Kafka, CI/CD, A/B Testing, Model Monitoring

Cloud & Visualization

AWS, GCP, SageMaker, Vertex AI, BigQuery, Matplotlib, Seaborn, Power BI

Education

M.S. in Data Science — Texas A&M University

College Station, TX • GPA: 3.55/4.0

B.Tech in Computer Science & Engineering (AI) — SRM Institute of Science and Technology

India • GPA: 3.92/4.0

Certifications

Microsoft Certified: Azure Data Scientist Associate
HackerRank: Advanced SQL
IBM: Python for Data Science, AI & Development
NPTEL: Introduction to ML, NLP

Leadership

Led cross-functional ML teams to ship enterprise AI platforms across healthcare and adtech use cases.
Served as a judge and mentor for hackathons and student research programs at Texas A&M.
Received a Best Paper Award for YOLOv8-based vehicle detection and tracking research.

Contact

Open to entry- to mid-level AI/ML Engineer, Machine Learning Engineer, Data Scientist, and Data Engineer roles.

Email: rajpurohitharjun58@gmail.com
Phone: 979-326-5513
Location: Bryan, Texas, USA