Portfolio 2026

Mahi Sharma

B.Tech Computer Science (Data Science) — Manipal University Jaipur (2023–2027)

Focused on Data Science, Machine Learning & Data Quality Engineering

Professional Experience

Data Analyst Intern • Cognine Technologies Pvt. Ltd.

Dec 2025 – Jan 2026 • On-site

  • Profiled operational datasets across multiple data sources, identifying null values, schema mismatches, and formatting inconsistencies impacting downstream reporting
  • Developed 21 Python-based validation rules using Pandas and NumPy to automate data quality checks on reporting datasets
  • Monitored data quality metrics across pipelines and flagged anomalies affecting business reports
  • Collaborated with cross-functional stakeholders to document the business impact of data quality issues and standardize data handling processes
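The 21 rules themselves are not listed here. As a hedged illustration, the sketch below expresses three typical checks of the kinds described above (nulls, type/schema mismatches, and date-format inconsistencies) as plain-Python rule functions. The field names, sample records, and helper names are hypothetical, and the original rules used Pandas and NumPy rather than only the standard library.

```python
from datetime import datetime

# Hypothetical reporting records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "region": "North", "revenue": 1200.0, "report_date": "2025-12-01"},
    {"id": 2, "region": None, "revenue": 950.5, "report_date": "2025-12-02"},
    {"id": 3, "region": "South", "revenue": "n/a", "report_date": "02/12/2025"},
]

def not_null(field):
    """Rule: the field must be present and non-null."""
    def check(rec):
        return rec.get(field) is not None
    return f"not_null({field})", check

def is_numeric(field):
    """Rule: the field must hold an int or float (catches schema/type mismatches)."""
    def check(rec):
        value = rec.get(field)
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return f"is_numeric({field})", check

def iso_date(field):
    """Rule: the field must be an ISO-8601 date string (YYYY-MM-DD)."""
    def check(rec):
        value = rec.get(field)
        if not isinstance(value, str):
            return False
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return True
        except ValueError:
            return False
    return f"iso_date({field})", check

RULES = [not_null("region"), is_numeric("revenue"), iso_date("report_date")]

def run_rules(records, rules):
    """Apply every rule to every record; return {rule_name: [ids of failing records]}."""
    return {name: [rec["id"] for rec in records if not check(rec)] for name, check in rules}
```

Here `run_rules(RECORDS, RULES)` flags record 2 for the null region and record 3 for both the non-numeric revenue and the non-ISO date. Keeping each rule as a named function makes it straightforward to report failure counts per rule as a data quality metric.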
Web Content Writer Intern • Miso

Jul 2024 – Oct 2024 • Remote

  • Created 20+ SEO-optimized content pieces for product pages and marketing collateral
  • Consistently delivered content on deadline while collaborating with a remote team

Patent

Online Simulator for Automatic Chess Board

Patent Application Published

2024 – Present

Built a rule-based chess simulation engine implementing 17+ move and game rules to validate moves and control gameplay logic in an automated chessboard system.
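The engine's actual rule set is not described here. As a hedged illustration of what a rule-based move validator can look like, the sketch below encodes two standard chess movement rules (knight and rook) as entries in a lookup table. The coordinate scheme, function names, and table layout are hypothetical, and captures, check, castling, and the remaining game rules are deliberately omitted.

```python
# Hypothetical sketch: squares are (file, rank) tuples with 0-7 coordinates;
# `occupied` is a set of squares that currently hold a piece.

def on_board(sq):
    return 0 <= sq[0] < 8 and 0 <= sq[1] < 8

def knight_rule(src, dst, occupied):
    """Knights move in an L-shape (1x2 or 2x1) and may jump over pieces."""
    df, dr = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    return {df, dr} == {1, 2}

def rook_rule(src, dst, occupied):
    """Rooks move along a rank or file; every square in between must be empty."""
    if src == dst or (src[0] != dst[0] and src[1] != dst[1]):
        return False
    step_f = (dst[0] > src[0]) - (dst[0] < src[0])  # -1, 0 or +1
    step_r = (dst[1] > src[1]) - (dst[1] < src[1])
    f, r = src[0] + step_f, src[1] + step_r
    while (f, r) != dst:
        if (f, r) in occupied:
            return False
        f, r = f + step_f, r + step_r
    return True

# Rule table: adding a piece type means adding one entry here.
RULES = {"N": knight_rule, "R": rook_rule}

def is_legal(piece, src, dst, occupied):
    """Reject off-board squares first, then defer to the piece's movement rule."""
    if not (on_board(src) and on_board(dst)):
        return False
    rule = RULES.get(piece)
    return rule is not None and rule(src, dst, occupied)
```

A table-driven design like this keeps each rule independently testable, which suits an engine whose job is to validate moves before an automated board executes them.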

Featured Projects

Data Analysis • Feb – Mar 2026

Impulse Purchase Prediction

Analyzed 250K e-commerce transactions to identify impulse-buying patterns with a Decision Tree Classifier that achieved 89.2% accuracy. Engineered 10+ features and built interactive dashboards showing customer spending behaviour across 4 buyer segments.

89.2% accuracy • 250K transactions
R • ggplot2 • Plotly
View on GitHub

Data Quality • Jun 2025

Data Drift Monitoring System

Statistical monitoring system using the Kolmogorov-Smirnov test to detect distribution shifts across 11 features on a 150K-record credit risk dataset. Includes an interactive Streamlit dashboard with KDE visualizations and a composite data reliability scoring engine.

Reliability Score: 55/100 on injected drift
Python • Scikit-Learn • Pandas • NumPy • Plotly • Streamlit
View on GitHub
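The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical CDFs of a reference sample and a current sample. A minimal, dependency-free sketch for one numeric feature follows; the function names and the 0.05 significance level are assumptions, and in practice a library routine such as SciPy's `scipy.stats.ks_2samp` would typically do this work.

```python
import math
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Fraction of observations <= x in an already-sorted sample."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(reference, current):
    """D = max over x of |F_ref(x) - F_cur(x)|."""
    ref, cur = sorted(reference), sorted(current)
    # Both ECDFs only jump at observed values, so checking the pooled points suffices.
    return max(abs(ecdf(ref, x) - ecdf(cur, x)) for x in ref + cur)

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return c_alpha * math.sqrt((n + m) / (n * m))

def detect_drift(reference, current, alpha=0.05):
    """Return (D, drifted) for one feature; drifted is True when D exceeds the critical value."""
    d = ks_statistic(reference, current)
    return d, d > ks_critical_value(len(reference), len(current), alpha)
```

For example, `detect_drift(list(range(100)), [x + 50 for x in range(100)])` gives D = 0.5, well above the critical value of roughly 0.192 for two samples of 100, so the shift is flagged. Running the check per feature yields one drift decision for each of the monitored columns.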
Machine Learning • Jul 2025

Predicting Spotify Song Popularity

EDA and regression modeling on 2.1M+ Spotify tracks to predict popularity from 14 audio features. Compared Linear Regression, Decision Tree, and Random Forest models, then tuned hyperparameters with RandomizedSearchCV.

R² = 0.655 • Final MSE: 85.34
Python • Scikit-Learn • Pandas • Matplotlib
View on GitHub
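The two reported metrics are linked: R² = 1 − SS_res / SS_tot, which equals 1 − MSE / Var(y) when the variance is the population variance of the targets on the same evaluation set. A minimal sketch of both computations over plain Python lists; the function names mirror scikit-learn's metric names but this is an illustrative reimplementation, not the project's code.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)              # total sum of squares
    return 1 - ss_res / ss_tot
```

If both figures above were measured on the same evaluation set, they imply a popularity variance of roughly 85.34 / (1 − 0.655) ≈ 247 on that set.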
Machine Learning • Apr 2024

Online Payments Fraud Detection

XGBoost classifier trained on 6.36M+ transactions to detect fraudulent payments. Addressed extreme class imbalance (99.87% non-fraud vs 0.13% fraud) using SMOTE, creating a balanced training set of 5M+ samples per class.

6.36M transactions • SMOTE-balanced
Python • XGBoost • Scikit-Learn • SMOTE • Pandas • NumPy • Matplotlib
View on GitHub
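SMOTE (Synthetic Minority Over-sampling Technique) creates new minority-class samples by picking a minority point, choosing one of its k nearest minority neighbours, and placing a synthetic point at a random position on the segment between them. The dependency-free sketch below shows only that interpolation idea; the function names and toy data are hypothetical, and library implementations such as imbalanced-learn's `SMOTE` class handle the full procedure at scale.

```python
import math
import random

def k_nearest(idx, points, k):
    """k minority points closest to points[idx] (Euclidean), excluding itself by index."""
    base = points[idx]
    others = [p for j, p in enumerate(points) if j != idx]
    return sorted(others, key=lambda p: math.dist(base, p))[:k]

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        neighbour = rng.choice(k_nearest(idx, minority, k))
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region the minority class already occupies. Oversampling is normally applied to the training split only, so that synthetic points do not leak into evaluation.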
SQL Analytics • Mar 2026

Banking Transactions Analysis — SQL

Pure SQL fraud pattern detection on 1.2M banking transactions using window functions, rolling 7-day averages, and CASE-based anomaly flagging. Surfaces HIGH RISK transactions missed by binary fraud labels, complementing ML-based detection at the query layer.

1.2M transactions • Rule-based anomaly detection
SQL • SQLite • Window Functions • CTEs • Rolling Averages
View on GitHub
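A rolling-average anomaly flag of the kind described above can be expressed with a window function inside a CTE. The sketch below runs it against an in-memory SQLite database from Python. The table name, columns, sample rows, and the "3× the previous 7-day average" threshold are all hypothetical, and it assumes one row per account per day so that a 7-row window covers 7 days.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE daily_txn (account_id TEXT, txn_date TEXT, amount REAL);
INSERT INTO daily_txn VALUES
  ('A1', '2026-03-01', 100), ('A1', '2026-03-02', 120), ('A1', '2026-03-03', 90),
  ('A1', '2026-03-04', 110), ('A1', '2026-03-05', 100), ('A1', '2026-03-06', 95),
  ('A1', '2026-03-07', 105), ('A1', '2026-03-08', 900);
"""

# Average of the 7 previous rows (excluding the current one), per account.
QUERY = """
WITH rolling AS (
  SELECT account_id, txn_date, amount,
         AVG(amount) OVER (
           PARTITION BY account_id ORDER BY txn_date
           ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
         ) AS avg_prev_7d
  FROM daily_txn
)
SELECT account_id, txn_date, amount, avg_prev_7d,
       CASE WHEN avg_prev_7d IS NOT NULL AND amount > 3 * avg_prev_7d
            THEN 'HIGH RISK' ELSE 'NORMAL' END AS risk_flag
FROM rolling
ORDER BY account_id, txn_date;
"""

def flag_transactions():
    """Load the sample rows into memory and return the flagged result set."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_DATA)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

On this sample, only the 900 transaction on 2026-03-08 is flagged, since it exceeds three times the preceding 7-day average of about 102.86. Excluding the current row from the window keeps a single large transaction from inflating its own baseline. Window functions require SQLite 3.25 or newer.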
Data Analysis • 2024

Healthcare Analysis

Comprehensive EDA on 55,500 patient admissions (2019–2024) across 6 medical conditions. Built a Random Forest model to predict length of stay, which identified Billing Amount (feature importance 0.47) and Age (0.22) as the strongest predictors.

55,500 records • Top feature importance: 0.47
Python • Pandas • NumPy • Scikit-Learn • Matplotlib • Seaborn
View on GitHub

Machine Learning • 2024

Diabetes Prediction

Classification on the PIMA diabetes dataset using 8 clinical features (Glucose, BMI, Age, etc.). Compared Logistic Regression, SVM, and KNN with Grid Search tuning. Preprocessing included median imputation of physiologically invalid zero values and StandardScaler normalization.

Best: Tuned SVM — 75% accuracy
Python • Scikit-Learn • Pandas • NumPy • Matplotlib
View on GitHub
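The two preprocessing steps can be sketched in plain Python: replace zeros in columns where zero is impossible with the median of the remaining values, then z-score each column using the population standard deviation, as scikit-learn's `StandardScaler` does. The choice of columns in `ZERO_INVALID` and the function names are assumptions for illustration, and in a real pipeline the medians and scaling statistics would be fitted on the training split only.

```python
from statistics import mean, median, pstdev

# Columns where a zero reading is treated as missing (assumed subset, for illustration).
ZERO_INVALID = ("Glucose", "BMI")

def impute_zeros_with_median(rows, columns):
    """Replace 0 in each listed column with the median of that column's non-zero values."""
    medians = {c: median(r[c] for r in rows if r[c] != 0) for c in columns}
    return [
        {k: (medians[k] if k in medians and v == 0 else v) for k, v in r.items()}
        for r in rows
    ]

def standardize(rows, columns):
    """z-score each listed column: (x - mean) / population std."""
    stats = {}
    for c in columns:
        values = [r[c] for r in rows]
        stats[c] = (mean(values), pstdev(values) or 1.0)  # guard against zero variance
    return [
        {k: ((v - stats[k][0]) / stats[k][1] if k in stats else v) for k, v in r.items()}
        for r in rows
    ]
```

Imputing before scaling matters: left in place, the invalid zeros would drag each column's mean down and inflate its standard deviation.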

Certifications

Data Analytics Job Simulation

Deloitte

Jan 2026

Programming in Java — Elite Grade

NPTEL

Oct 2024

Crash Course on Python — 95%

Google • Coursera

Aug 2024

Skills & Technologies

Languages

Python • SQL • Java • C • R

Libraries & Frameworks

Pandas • NumPy • Scikit-Learn • XGBoost • Matplotlib • Seaborn • Streamlit

Databases

MySQL • SQLite

Tools

Microsoft Excel • Git • GitHub • VS Code • Jupyter Notebook

Visualization

Power BI • ggplot2 • Plotly

Core Competencies

Data Cleaning & Preprocessing • Exploratory Data Analysis • Feature Engineering • Statistical Analysis • Data Validation & Quality Monitoring