I enjoy making things. Here is a selection of projects I have worked on over the years.
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures.
PyTorch is a Python package that provides tensor computation (like NumPy) with strong GPU acceleration.
scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.
This post walks through how I built StartupLeads: market research, customer interviews, MVP scoping and a PRD, technical architecture, pricing tests, growth loops, and early results.
This post walks through how I built BoostMyRank, an AI-assisted backlink service supported by an affiliate program: market research, 10 customer interviews, pricing validation at $249/mo, competitive analysis, MVP scoping and a PRD, technical architecture, and a one-week build to launch.
Build and deploy ML models to predict Lending Club loan defaults and optimize a high-IRR portfolio, with interactive EDA, Flask API, and Dash apps.
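A loan's internal rate of return (IRR) is the discount rate at which the net present value of its cash flows is zero, which is the quantity a high-IRR portfolio ranks loans by. The sketch below is an illustration only, assuming monthly cash flows (an initial outlay followed by installments) and bisection as the root finder; `npv`, `irr`, and the example loan are hypothetical and are not taken from the project, whose portfolio optimization is not shown here.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value of per-period cash flows at a per-period discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], lo: float = -0.99, hi: float = 1.0,
        tol: float = 1e-10, max_iter: int = 200) -> float:
    """Per-period IRR via bisection; assumes NPV changes sign exactly once on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the bracket")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid              # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


# Hypothetical loan: lend $1,000, receive 12 monthly payments of $90.
loan = [-1000.0] + [90.0] * 12
monthly = irr(loan)
annual = (1 + monthly) ** 12 - 1  # compound the monthly rate to an annual rate
print(f"monthly IRR {monthly:.4%}, annualized {annual:.2%}")
```

For a defaulted loan, the cash-flow list simply stops early (or ends with a recovery amount), which is why default predictions feed directly into expected IRR.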
Build a stacked ensemble on the Ames Housing dataset to predict SalePrice, with rigorous data cleaning, feature engineering, and recursive feature elimination (RFE).
Scrape 22k WSJ articles, run VADER/TextBlob sentiment analysis, and test relationships with reader comments and S&P 500 returns. Includes an R Shiny app.
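VADER is lexicon-based: it sums word valences and squashes the total into a compound score in [-1, 1]. The sketch below is a toy stand-in, not VADER itself. The lexicon values, headlines, and returns are hypothetical, and only the normalization `total / sqrt(total² + 15)` mirrors VADER's compound score (real VADER also handles negation, intensifiers, capitalization, and punctuation). It then correlates daily average sentiment with daily returns using Pearson's r.

```python
import math
import re

# Hypothetical toy lexicon: word -> valence on roughly a -4..+4 scale.
LEXICON = {
    "gain": 2.0, "rally": 2.5, "strong": 1.8, "beat": 1.5,
    "loss": -2.0, "crash": -3.0, "weak": -1.8, "miss": -1.5,
}


def compound_score(text: str, alpha: float = 15.0) -> float:
    """Sum word valences, then squash into [-1, 1] like VADER's compound normalization."""
    words = re.findall(r"[a-z']+", text.lower())
    total = sum(LEXICON.get(w, 0.0) for w in words)
    return total / math.sqrt(total * total + alpha)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical headlines grouped by trading day, with aligned daily S&P 500 returns.
headlines_by_day = [
    ["Stocks rally on strong earnings", "Tech shares gain"],
    ["Markets crash as growth looks weak"],
    ["Retailer earnings beat forecasts"],
    ["Bank posts surprise loss", "Chipmaker shares miss targets"],
]
returns = [0.012, -0.021, 0.004, -0.008]

daily_sentiment = [sum(compound_score(h) for h in day) / len(day) for day in headlines_by_day]
print(round(pearson(daily_sentiment, returns), 3))
```

With four made-up days the correlation is meaningless as evidence; a real test would use thousands of articles, lagged returns, and a significance test.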
An interactive Shiny app to explore NYT COVID-19 case and death data by state, county, and date using bar charts, scatter plots, lollipop charts, and maps.
We grouped 8,950 credit‑card customers into clear, actionable personas based on real spending and payment behavior. The write‑up explains the data we used, how we formed the groups at a high level, what makes each persona distinct (e.g., cash‑advance heavy vs. everyday spenders), and how teams can activate them to tailor offers, credit limits, and messaging.
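The write-up keeps the grouping method high-level, so the following is an assumption: k-means is one common way to form behavioral personas like these. This minimal sketch runs Lloyd's algorithm from given starting centroids on two hypothetical features (cash-advance share and purchase share of spend); the function name, features, and data points are illustrative only. In practice you would standardize all behavioral features and choose k with an elbow or silhouette analysis.

```python
import math

Point = tuple[float, ...]


def kmeans(points: list[Point], init: list[Point], iters: int = 100) -> tuple[list[Point], list[int]]:
    """Lloyd's algorithm from given initial centroids; returns (centroids, labels)."""
    centroids = list(init)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing moved
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Hypothetical customers: (cash-advance share of spend, purchase share of spend).
customers = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.05), (0.1, 0.9), (0.2, 0.8), (0.05, 0.95)]
centroids, labels = kmeans(customers, init=[(0.9, 0.1), (0.1, 0.9)])
print(labels)
```

Each resulting centroid is the "average customer" of a persona, which is what makes the groups explainable (here, cash-advance heavy vs. everyday spenders).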
Unsupervised topic modeling on 1.1M ABC News headlines with LDA, LSA, LSI, and HDP; compare scikit‑learn vs Gensim/NLTK preprocessing and visualize separability with t‑SNE.
Built a lightweight, sub-0.5s/image pipeline that reads 16-digit card numbers and cardholder names from card photos using OpenCV (template matching) and Tesseract (OCR). On a small, hand-labeled set of 23 images, the baseline achieves 48% recall on the primary account number (PAN) and 65% on the cardholder name. The baseline establishes that low-latency data capture (for fraud/risk checks and checkout autofill) is feasible; next steps toward production readiness include scaling the dataset and training a custom OCR model.
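OpenCV's `cv2.matchTemplate` scores a template against an image, and with the `TM_SQDIFF` method the score is a sum of squared differences (lower means a closer match). The sketch below applies that same scoring to an already-segmented digit, picking the reference template with the smallest SSD. The 5×3 binary glyphs and the helper names are hypothetical; real templates would be cropped from a reference card font and resized to match each segmented digit.

```python
Grid = list[list[int]]

# Hypothetical 5x3 binary glyphs (1 = ink) for three digits.
TEMPLATES = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
}


def to_grid(rows: list[str]) -> Grid:
    """Convert rows of '0'/'1' characters into a grid of ints."""
    return [[int(c) for c in row] for row in rows]


def ssd(a: Grid, b: Grid) -> int:
    """Sum of squared differences between two same-sized grids (lower = more similar)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def classify_glyph(glyph: Grid) -> str:
    """Return the label of the template with the smallest SSD to the glyph."""
    return min(TEMPLATES, key=lambda label: ssd(glyph, to_grid(TEMPLATES[label])))


# A noisy "7" with one flipped pixel still matches the "7" template best.
noisy_seven = to_grid(["111", "001", "010", "011", "010"])
print(classify_glyph(noisy_seven))
```

Template matching is fast and needs no training data, which suits the latency target; its brittleness to font, blur, and lighting changes is one reason a custom OCR model is listed as a next step.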
A practical, side-by-side walkthrough of two ways to stylize images: a custom VGG-19 approach you can fine-tune for unique brand looks, and a pre-trained TensorFlow Hub model that ships fast and scales easily. Includes links, code snippets, and the trade-offs product teams care about (control vs. speed, quality vs. effort).
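The blurb does not spell out the custom VGG-19 approach; assuming it follows the classic neural style transfer formulation (Gatys et al.), style is matched by comparing Gram matrices of feature maps. In the sketch below, plain nested lists stand in for one layer's activations (C channels, each flattened to H×W values); `gram_matrix`, `style_loss`, and the normalization by H×W are illustrative choices, since implementations normalize differently and real activations come from VGG-19 layers in TensorFlow.

```python
FeatureMap = list[list[float]]  # C channels, each flattened to H*W activations


def gram_matrix(features: FeatureMap) -> list[list[float]]:
    """G[i][j] = sum over spatial positions k of F[i][k] * F[j][k], divided by H*W."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features] for fi in features]


def style_loss(gen: FeatureMap, style: FeatureMap) -> float:
    """Mean squared difference between the Gram matrices of two feature maps (one layer)."""
    g1, g2 = gram_matrix(gen), gram_matrix(style)
    c = len(g1)
    return sum((g1[i][j] - g2[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


features = [[1.0, 2.0], [3.0, 4.0]]   # 2 channels x 2 positions
shuffled = [[2.0, 1.0], [4.0, 3.0]]   # same values, spatial positions swapped
print(gram_matrix(features))
print(style_loss(features, shuffled))
```

Because the Gram matrix sums over positions, rearranging the image spatially leaves it unchanged: it captures texture and color statistics, not layout, which is exactly why it serves as a style (rather than content) target.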
Built and benchmarked a smartphone-first road-damage detector on the Global Road Damage Detection dataset, using YOLOv5 (with ensembling and test-time augmentation) and Faster R-CNN. Placed in the top 5 of 121 teams on the leaderboard (F1 0.68) while meeting a 0.5s/image inference target, which makes low-cost deployment from dashboard-mounted phones practical. Includes a mapping concept (GPS → segment scores) to help DOTs and municipalities prioritize maintenance.
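Detection F1 depends on how predictions are matched to ground truth. This sketch assumes a prediction counts as a true positive when it has the same class as a not-yet-matched ground-truth box and an intersection-over-union (IoU) of at least 0.5; treat that rule as our assumption rather than the challenge's official scorer. Class codes such as `D00` and `D40` are used only as example labels, and `iou` and `f1_score` are hypothetical helpers.

```python
from typing import Optional

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_score(preds: list[tuple[str, Box]], truths: list[tuple[str, Box]],
             thresh: float = 0.5) -> float:
    """Greedy one-to-one matching; preds are assumed sorted by confidence, highest first."""
    matched: set[int] = set()
    tp = 0
    for label, box in preds:
        best_j: Optional[int] = None
        best_iou = thresh
        for j, (t_label, t_box) in enumerate(truths):
            if j in matched or t_label != label:
                continue
            score = iou(box, t_box)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(truths) - tp
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


truths = [("D00", (0.0, 0.0, 10.0, 10.0)), ("D40", (20.0, 20.0, 30.0, 30.0))]
preds = [("D00", (1.0, 1.0, 10.0, 10.0)), ("D40", (50.0, 50.0, 60.0, 60.0))]
print(f1_score(preds, truths))
```

Here one prediction matches (IoU 0.81) and one misses, giving one true positive, one false positive, and one false negative.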
Led development of an intelligent video monitoring system that automatically detects moving objects in security footage. Evaluated 8 detection algorithms across 53 test videos, achieving 96% accuracy under ideal conditions and 82% in challenging lighting. The system supports automated security monitoring, parking occupancy tracking, and retail analytics without constant human oversight.
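The blurb does not name the 8 algorithms. As an illustration of one classic family, here is a running-average background subtractor on tiny grayscale frames stored as nested lists; `alpha`, `thresh`, `min_pixels`, and the function names are hypothetical, and production code would more likely use an OpenCV subtractor such as `cv2.createBackgroundSubtractorMOG2`.

```python
Frame = list[list[float]]  # grayscale intensities, 0..255


def update_background(bg: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Exponential running average: bg <- (1 - alpha) * bg + alpha * frame."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(rb, rf)] for rb, rf in zip(bg, frame)]


def foreground_mask(bg: Frame, frame: Frame, thresh: float = 30.0) -> list[list[bool]]:
    """Mark pixels whose absolute difference from the background exceeds thresh."""
    return [[abs(f - b) > thresh for b, f in zip(rb, rf)] for rb, rf in zip(bg, frame)]


def detect(frames: list[Frame], alpha: float = 0.05, thresh: float = 30.0,
           min_pixels: int = 1) -> list[bool]:
    """Report motion per frame when enough pixels are foreground; frame 0 seeds the background."""
    bg = [row[:] for row in frames[0]]
    results = []
    for frame in frames:
        mask = foreground_mask(bg, frame, thresh)
        results.append(sum(map(sum, mask)) >= min_pixels)  # count True pixels
        bg = update_background(bg, frame, alpha)
    return results


static = [[100.0] * 3 for _ in range(3)]
moving = [row[:] for row in static]
moving[1][1] = 200.0  # a bright object appears in the center pixel
print(detect([static, static, moving, static]))
```

A small `alpha` lets the background absorb gradual lighting drift while brief objects still stand out; sudden lighting changes break that assumption, which is consistent with accuracy dropping in challenging lighting.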