Personal Projects

Have a look at my projects below to get an idea of what I’ve worked on, how I apply my skills, and how I approach a project from concept to execution. Reach out if you’d like to learn more about a specific project.

  • Yahoo Finance Web Scraper

    I built a Yahoo Finance web scraper in Python that uses multithreading to quickly scrape stock price history, dividend payments, statistics, and financial information from the Yahoo Finance website. This project also contains code to scrape the earnings and dividend calendars from Nasdaq.

    Libraries: Requests, Pandas, Datetime, Time, Numpy, LXML, BS4, JSON, OS, SYS, Concurrent.Futures, Math
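
    As a rough illustration of the multithreading pattern described above (a sketch, not the project's actual code), the snippet below fans out one task per ticker with a thread pool. The function names, the ticker list, and the placeholder payload are assumptions made for the example; a real scraper would make a network request inside fetch_history.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_history(ticker):
    """Hypothetical stand-in for one network request plus parsing.

    A real scraper would download and parse a page here.
    """
    return {"ticker": ticker, "rows": len(ticker)}  # placeholder payload


def scrape_all(tickers, max_workers=8):
    """Fetch every ticker concurrently and collect results keyed by ticker."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per ticker and remember which future belongs to which.
        futures = {pool.submit(fetch_history, t): t for t in tickers}
        for future in as_completed(futures):
            ticker = futures[future]
            try:
                results[ticker] = future.result()
            except Exception as exc:  # keep one failed page from stopping the run
                errors[ticker] = exc
    return results, errors


results, errors = scrape_all(["AAPL", "MSFT", "KO"])
```

    Collecting errors per ticker, instead of letting the first exception propagate, means a single bad page does not discard the rest of the batch.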

  • SQL Data Analysis Project

    I analyzed a large building permit data set alongside population counts for each zip code from the 2010 Census. I created two SQL tables, Permits and Population, and ran a number of SQL queries against them. I also moved some of the data from SQL into Python to calculate Z-scores and conduct a Chi-Square test, since SQLite does not support these calculations natively.

    Libraries: Pandas, Datetime, Numpy, SciPy, SQLAlchemy
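
    The hand-off from SQL to Python can be sketched roughly as follows. This example uses only the standard library (sqlite3 and statistics) in place of the SQLAlchemy and SciPy used in the project, and the column names and rows are made up for illustration. It computes the chi-square statistic only, not a p-value.

```python
import sqlite3
from statistics import mean, pstdev

# Made-up rows for illustration only; not the project's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Permits (zipcode TEXT, permit_count INTEGER)")
conn.executemany(
    "INSERT INTO Permits VALUES (?, ?)",
    [("00001", 10), ("00002", 20), ("00003", 30)],
)
rows = conn.execute(
    "SELECT zipcode, permit_count FROM Permits ORDER BY zipcode"
).fetchall()
conn.close()

counts = [count for _, count in rows]
mu, sigma = mean(counts), pstdev(counts)  # population standard deviation
z_scores = {zipcode: (count - mu) / sigma for zipcode, count in rows}


def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Compare observed counts with an even split across the three zip codes.
chi2 = chi_square_statistic(counts, [mu] * len(counts))
```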

  • Kickstarter Data Project

    I performed data cleaning and analysis on Kickstarter data. I dropped duplicates, fixed inconsistent data entry, created new features, and encoded dummy variables. I also ran statistical tests including Chi-Square, the F-test for joint significance, the Kruskal-Wallis H test, and Dunn's test.

    The most exciting part of this project is the multithreading I implemented to optimize model parameters. In this project, the multithreaded search ran over 10 times faster than GridSearchCV or RandomizedSearchCV when tuning logistic regression, random forest, and extreme gradient boosting models.

    Libraries: Pandas, Numpy, SciPy, SciKit-Learn, Matplotlib, scikit_posthocs, XGBoost, Concurrent.Futures
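
    A minimal sketch of a threaded parameter search, assuming each parameter combination can be scored independently. The grid and the score function are hypothetical stand-ins; in the project, scoring would mean fitting and validating a model, which is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical grid; the names mirror common random forest settings.
param_grid = {"max_depth": [2, 4, 6], "n_estimators": [50, 100]}


def score(params):
    """Stand-in for fitting a model and returning a validation score."""
    # Toy objective that peaks at max_depth=4, n_estimators=100.
    return -abs(params["max_depth"] - 4) - abs(params["n_estimators"] - 100) / 100


def threaded_search(grid, max_workers=4):
    """Score every parameter combination in parallel and return the best one."""
    keys = list(grid)
    candidates = [dict(zip(keys, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score, candidates))  # map preserves input order
    best_score, best_params = max(
        zip(scores, candidates), key=lambda pair: pair[0]
    )
    return best_params, best_score


best_params, best_score = threaded_search(param_grid)
```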

  • Tax Data Project

    I performed a statistical analysis of the effect of the urban population share on tax rates as a percent of GDP. I corrected data entry issues and encoded dummy variables. I also ran the Hausman test to compare fixed effects versus random effects.

    Libraries: Pandas, Numpy, SciPy, Matplotlib, statsmodels, linearmodels

  • Stata Data Cleaning

    I programmatically downloaded data files from IPEDS using Python, a task that very few coders would be able to achieve. Then, I harmonized the data sets into a consistent panel dataset that is easy for others to use. I reshaped each data set to be long by unitid, academic rank, contract length, and sex. This required converting triply wide data into long for some years. 
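
    To illustrate what converting "triply wide" data to long means, here is a small sketch in plain Python. The column naming scheme (rank, contract length, and sex joined by underscores), the "count" field, and the sample values are assumptions made for the example, not the actual IPEDS variable names.

```python
# Hypothetical wide row: each column name packs rank, contract length, and sex.
wide_rows = [
    {"unitid": 100, "prof_9mo_men": 50, "prof_9mo_women": 45, "asst_12mo_men": 12},
]


def wide_to_long(rows, id_col="unitid"):
    """Turn one wide row per institution into one long row per combination."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_col:
                continue
            rank, contract, sex = column.split("_")  # unpack the three dimensions
            long_rows.append(
                {
                    id_col: row[id_col],
                    "rank": rank,
                    "contract": contract,
                    "sex": sex,
                    "count": value,
                }
            )
    return long_rows


long_rows = wide_to_long(wide_rows)
```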

  • NetworkX Graphs

    Using the Energy Information Administration API, I created a social network graph in Python of oil-producing countries based on high correlations between production decisions. I used advanced code to customize label placement and margins.

    Libraries: Numpy, Pandas, EIA, NetworkX, Matplotlib

  • Parse PDF Documents

    Parsing tables from PDF documents is one of the more difficult tasks in programming. I used Tabula to scrape an imperfect table from a PDF document. I then cleaned the messy output to extract the desired information and exported it to Excel.

    Libraries: Tabula, Pandas, Itertools

  • Option Greeks Visualizations

    I created two visualizations to depict the relationship between:

    1. Vega and ask price

    2. Implied Volatility and ask price

    I used advanced methods to label each point by days left until expiration.

    Libraries: Pandas, Matplotlib, Datetime, Time

  • Scrape Google Search Results

    This program scrapes the Google News RSS feed for a user-provided search term and generates a text file with the results. I used it to automate a list of Twitter posts for bulk upload.

    Libraries: Newspaper, BS4, URLlib
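
    The core of an RSS scraper like this can be sketched as below. This version swaps in the standard library's xml.etree.ElementTree and urllib.parse in place of Newspaper and BS4, and parses a small hand-written sample feed instead of downloading one. The function names and sample headlines are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Tiny hand-written RSS sample standing in for a downloaded feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>sample feed</title>
    <item><title>First headline</title><link>https://example.com/1</link></item>
    <item><title>Second headline</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def build_query(term):
    """URL-encode a search term for use in a feed URL's query string."""
    return quote_plus(term)


def parse_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def to_lines(items):
    """Format one 'title link' line per result, ready to write to a text file."""
    return [f"{title} {link}" for title, link in items]


lines = to_lines(parse_items(SAMPLE_RSS))
```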

  • Merge Docx and PDF Files

    I wrote programs to quickly combine Microsoft Word and Adobe PDF documents into one easy-to-manage file. These programs saved me time when organizing files for work and school projects. At times, I merged up to 100 files at once.

    Libraries: Docx, PyPDF2, OS

  • Organize Files in PowerShell

    I wrote PowerShell code to move items, rename files, and delete old GitHub forks (mostly from Python tutorials).

    These programs saved me a lot of time. I was able to quickly sort through files in my downloads folder and rename PowerPoint lecture slides to a consistent format.

©2021 by Debra Ray