Have a look at my projects below to get an idea of what I’ve worked on, how I apply my skills, and how I take a project from concept to execution. Reach out if you’d like to learn more about a specific project.
Yahoo Finance Web Scraper
I built a Yahoo Finance web scraper in Python that uses multithreading to quickly scrape stock price history, dividend payments, statistics, and financial information from the Yahoo Finance website. It cleans the data and outputs it as a pandas DataFrame. The project also contains code to scrape the earnings and dividend calendars from Nasdaq. The code is organized into classes with public and private methods.
Libraries: Requests, Pandas, Datetime, Time, Numpy, LXML, BS4, JSON, OS, SYS, Concurrent.Futures, Math
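The multithreading pattern can be sketched with `concurrent.futures`. This is a minimal illustration, not the project's actual code: `fetch_history` and `scrape_all` are hypothetical names, and the fetch function is an offline stub standing in for one network request per ticker.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_history(ticker):
    """Hypothetical stand-in for one network request per ticker.

    A real scraper would download and parse a page here; this stub
    returns a small record so the sketch runs offline.
    """
    return {"ticker": ticker, "rows": len(ticker)}


def scrape_all(tickers, max_workers=8):
    """Run fetch_history for every ticker on a thread pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its ticker so failures are reported by name.
        futures = {pool.submit(fetch_history, t): t for t in tickers}
        for future in as_completed(futures):
            ticker = futures[future]
            try:
                results[ticker] = future.result()
            except Exception as exc:
                # One failed ticker should not stop the rest of the batch.
                results[ticker] = {"ticker": ticker, "error": str(exc)}
    return results


if __name__ == "__main__":
    print(scrape_all(["AAPL", "MSFT", "KO"]))
```

Threads suit this kind of work because each request spends most of its time waiting on the network, so several requests can be in flight at once.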
Real Estate Data Project
I created a reusable program that connects to a PostgreSQL database and cleans Airbnb and Zillow data. It also contains reusable class methods to perform exploratory analysis, calculate capitalization (cap) rates, and recommend optimal zipcodes for investment.
Libraries: Pandas, Numpy, psycopg2, SQLAlchemy, Matplotlib
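A cap rate is annual net operating income divided by property value. The sketch below shows one way that calculation could rank zipcodes. The field names, the revenue formula (nightly rate times occupied nights), and the numbers are all assumptions made for illustration, not the project's actual schema or data.

```python
def cap_rate(annual_noi, property_value):
    """Capitalization rate = annual net operating income / property value."""
    if property_value <= 0:
        raise ValueError("property_value must be positive")
    return annual_noi / property_value


def rank_zipcodes(listings):
    """Return (zipcode, cap rate) pairs sorted from highest to lowest."""
    rates = {}
    for row in listings:
        # Assumed revenue model: nightly rate times occupied nights per year.
        revenue = row["nightly_rate"] * 365 * row["occupancy"]
        noi = revenue - row["annual_expenses"]
        rates[row["zipcode"]] = cap_rate(noi, row["home_value"])
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Made-up illustrative numbers, not real Airbnb or Zillow data.
sample = [
    {"zipcode": "10001", "nightly_rate": 200.0, "occupancy": 0.5,
     "annual_expenses": 16500.0, "home_value": 1000000.0},
    {"zipcode": "78701", "nightly_rate": 160.0, "occupancy": 0.5,
     "annual_expenses": 9200.0, "home_value": 400000.0},
]

if __name__ == "__main__":
    for zipcode, rate in rank_zipcodes(sample):
        print(zipcode, f"{rate:.1%}")
```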
Python Customer Segmentation
I used K-Means to segment customers in Python based on the number of transactions, the number of items purchased, and the total amount spent.
Libraries: Pandas, Numpy, Sklearn, Matplotlib
Operations Data Project
I analyzed a data set of clothing entering a warehouse using advanced SQL functions and Python. I calculated weighted averages in SQL using window functions and the filter clause to identify clothing that took longer to process in the warehouse. The SQL queries also use subqueries, common table expressions, CASE statements, UNION ALL, and ALTER TABLE ... UPDATE.
Libraries: Pandas, Numpy, Seaborn, Matplotlib, SQLAlchemy
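A weighted average built from window functions and the FILTER clause might look like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database; the `intake` table, its columns, the 5-day threshold, and the rows are hypothetical, and the project's own database and queries may differ.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE intake (item_type TEXT, units INTEGER, days_to_process INTEGER)"
)
conn.executemany(
    "INSERT INTO intake VALUES (?, ?, ?)",
    [("jacket", 10, 2), ("jacket", 30, 6), ("shirt", 50, 1), ("shirt", 50, 3)],
)

QUERY = """
SELECT DISTINCT
    item_type,
    -- processing days weighted by the number of units in each shipment
    (SUM(units * days_to_process) OVER (PARTITION BY item_type)) * 1.0
        / (SUM(units) OVER (PARTITION BY item_type)) AS weighted_avg_days,
    -- FILTER restricts this aggregate to shipments that took over 5 days
    SUM(units) FILTER (WHERE days_to_process > 5)
        OVER (PARTITION BY item_type) AS slow_units
FROM intake
ORDER BY weighted_avg_days DESC
"""

rows = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for item_type, weighted_avg_days, slow_units in rows:
        print(item_type, weighted_avg_days, slow_units)
```

Item types with no slow shipments get NULL (`None` in Python) for `slow_units`, because the filtered sum has no rows to add.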
Tax Data Project
I ran a statistical analysis of the effect of the urban population share on tax rates as a percent of GDP. I corrected data entry issues and encoded dummy variables. I also ran a Hausman test to choose between fixed-effects and random-effects models.
Libraries: Pandas, Numpy, SciPy, Matplotlib, statsmodels, linearmodels
Stata Data Cleaning
I programmatically downloaded data files from IPEDS using Python. Then, I harmonized the data sets into a consistent panel data set that is easy for others to use. I reshaped each data set to be long by unitid, academic rank, contract length, and sex. For some years, this required converting triply wide data into long format. I also encoded and recoded categorical variables for consistency across years.
Loan Data Project
I created a SQL database containing one table. Then, I aggregated the current balance by institution lender type, loan-to-value cohorts, and loan age cohorts, using common table expressions, UNION ALL, CASE statements, and window functions.
Libraries: Pandas, SQLAlchemy, OS, Matplotlib
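The cohort aggregation can be sketched as a single query that combines a common table expression, CASE, UNION ALL, and a window function. The example uses Python's built-in `sqlite3` module. The `loans` table, its columns, the cohort cut-offs, and the rows are assumptions for illustration, and only two of the three dimensions are shown to keep it short.

```python
import sqlite3

# Hypothetical one-table schema with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (ltv REAL, age_months INTEGER, current_balance INTEGER)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(0.5, 6, 100), (0.9, 24, 300), (0.7, 36, 600)],
)

QUERY = """
WITH cohorts AS (
    -- CASE assigns every loan to one cohort per dimension
    SELECT
        current_balance,
        CASE WHEN ltv < 0.8 THEN 'LTV < 80%' ELSE 'LTV >= 80%' END AS ltv_cohort,
        CASE WHEN age_months < 12 THEN 'under 1 year'
             ELSE '1 year or more' END AS age_cohort
    FROM cohorts_source
),
stacked AS (
    -- UNION ALL stacks one aggregate per dimension into a single result
    SELECT 'ltv' AS dimension, ltv_cohort AS cohort,
           SUM(current_balance) AS balance
    FROM cohorts GROUP BY ltv_cohort
    UNION ALL
    SELECT 'age', age_cohort, SUM(current_balance)
    FROM cohorts GROUP BY age_cohort
)
SELECT dimension, cohort, balance,
       -- window function: each cohort's share of its dimension's total
       balance * 1.0 / SUM(balance) OVER (PARTITION BY dimension) AS share
FROM stacked
ORDER BY dimension, cohort
""".replace("cohorts_source", "loans")

rows = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```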
Business Data Project
I created three tables in a SQL database. I used UPDATE ... SET ... WHERE, date functions, joins, LIKE, CASE statements, and subqueries. I wrote a number of queries to determine:
1. How many unique customers had transactions in the year 2021
2. The last names of the customers who purchased a particular product in February 2021
3. The total orders made for a product type, and how many orders with a `transact_type` of `SALE` were to customers in the state of New Jersey
4. How many unique customers have had successful, non-returned transactions in 2021
5. A list of all the customers with a `VOID` order for any Product ID/SKU that starts with the letter `t`.
I also wrote a short script to access the City Bikes API in Python.
Libraries: Pandas, SQLAlchemy, OS, Requests
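Two of the questions above (the first and the fifth) can be sketched with Python's built-in `sqlite3` module. The table layout, every column name other than `transact_type`, and the rows are hypothetical; the original project's three tables are not reproduced here.

```python
import sqlite3

# Hypothetical two-table schema with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last_name TEXT, state TEXT);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    product_id TEXT,
    transact_type TEXT,
    transact_date TEXT  -- ISO 8601 date string, e.g. '2021-02-03'
);
INSERT INTO customers VALUES (1, 'Diaz', 'NJ'), (2, 'Okafor', 'NY'), (3, 'Lee', 'NJ');
INSERT INTO transactions VALUES
    (1, 1, 't-100', 'SALE', '2021-02-03'),
    (2, 1, 'a-200', 'SALE', '2021-05-10'),
    (3, 2, 't-300', 'VOID', '2021-07-01'),
    (4, 3, 'b-400', 'VOID', '2020-12-30');
""")

# Question 1: unique customers with any transaction in 2021 (date function).
unique_2021 = conn.execute("""
    SELECT COUNT(DISTINCT customer_id)
    FROM transactions
    WHERE strftime('%Y', transact_date) = '2021'
""").fetchone()[0]

# Question 5: customers with a VOID order for a product ID starting with 't'.
# Note: SQLite's LIKE is case-insensitive for ASCII letters by default.
void_t_customers = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT c.last_name
        FROM customers AS c
        JOIN transactions AS t ON t.customer_id = c.customer_id
        WHERE t.transact_type = 'VOID' AND t.product_id LIKE 't%'
        ORDER BY c.last_name
    """)
]

if __name__ == "__main__":
    print(unique_2021, void_t_customers)
```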
Kickstarter Data Project
I performed data cleaning and analysis on Kickstarter data. I dropped duplicates, fixed inconsistent data entry, created new features, and encoded dummy variables. I also ran statistical tests including the Chi-Square test, an F-test for joint significance, the Kruskal-Wallis H test, and Dunn's test.
Libraries: Pandas, Numpy, SciPy, SciKit-Learn, Matplotlib, scikit_posthocs
Construction Permit Project
I analyzed a large building permit data set alongside population counts for each zipcode from the 2010 Census. I created two SQL tables: Permits and Population. I ran a number of SQL queries using window functions, subqueries, common table expressions, and joins. I also transferred some results from SQL to Python to calculate Z-scores and conduct a Chi-Square test, because SQLite does not support these calculations natively.
Libraries: Pandas, Datetime, Numpy, SciPy, SQLAlchemy
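The hand-off from SQL to Python for Z-scores can be sketched as follows. The `Permits` and `Population` table names come from the description above, but the columns, the per-1,000-residents rate, and the rows are assumptions. The sketch uses the standard-library `statistics` module in place of SciPy and leaves out the Chi-Square test.

```python
import sqlite3
from statistics import mean, pstdev

# Made-up rows in a hypothetical version of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Permits (permit_id INTEGER PRIMARY KEY, zipcode TEXT);
CREATE TABLE Population (zipcode TEXT PRIMARY KEY, residents INTEGER);
INSERT INTO Population VALUES ('60601', 1000), ('60602', 2000), ('60603', 1000);
""")
permit_zipcodes = ["60601"] * 2 + ["60602"] * 8 + ["60603"] * 6
conn.executemany(
    "INSERT INTO Permits (zipcode) VALUES (?)", [(z,) for z in permit_zipcodes]
)

# SQL does the join and aggregation: permits per 1,000 residents by zipcode.
rates = conn.execute("""
    SELECT p.zipcode, COUNT(*) * 1000.0 / pop.residents AS permits_per_1000
    FROM Permits AS p
    JOIN Population AS pop ON pop.zipcode = p.zipcode
    GROUP BY p.zipcode, pop.residents
    ORDER BY p.zipcode
""").fetchall()


def z_scores(values):
    """Standardize values using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Python does the part SQLite lacks built-in functions for.
z_by_zipcode = dict(zip((z for z, _ in rates), z_scores([r for _, r in rates])))

if __name__ == "__main__":
    for zipcode, z in z_by_zipcode.items():
        print(zipcode, round(z, 3))
```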
Using the Energy Information Administration API, I created a social network graph in Python of oil-producing countries based on high correlations between production decisions. I used advanced code to customize label placement and margins.
Libraries: Numpy, Pandas, EIA, NetworkX, Matplotlib
Parse PDF Documents
I used Tabula to scrape an imperfect table from a PDF document. I cleaned the messy output for the desired information and exported it to Excel.
Libraries: Tabula, Pandas, Itertools
Option Greeks Visualizations
I created two visualizations to depict the relationship between:
1. Vega and ask price
2. Implied Volatility and ask price
I used advanced methods to label each point by days left until expiration.
Libraries: Pandas, Matplotlib, Datetime, Time
Scrape Google Search Results
This program scrapes the Google News RSS feed for a user-provided search term and generates a text file with the results. I used it to automate a list of Twitter posts for bulk upload.
Libraries: Newspaper, BS4, Urllib
Merge Docx and PDF Files
I wrote programs to quickly combine Microsoft Word and Adobe PDF documents into one easy-to-manage file. These programs saved me time when organizing files for work and school projects. At times, I merged up to 100 files at once.
Libraries: Docx, PyPDF2, OS
Organize Files in PowerShell
I wrote PowerShell code to move items, rename files, and delete old GitHub forks (mostly from Python tutorials).
These programs saved me a lot of time. I was able to quickly sort through files in my downloads folder and rename PowerPoint lecture slides to a consistent format.