Datasets

On this page you can find a collection of the datasets I’ve created, scraped, or otherwise collected.

Beer Bottle Labels
Hosted on GitHub, this is a collection of scanned beer bottle labels. The dataset is intended to help with developing beer label artwork by making it possible to analyze typical label sizes, smallest font sizes, the information labels contain, and so on.
South Park Episode Transcripts
Hosted on GitHub, this dataset contains the episode transcripts for the TV show South Park. It covers seasons 1 through 19 and breaks the text into season and character CSV files. The data was scraped from the South Park fandom pages with a Python script.
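
As a quick way to check a local copy of the transcript CSV files, here is a minimal sketch that counts the data rows in each file. The folder name `south-park-transcripts` and the function name `count_rows_per_csv` are hypothetical, and the sketch assumes each CSV starts with a header row; adjust both to match the actual repository layout.

```python
import csv
from pathlib import Path


def count_rows_per_csv(directory):
    """Return {file name: number of data rows} for every CSV in `directory`."""
    counts = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed to be present)
            counts[path.name] = sum(1 for _ in reader)
    return counts


if __name__ == "__main__":
    # "south-park-transcripts" is a hypothetical local folder name.
    for name, rows in count_rows_per_csv("south-park-transcripts").items():
        print(f"{name}: {rows} rows")
```

Using the `csv` module rather than counting raw lines matters here because transcript text can contain quoted line breaks, which would otherwise inflate the count.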