Data Collection

July 15, 2017
Backing Up a Podcast

by Kyle Polich

Everything on the internet is there forever. Right? It depends on what you mean. If something makes it into the Internet Archive, I feel pretty confident it will be around for the duration of my lifetime. Yet, I'm from a time right on the cusp of when permanence became a possibility. I can name many bands that were just a few years too early to appear on YouTube and get an album into iTunes. Those works are much more at risk of bit...

May 26, 2017

Machine learning has been an essential tool for solving computer vision tasks such as image classification, object detection, instance recognition, and semantic segmentation, among others. At the heart of machine learning approaches is data: training a model requires large amounts of usable data. Why? Suppose you want to learn about monkeys and apes, and assume you have never seen either until one day someone shows you a picture of a monkey and a picture of an ape. From one picture of each, it would be difficult to generalize and discern the differences between the two. If you saw perhaps 50 pictures of each species, you would have a better chance of noticing that monkeys tend to be smaller than apes and that monkeys tend to have tails, whereas apes do not. If you saw thousands of pictures of both, it might become very clear that the two are, in fact, quite different. For example, you might discover that monkeys and apes have different nose structures, upper bodies, feet and so...

February 11, 2017

When it comes to feeding your family and taking care of your health, you want to have the most accurate nutrition information at your disposal. Unfortunately, with the rise of social media and numerous "expert" blogs, finding trustworthy data is diffic...

January 24, 2017
Chess Rating Analysis

by Kyle Polich

After our recent episode with Peter Backus, I wanted to learn more about Elo ratings in chess. His research showed an effect size equivalent to a difference of 30 Elo points. How significant is that? To answer the question, I needed some data to explore. I found chessgames.com to be a good resource, and below is how I crawled the d...
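One quick way to get a feel for what a 30-point gap means is the textbook Elo expected-score formula, E_A = 1 / (1 + 10^((R_B - R_A) / 400)). The sketch below is only that standard logistic model, not necessarily the analysis the post goes on to do; the 1500 baseline rating is an arbitrary choice for illustration, since only the difference matters.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for player A against
    player B under the standard Elo logistic model with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Compare a few rating gaps against an arbitrary 1500-rated opponent.
    for diff in (0, 30, 100, 400):
        print(diff, round(elo_expected_score(1500 + diff, 1500), 3))
    # 0 0.5
    # 30 0.543
    # 100 0.64
    # 400 0.909
```

Under this model a 30-point edge moves the expected score from 0.50 to roughly 0.54, which is a modest but real advantage over many games.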

November 14, 2016
Regulations.Gov

by Kyle Polich

In an effort to take citizens' opinions and perspectives into account, the federal government launched [regulations.gov](https://www.regulations.gov/). The site allows visitors to review documents related to open propositions and comment on them online. It acts as a repository for those comments, and it provides an admirable API for accessing this data. Below are some notes on how to interact with the API effectively in Pyt...
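As a rough starting point, here is a minimal Python sketch of querying the API with only the standard library. It assumes the current v4 interface (base URL `https://api.regulations.gov/v4`, a `/documents` endpoint, `filter[searchTerm]` and `page[size]` query parameters, and an `api_key` obtained from api.data.gov). The original 2016 notes predate v4 and may have used a different version, so check the official documentation before relying on these names. The helper functions are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: v4 base URL; the post itself may have targeted an older API version.
API_BASE = "https://api.regulations.gov/v4"


def build_documents_url(api_key: str, search_term: str, page_size: int = 25) -> str:
    """Build a search URL for the (assumed) v4 /documents endpoint.

    urlencode percent-encodes the square brackets in the parameter names,
    which a standards-compliant server decodes back to filter[searchTerm].
    """
    params = {
        "filter[searchTerm]": search_term,  # assumed v4 full-text filter
        "page[size]": page_size,            # assumed v4 page-size parameter
        "api_key": api_key,                 # key issued via api.data.gov
    }
    return f"{API_BASE}/documents?{urllib.parse.urlencode(params)}"


def fetch_documents(api_key: str, search_term: str) -> dict:
    """Fetch one page of results and decode the JSON body (needs network access)."""
    url = build_documents_url(api_key, search_term)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes the query easy to inspect or log before spending any of the key's rate limit.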