Mini-episodes

Mini-episodes

Short explainer episodes with Kyle and Linh Da.

[MINI] type i / type ii errors
[MINI] type i / type ii errors

In this first mini-episode of the Data Skeptic Podcast, we define and discuss type i and type ii errors (a.k.a. false positives and false negatives).

[MINI] p-values
[MINI] p-values

In this mini, we discuss p-values and their use in hypothesis testing, in the context of an hypothetical experiment on plant flowering, and end with a reference to the Particle Fever documentary and how statistical significance played a role.

[MINI] Bayesian Updating
[MINI] Bayesian Updating

In this minisode, we discuss Bayesian Updating - the process by which one can calculate the most likely hypothesis might be true given one's older / prior belief and all new evidence.

[MINI] Experimental Design
[MINI] Experimental Design

This episode loosely explores the topic of Experimental Design including hypothesis testing, the importance of statistical tests, and an everyday and business example.

[MINI] Cross Validation
[MINI] Cross Validation

This miniepisode discusses the technique called Cross Validation - a process by which one randomly divides up a dataset into numerous small partitions. Next, (typically) one is held out, and the rest are used to train some model. The hold out set can then be used to validate how good the model does at describing/predicting new data.

[MINI] Ant Colony Optimization
[MINI] Ant Colony Optimization

In this week's mini episode, Linhda and Kyle discuss Ant Colony Optimization - a numerical / stochastic optimization technique which models its search after the process ants employ in using random walks to find a goal (food) and then leaving a pheremone trail in their walk back to the nest.  We even find some way of relating the city of San Francisco and running a restaurant into the discussion.

[MINI] Noise!!
[MINI] Noise!!

Our topic for this week is "noise" as in signal vs. noise.  This is not a signal processing discussions, but rather a brief introduction to how the work noise is used to describe how much information in a dataset is useless (as opposed to useful).

[MINI] Decision Tree Learning
[MINI] Decision Tree Learning

Linhda and Kyle talk about Decision Tree Learning in this miniepisode.  Decision Tree Learning is the algorithmic process of trying to generate an optimal decision tree to properly classify or forecast some future unlabeled element based by following each step in the tree.

[MINI] Value of Information
[MINI] Value of Information

A discussion about getting ready in the morning, negotiating a used car purchase, and selecting the best AirBnB place to stay at help frame a conversation about the decision theoretic principal known as the Value of Information equation.

[MINI] Confidence Intervals
[MINI] Confidence Intervals

Commute times and BBQ invites help frame a discussion about the statistical concept of confidence intervals.

[MINI] Selection Bias
[MINI] Selection Bias

A discussion about conducting US presidential election polls helps frame a converation about selection bias.

[MINI] The T-Test
[MINI] The T-Test

The t-test is this week's mini-episode topic. The t-test is a statistical testing procedure used to determine if the mean of two datasets differs by a statistically significant amount. We discuss how a wine manufacturer might apply a t-test to determine if the sweetness, acidity, or some other property of two separate grape vines might differ in a statistically meaningful way.

[MINI] Is the Internet Secure?
[MINI] Is the Internet Secure?

This episode explores the basis of why we can trust encryption.  Suprisingly, a discussion of looking up a word in the dictionary (binary search) and efficiently going wine tasting (the travelling salesman problem) help introduce computational complexity as well as the P ?= NP question, which is paramount to the trustworthiness RSA encryption.

[MINI] Monkeys on Typewriters
[MINI] Monkeys on Typewriters

What is randomness? How can we determine if some results are randomly generated or not? Why are random numbers important to us in our everyday life? These topics and more are discussed in this mini-episode on random numbers.

Many readers will be vaguely familar with the idea of "X number of monkeys banging on Y number of typewriters for Z number of years" - the idea being that such a setup would produce random sequences of letters. The origin of this idea was the mathemetician Borel who was interested in whether or not 1,000,000 monkeys working for 10 hours per day might eventually reproduce the works of shakespeare.

We explore this topic and provide some further details in the show notes which you can find over at dataskeptic.com

[MINI] The Girlfriend Equation
[MINI] The Girlfriend Equation

Economist Peter Backus put forward "The Girlfriend Equation" while working on his PhD - a probabilistic model attempting to estimate the likelihood of him finding a girlfriend. In this mini episode we explore the soundness of his model and also share some stories about how Linhda and Kyle met.

[MINI] The Battle of the Sexes
[MINI] The Battle of the Sexes

Love and Data is the continued theme in this mini-episode as we discuss the game theory example of The Battle of the Sexes. In this textbook example, a couple must strategize about how to spend their Friday night. One partner prefers football games while the other partner prefers to attend the opera. Yet, each person would rather be at their non-preferred location so long as they are still with their spouse. So where should they decide to go?

[MINI] Belief in Santa
[MINI] Belief in Santa

In this quick holiday episode, we touch on how one would approach modeling the statistical distribution over the probability of belief in Santa Claus given age.

[MINI] Data Provenance
[MINI] Data Provenance

This episode introduces a high level discussion on the topic of Data Provenance, with more MINI episodes to follow to get into specific topics. Thanks to listener Sara L who wrote in to point out the Data Skeptic Podcast has focused alot about using data to be skeptical, but not necessarily being skeptical of data.

Data Provenance is the concept of knowing the full origin of your dataset. Where did it come from? Who collected it? How as it collected? Does it combine independent sources or one singular source? What are the error bounds on the way it was measured? These are just some of the questions one should ask to understand their data. After all, if the antecedent of an argument is built on dubious grounds, the consequent of the argument is equally dubious.

For a more technical discussion than what we get into in this mini epiosode, I recommend A Survey of Data Provenance Techniques by authors Simmhan, Plale, and Gannon.

[MINI] Partially Observable State Spaces
[MINI] Partially Observable State Spaces

When dealing with dynamic systems that are potentially undergoing constant change, its helpful to describe what "state" they are in.  In many applications the manner in which the state changes from one to another is not completely predictable, thus, there is uncertainty over how it transitions from state to state.  Further, in many applications, one cannot directly observe the true state, and thus we describe such situations as partially observable state spaces.  This episode explores what this means and why it is important in the context of chess, poker, and the mood of Yoshi the lilac crowned amazon parrot.

[MINI] The Chi-Squared Test
[MINI] The Chi-Squared Test

The Chi-Squared test is a methodology for hypothesis testing. When one has categorical data, in the form of frequency counts or observations (e.g. Vegetarian, Pescetarian, and Omnivore), split into two or more categories (e.g. Male, Female), a question may arise such as "Are women more likely than men to be vegetarian?" or put more accurately, "Is any observed difference in the frequency with which women report being vegetarian differ in a statistically significant way from the frequency men report that?"

[MINI] k-means clustering
[MINI] k-means clustering

The k-means clustering algorithm is an algorithm that computes a deterministic label for a given "k" number of clusters from an n-dimensional datset.  This mini-episode explores how Yoshi, our lilac crowned amazon's biological processes might be a useful way of measuring where she sits when there are no humans around.  Listen to find out how!

[MINI] Ordinary Least Squares Regression
[MINI] Ordinary Least Squares Regression

This episode explores Ordinary Least Squares or OLS - a method for finding a good fit which describes a given dataset.

[MINI] Markov Chains
[MINI] Markov Chains

This episode introduces the idea of a Markov Chain. A Markov Chain has a set of states describing a particular system, and a probability of moving from one state to another along every valid connected state. Markov Chains are memoryless, meaning they don't rely on a long history of previous observations. The current state of a system depends only on the previous state and the results of a random outcome.

Markov Chains are a useful way method for describing non-deterministic systems. They are useful for destribing the state and transition model of a stochastic system.

As examples of Markov Chains, we discuss stop light signals, bowling, and text prediction systems in light of whether or not they can be described with Markov Chains.

[MINI] Markov Chain Monte Carlo
[MINI] Markov Chain Monte Carlo

This episode explores how going wine testing could teach us about using markov chain monte carlo (mcmc).

[MINI] Natural Language Processing
[MINI] Natural Language Processing

This episode overviews some of the fundamental concepts of natural language processing including stemming, n-grams, part of speech tagging, and th bag of words approach.

[MINI] Cornbread and Overdispersion
[MINI] Cornbread and Overdispersion

For our 50th episode we enduldge a bit by cooking Linhda's previously mentioned "healthy" cornbread.  This leads to a discussion of the statistical topic of overdispersion in which the variance of some distribution is larger than what one's underlying model will account for.

[MINI] z-scores
[MINI] z-scores

This week's episode dicusses z-scores, also known as standard score. This score describes the distance (in standard deviations) that an observation is away from the mean of the population. A closely related top is the 68-95-99.7 rule which tells us that (approximately) 68% of a normally distributed population lies within one standard deviation of the mean, 95 within 2, and 99.7 within 3.

Kyle and Linh Da discuss z-scores in the context of human height. If you'd like to calculate your own z-score for height, you can do so below. They further discuss how a z-score can also describe the likelihood that some statistical result is due to chance. Thus, if the significance of a finding can be said to be 3σ, that means that it's 99.7% likely not due to chance, or only 0.3% likely to be due to chance.

[MINI] A Critical Examination of a Study of Marriage by Political Affiliation
[MINI] A Critical Examination of a Study of Marriage by Political Affiliation

Linhda and Kyle review a New York Times article titled How Your Hometown Affects Your Chances of Marriage. This article explores research about what correlates with the likelihood of being married by age 26 by county. Kyle and LinhDa discuss some of the fine points of this research and the process of identifying factors for consideration.

[MINI] Anscombe's Quartet
[MINI] Anscombe's Quartet

This mini-episode discusses Anscombe's Quartet, a series of four datasets which are clearly very different but share some similar statistical properties with one another. For example, each of the four plots has the same mean and variance on both axis, as well as the same correlation coefficient, and same linear regression.

 

The episode tries to add some context by imagining each of these datasets as data about a sports team, and why it can be important to look beyond basic summary statistics when exploring your dataset.

[MINI] The Curse of Dimensionality
[MINI] The Curse of Dimensionality

More features are not always better! With an increasing number of features to consider, machine learning algorithms suffer from the curse of dimensionality, as they have a wider set and often sparser coverage of examples to consider. This episode explores a real life example of this as Kyle and Linhda discuss their thoughts on purchasing a home.

The curse of dimensionality was defined by Richard Bellman, and applies in several slightly nuanced cases. This mini-episode discusses how it applies on machine learning.

This episode does not, however, discuss a slightly different version of the curse of dimensionality which appears in decision theoretic situations. Consider the game of chess. One must think ahead several moves in order to execute a successful strategy. However, thinking ahead another move requires a consideration of every possible move of every piece controlled, and every possible response one's opponent may take. The space of possible future states of the board grows exponentially with the horizon one wants to look ahead to. This is present in the notably useful Bellman equation.

[MINI] MapReduce
[MINI] MapReduce

This mini-episode is a high level explanation of the basic idea behind MapReduce, which is a fundamental concept in big data. The origin of the idea comes from a Google paper titled MapReduce: Simplified Data Processing on Large Clusters. This episode makes an analogy to tabulating paper voting ballets as a means of helping to explain how and why MapReduce is an important concept.

[MINI] k-Nearest Neighbors
[MINI] k-Nearest Neighbors

This episode explores the k-nearest neighbors algorithm which is an unsupervised, non-parametric method that can be used for both classification and regression. The basica concept is that it leverages some distance function on your dataset to find the $k$ closests other observations of the dataset and averaging them to impute an unknown value or unlabelled datapoint.

[MINI] PageRank
[MINI] PageRank

PageRank is the algorithm most famous for being one of the original innovations that made Google stand out as a search engine. It was defined in the classic paper The Anatomy of a Large-Scale Hypertextual Web Search Engine by Sergey Brin and Larry Page. While this algorithm clearly impacted web searching, it has also been useful in a variety of other applications. This episode presents a high level description of this algorithm and how it might apply when trying to establish who writes the most influencial academic papers.

[MINI] Structured and Unstructured Data
[MINI] Structured and Unstructured Data

Today's mini-episode explains the distinction between structured and unstructured data, and debates which of these categories best describe recipes.

[MINI] Distance Measures
[MINI] Distance Measures

There are many occasions in which one might want to know the distance or similarity between two things, for which the means of calculating that distance is not necessarily clear. The distance between two points in Euclidean space is generally straightforward, but what about the distance between the top of Mount Everest to the bottom of the ocean? What about the distance between two sentences?

[MINI] Sample Sizes
[MINI] Sample Sizes

There are several factors that are important to selecting an appropriate sample size and dealing with small samples. The most important questions are around representativeness - how well does your sample represent the total population and capture all it's variance?

[MINI] Multi-armed Bandit Problems
[MINI] Multi-armed Bandit Problems

The multi-armed bandit problem is named with reference to slot machines (one armed bandits). Given the chance to play from a pool of slot machines, all with unknown payout frequencies, how can you maximize your reward? If you knew in advance which machine was best, you would play exclusively that machine. Any strategy less than this will, on average, earn less payout, and the difference can be called the "regret".

[MINI] The Central Limit Theorem
[MINI] The Central Limit Theorem

The central limit theorem is an important statistical result which states that typically, the mean of a large enough set of independent trials is approximately normally distributed.  This episode explores how this might be used to determine if an amazon parrot like Yoshi produces or or less waste than an African Grey, under the assumption that the individual distributions are not normal.

[MINI] Covariance and Correlation
[MINI] Covariance and Correlation

The degree to which two variables change together can be calculated in the form of their covariance. This value can be normalized to the correlation coefficient, which has the advantage of transforming it to a unitless measure strictly bounded between -1 and 1. This episode discusses how we arrive at these values and why they are important.

[MINI] Bias Variance Tradeoff
[MINI] Bias Variance Tradeoff

A discussion of the expected number of cars at a stoplight frames today's discussion of the bias variance tradeoff. The central ideal of this concept relates to model complexity. A very simple model will likely generalize well from training to testing data, but will have a very high variance since it's simplicity can prevent it from capturing the relationship between the covariates and the output. As a model grows more and more complex, it may capture more of the underlying data but the risk that it overfits the training data and therefore does not generalize (is biased) increases. The tradeoff between minimizing variance and minimizing bias is an ongoing challenge for data scientists, and an important discussion for skeptics around how much we should trust models.

[MINI] The Accuracy Paradox
[MINI] The Accuracy Paradox

Today's episode discusses the accuracy paradox. There are cases when one might prefer a less accurate model because it yields more predictive power or better captures the underlying causal factors describing the outcome variable you are interested in. This is especially relevant in machine learning when trying to predict rare events. We discuss how the accuracy paradox might apply if you were trying to predict the likelihood a person was a bird owner.

[MINI] Term Frequency - Inverse Document Frequency
[MINI] Term Frequency - Inverse Document Frequency

Today's topic is term frequency inverse document frequency, which is a statistic for estimating the importance of words and phrases in a set of documents.

[MINI] Gradient Descent
[MINI] Gradient Descent

Today's mini episode discusses the widely known optimization algorithm gradient descent in the context of hiking in a foggy hillside.

[MINI] The Bonferroni Correction
[MINI] The Bonferroni Correction

Today's episode begins by asking how many left handed employees we should expect to be at a company before anyone should claim left handedness discrimination. If not lefties, let's consider eye color, hair color, favorite ska band, most recent grocery store used, and any number of characteristics could be studied to look for deviations from the norm in a company.

[MINI] k-d trees
[MINI] k-d trees

This episode reviews the concept of k-d trees: an efficient data structure for holding multidimensional objects. Kyle gives Linhda a dictionary and asks her to look up words as a way of introducing the concept of binary search. We actually spend most of the episode talking about binary search before getting into k-d trees, but this is a necessary prerequisite.

[MINI] Multiple Regression
[MINI] Multiple Regression

This episode is a discussion of multiple regression: the use of observations that are a vector of values to predict a response variable. For this episode, we consider how features of a home such as the number of bedrooms, number of bathrooms, and square footage can predict the sale price.

[MINI] R-squared
[MINI] R-squared

How well does your model explain your data? R-squared is a useful statistic for answering this question. In this episode we explore how it applies to the problem of valuing a house. Aspects like the number of bedrooms go a long way in explaining why different houses have different prices. There's some amount of variance that can be explained by a model, and some amount that cannot be directly measured. R-squared is the ratio of the explained variance to the total variance. It's not a measure of accuracy, it's a measure of the power of one's model.

[MINI] The Elbow Method
[MINI] The Elbow Method

Certain data mining algorithms (including k-means clustering and k-nearest neighbors) require a user defined parameter k. A user of these algorithms is required to select this value, which raises the questions: what is the "best" value of k that one should select to solve their problem?

[MINI] Fractional Factorial Design
[MINI] Fractional Factorial Design

A dinner party at Data Skeptic HQ helps teach the uses of fractional factorial design for studying 2-way interactions.

[MINI] Auto-correlative functions and correlograms
[MINI] Auto-correlative functions and correlograms
When working with time series data, there are a number of important diagnostics one should consider to help understand more about the data. The auto-correlative function, plotted as a correlogram, helps explain how a given observations relates to recent preceding observations. A very random process (like lottery numbers) would show very low values, while temperature (our topic in this episode) does correlate highly with recent days.
 
See the show notes with details about Chapel Hill, NC weather data by visiting:
 
 
[MINI] Bargaining
[MINI] Bargaining

Bargaining is the process of two (or more) parties attempting to agree on the price for a transaction.  Game theoretic approaches attempt to find two strategies from which neither party is motivated to deviate.  These strategies are said to be in equilibrium with one another.  The equilibriums available in bargaining depend on the the transaction mechanism and the information of the parties.  Discounting (how long parties are willing to wait) has a significant effect in this process.  This episode discusses some of the choices Kyle and Linh Da made in deciding what offer to make on a house.

[MINI] Stationarity and Differencing
[MINI] Stationarity and Differencing

Mystery shoppers and fruit cultivation help us discuss stationarity - a property of some time serieses that are invariant to time in several ways. Differencing is one approach that can often convert a non-stationary process into a stationary one. If you have a stationary process, you get the benefits of many known statistical properties that can enable you to do a significant amount of inferencing and prediction.

[MINI] Goodhart's Law
[MINI] Goodhart's Law

Goodhart's law states that "When a measure becomes a target, it ceases to be a good measure". In this mini-episode we discuss how this affects SEO, call centers, and Scrum.

[MINI] The CAP Theorem
[MINI] The CAP Theorem

Distributed computing cannot guarantee consistency, accuracy, and partition tolerance. Most system architects need to think carefully about how they should appropriately balance the needs of their application across these competing objectives. Linh Da and Kyle discuss the CAP Theorem using the analogy of a phone tree for alerting people about a school snow day.

[MINI] Leakage
[MINI] Leakage

If you'd like to make a good prediction, your best bet is to invent a time machine, visit the future, observe the value, and return to the past. For those without access to time travel technology, we need to avoid including information about the future in our training data when building machine learning models. Similarly, if any other feature whose value would not actually be available in practice at the time you'd want to use the model to make a prediction, is a feature that can introduce leakage to your model.

[MINI] Receiver Operating Characteristic (ROC) Curve
[MINI] Receiver Operating Characteristic (ROC) Curve

An ROC curve is a plot that compares the trade off of true positives and false positives of a binary classifier under different thresholds. The area under the curve (AUC) is useful in determining how discriminating a model is. Together, ROC and AUC are very useful diagnostics for understanding the power of one's model and how to tune it.

[MINI] Survival Analysis
[MINI] Survival Analysis

Survival analysis techniques are useful for studying the longevity of groups of elements or individuals, taking into account time considerations and right censorship. This episode explores how survival analysis can describe marriages, in particular, using the non-parametric Cox proportional hazard model.

[MINI] ANOVA
[MINI] ANOVA

Analysis of variance is a method used to evaluate differences between the two or more groups.  It works by breaking down the total variance of the system into the between group variance and within group variance.  We discuss this method in the context of wait times getting coffee at Starbucks.

[MINI] Paxos
[MINI] Paxos

Paxos is a protocol for arriving a consensus in a distributed computing system which accounts for unreliability of the nodes.  We discuss how this might be used in the real world in the event of a massive disaster.

[MINI] Heteroskedasticity
[MINI] Heteroskedasticity

Heteroskedasticity is a term used to describe a relationship between two variables which has unequal variance over the range.  For example, the variance in the length of a cat's tail almost certainly changes (grows) with age.  On the other hand, the average amount of chewing gum a person consume probably has a consistent variance over a wide range of human heights.

[MINI] F1 Score
[MINI] F1 Score

The F1 score is a model diagnostic that combines precision and recall to provide a singular evaluation for model comparison.  In this episode we discuss how it applies to selecting an interior designer.

[MINI] Random Forest
[MINI] Random Forest

Random forest is a popular ensemble learning algorithm which leverages bagging both for sampling and feature selection. In this episode we make an analogy to the process of running a bookstore.

[MINI] Calculating Feature Importance
[MINI] Calculating Feature Importance

For machine learning models created with the random forest algorithm, there is no obvious diagnostic to inform you which features are more important in the output of the model. Some straightforward but useful techniques exist revolving around removing a feature and measuring the decrease in accuracy or Gini values in the leaves. We broadly discuss these techniques in this episode.

[MINI] AdaBoost
[MINI] AdaBoost

AdaBoost is a canonical example of the class of AnyBoost algorithms that create ensembles of weak learners. We discuss how a complex problem like predicting restaurant failure (which is surely caused by different problems in different situations) might benefit from this technique.

[MINI] Gini Coefficients
[MINI] Gini Coefficients

The Gini Coefficient (as it relates to decision trees) is one approach to determining the optimal decision to introduce which splits your dataset as part of a decision tree. To pick the right feature to split on, it considers the frequency of the values of that feature and how well the values correlate with specific outcomes that you are trying to predict.

[MINI] The Bootstrap
[MINI] The Bootstrap

The Bootstrap is a method of resampling a dataset to possibly refine it's accuracy and produce useful metrics on the result. The bootstrap is a useful statistical technique and is leveraged in Bagging (bootstrap aggregation) algorithms such as Random Forest. We discuss this technique related to polling and surveys.

[MINI] Entropy
[MINI] Entropy

Classically, entropy is a measure of disorder in a system. From a statistical perspective, it is more useful to say it's a measure of the unpredictability of the system. In this episode we discuss how information reduces the entropy in deciding whether or not Yoshi the parrot will like a new chew toy. A few other everyday examples help us examine why entropy is a nice metric for constructing a decision tree.

[MINI] Dropout
[MINI] Dropout

Deep learning can be prone to overfit a given problem. This is especially frustrating given how much time and computational resources are often required to converge. One technique for fighting overfitting is to use dropout. Dropout is the method of randomly selecting some neurons in one's network to set to zero during iterations of learning. The core idea is that each particular input in a given layer is not always available and therefore not a signal that can be relied on too heavily.

[MINI] Logistic Regression on Audio Data
[MINI] Logistic Regression on Audio Data

Logistic Regression is a popular classification algorithm. In this episode, we discuss how it can be used to determine if an audio clip represents one of two given speakers. It assumes an output variable (isLinhda) is a linear combination of available features, which are spectral bands in the discussion on this episode.

[MINI] Primer on Deep Learning
[MINI] Primer on Deep Learning

In this episode, we talk about a high-level description of deep learning.  Kyle presents a simple game (pictured below), which is more of a puzzle really, to try and give  Linh Da the basic concept.

[MINI] Automated Feature Engineering
[MINI] Automated Feature Engineering

If a CEO wants to know the state of their business, they ask their highest ranking executives. These executives, in turn, should know the state of the business through reports from their subordinates. This structure is roughly analogous to a process observed in deep learning, where each layer of the business reports up different types of observations, KPIs, and reports to be interpreted by the next layer of the business. In deep learning, this process can be thought of as automated feature engineering. DNNs built to recognize objects in images may learn structures that behave like edge detectors in the first hidden layer. Proceeding layers learn to compose more abstract features from lower level outputs. This episode explore that analogy in the context of automated feature engineering.

[MINI] The Perceptron
[MINI] The Perceptron

Today's episode overviews the perceptron algorithm. This rather simple approach is characterized by a few particular features. It updates its weights after seeing every example, rather than as a batch. It uses a step function as an activation function. It's only appropriate for linearly separable data, and it will converge to a solution if the data meets these criteria. Being a fairly simple algorithm, it can run very efficiently. Although we don't discuss it in this episode, multi-layer perceptron networks are what makes this technique most attractive.

[MINI] Feed Forward Neural Networks
[MINI] Feed Forward Neural Networks

Feed Forward Neural Networks

In a feed forward neural network, neurons cannot form a cycle. In this episode, we explore how such a network would be able to represent three common logical operators: OR, AND, and XOR. The XOR operation is the interesting case.

[MINI] Backpropagation
[MINI] Backpropagation

Backpropagation is a common algorithm for training a neural network.  It works by computing the gradient of each weight with respect to the overall error, and using stochastic gradient descent to iteratively fine tune the weights of the network.  In this episode, we compare this concept to finding a location on a map, marble maze games, and golf.

[MINI] GPU CPU
[MINI] GPU CPU

There's more than one type of computer processor. The central processing unit (CPU) is typically what one means when they say "processor". GPUs were introduced to be highly optimized for doing floating point computations in parallel. These types of operations were very useful for high end video games, but as it turns out, those same processors are extremely useful for machine learning. In this mini-episode we discuss why.

[MINI] Generative Adversarial Networks
[MINI] Generative Adversarial Networks

GANs are an unsupervised learning method involving two neural networks iteratively competing. The discriminator is a typical learning system. It attempts to develop the ability to recognize members of a certain class, such as all photos which have birds in them. The generator attempts to create false examples which the discriminator incorrectly classifies. In successive training rounds, the networks examine each and play a mini-max game of trying to harm the performance of the other.

[MINI] Convolutional Neural Networks
[MINI] Convolutional Neural Networks

CNNs are characterized by their use of a group of neurons typically referred to as a filter or kernel.  In image recognition, this kernel is repeated over the entire image.  In this way, CNNs may achieve the property of translational invariance - once trained to recognize certain things, changing the position of that thing in an image should not disrupt the CNN's ability to recognize it.  In this episode, we discuss a few high-level details of this important architecture.

[MINI] Max-pooling
[MINI] Max-pooling

Max-pooling is a procedure in a neural network which has several benefits. It performs dimensionality reduction by taking a collection of neurons and reducing them to a single value for future layers to receive as input. It can also prevent overfitting, since it takes a large set of inputs and admits only one value, making it harder to memorize the input. In this episode, we discuss the intuitive interpretation of max-pooling and why it's more common than mean-pooling or (theoretically) quartile-pooling.

[MINI] Activation Functions
[MINI] Activation Functions

In a neural network, the output value of a neuron is almost always transformed in some way using a function. A trivial choice would be a linear transformation which can only scale the data. However, other transformations, like a step function allow for non-linear properties to be introduced.

[MINI] The Vanishing Gradient
[MINI] The Vanishing Gradient

This episode discusses the vanishing gradient - a problem that arises when training deep neural networks in which nearly all the gradients are very close to zero by the time back-propagation has reached the first hidden layer. This makes learning virtually impossible without some clever trick or improved methodology to help earlier layers begin to learn.

[MINI] Conditional Independence
[MINI] Conditional Independence

In statistics, two random variables might depend on one another (for example, interest rates and new home purchases). We call this conditional dependence. An important related concept exists called conditional independence. This phrase describes situations in which two variables are independent of one another given some other variable.

[MINI] Bayesian Belief Networks
[MINI] Bayesian Belief Networks

A Bayesian Belief Network is an acyclic directed graph composed of nodes that represent random variables and edges that imply a conditional dependence between them. It's an intuitive way of encoding your statistical knowledge about a system and is efficient to propagate belief updates throughout the network when new information is added.

[MINI] Recurrent Neural Networks
[MINI] Recurrent Neural Networks

RNNs are a class of deep learning models designed to capture sequential behavior.  An RNN trains a set of weights which depend not just on new input but also on the previous state of the neural network.  This directed cycle allows the training phase to find solutions which rely on the state at a previous time, thus giving the network a form of memory.  RNNs have been used effectively in language analysis, translation, speech recognition, and many other tasks.

[MINI] One Shot Learning
[MINI] One Shot Learning

One Shot Learning is the class of machine learning procedures that focuses learning something from a small number of examples.  This is in contrast to "traditional" machine learning which typically requires a very large training set to build a reasonable model.

In this episode, Kyle presents a coded message to Linhda who is able to recognize that many of these new symbols created are likely to be the same symbol, despite having extremely few examples of each.  Why can the human brain recognize a new symbol with relative ease while most machine learning algorithms require large training data?  We discuss some of the reasons why and approaches to One Shot Learning.

[MINI] Big Oh Analysis
[MINI] Big Oh Analysis

How long an algorithm takes to run depends on many factors including implementation details and hardware.  However, the formal analysis of algorithms focuses on how they will perform in the worst case as the input size grows.  We refer to an algorithm's runtime as it's "O" which is a function of its input size "n".  For example, O(n) represents a linear algorithm - one that takes roughly twice as long to run if you double the input size.  In this episode, we discuss a few everyday examples of algorithmic analysis including sorting, search a shuffled deck of cards, and verifying if a grocery list was successfully completed.

Thanks to our sponsor Brilliant.org, who right now is featuring a related problem as their Brilliant Problem of the Week.

[MINI] Turing Machines
[MINI] Turing Machines

TMs are a model of computation at the heart of algorithmic analysis.  A Turing Machine has two components.  An infinitely long piece of tape (memory) with re-writable squares and a read/write head which is programmed to change it's state as it processes the input.  This exceptionally simple mechanical computer can compute anything that is intuitively computable, thus says the Church-Turing Thesis.

Attempts to make a "better" Turing Machine by adding things like additional tapes can make the programs easier to describe, but it can't make the "better" machine more capable.  It won't be able to solve any problems the basic Turing Machine can, even if it perhaps solves them faster.

An important concept we didn't get to in this episode is that of a Universal Turing Machine.  Without the prefix, a TM is a particular algorithm.  A Universal TM is a machine that takes, as input, a description of a TM and an input to that machine, and subsequently, simulates the inputted machine running on the given input.

Turing Machines are a central idea in computer science.  They are central to algorithmic analysis and the theory of computation.

[MINI] Sudoku \in NP
[MINI] Sudoku \in NP

Algorithms with similar runtimes are said to be in the same complexity class. That runtime is measured in the how many steps an algorithm takes relative to the input size.

[MINI] Exponential Time Algorithms
[MINI] Exponential Time Algorithms

In this episode we discuss the complexity class of EXP-Time which contains algorithms which require $O(2^{p(n)})$ time to run.  In other words, the worst case runtime is exponential in some polynomial of the input size.  Problems in this class are even more difficult than problems in NP since you can't even verify a solution in polynomial time.

We mostly discuss Generalized Chess as an intuitive example of a problem in EXP-Time.  Another well-known problem is determining if a given algorithm will halt in k steps.  That extra condition of restricting it to k steps makes this problem distinct from Turing's original definition of the halting problem which is known to be intractable.

[MINI] Parallel Algorithms
[MINI] Parallel Algorithms

When computers became commodity hardware and storage became incredibly cheap, we entered the era of so-call "big" data. Most definitions of big data will include something about not being able to process all the data on a single machine. Distributed computing is required for such large datasets.