This episode kicks off the next theme on Data Skeptic: artificial intelligence. Kyle discusses what\'s to come for the show in 2018, why this topic is relevant, and how we intend to cover it. ... [more]

This week, our host Kyle Polich is joined by guest Tim Henderson from Google to talk about the computational complexity foundations of modern cryptography and the complexity issues that underlie the field. A key question that arises during the discussion is whether we should trust the security of modern cryptography. ... [more]

We break format from our regular programming today and bring you an excerpt from Max Tegmark\'s book "Life 3.0". The first chapter is a short story titled "The Tale of the Omega Team". Audio excerpted courtesy of Penguin Random House Audio from LIFE 3.0 by Max Tegmark, narrated by Rob Shapiro. You can find "Life 3.0" at your favorite bookstore and the audio edition via penguinrandomhouseaudio.com. Kyle will be giving a talk at the Monterey County SkeptiCamp 2018. ... [more]

I sat down with Ali Ghodsi, CEO and found of Databricks, and John Chirapurath, GM for Data Platform Marketing at Microsoft related to the recent announcement of Azure Databricks. When I heard about the announcement, my first thoughts were two-fold. First, the possibility of optimized integrations with existing Azure services. This would be a big benefit to heavy Azure users who also want to use Spark. Second, the benefits of active directory to control Databricks access for large enterprise. Hear Ali and JG\'s thoughts and comments on what makes Azure Databricks a novel offering. ... [more]

This episode features an interview with Rigel Smiroldo recorded at NIPS 2017 in Long Beach California. We discuss data privacy, machine learning use cases, model deployment, and end-to-end machine learning. ... [more]

When computers became commodity hardware and storage became incredibly cheap, we entered the era of so-call "big" data. Most definitions of big data will include something about not being able to process all the data on a single machine. Distributed computing is required for such large datasets. Getting an algorithm to run on data spread out over a variety of different machines introduced new challenges for designing large-scale systems. First, there are concerns about the best strategy for spreading that data over many machines in an orderly fashion. Resolving ambiguity or disagreements across sources is sometimes required. This episode discusses how such algorithms related to the complexity class NC. ... [more]

In this week\'s episode, Scott Aaronson, a professor at the University of Texas at Austin, explains what a quantum computer is, various possible applications, the types of problems they are good at solving and much more. Kyle and Scott have a lively discussion about the capabilities and limits of quantum computers and computational complexity. ... [more]

In this episode we discuss the complexity class of EXP-Time which contains algorithms which require $O(2^{p(n)})$ time to run. In other words, the worst case runtime is exponential in some polynomial of the input size. Problems in this class are even more difficult than problems in NP since you can\'t even verify a solution in polynomial time. We mostly discuss Generalized Chess as an intuitive example of a problem in EXP-Time. Another well-known problem is determining if a given algorithm will halt in k steps. That extra condition of restricting it to k steps makes this problem distinct from Turing\'s original definition of the halting problem which is known to be intractable. ... [more]

In this week\'s episode, host Kyle Polich interviews author Lance Fortnow about whether P will ever be equal to NP and solve all of life’s problems. Fortnow begins the discussion with the example question: Are there 100 people on Facebook who are all friends with each other? Even if you were an employee of Facebook and had access to all its data, answering this question naively would require checking more possibilities than any computer, now or in the future, could possibly do. The P/NP question asks whether there exists a more clever and faster algorithm that can answer this problem and others like it. ... [more]

Algorithms with similar runtimes are said to be in the same complexity class. That runtime is measured in the how many steps an algorithm takes relative to the input size. The class P contains all algorithms which run in polynomial time (basically, a nested for loop iterating over the input). NP are algorithms which seem to require brute force. Brute force search cannot be done in polynomial time, so it seems that problems in NP are more difficult than problems in P. I say it "seems" this way because, while most people believe it to be true, it has not been proven. This is the famous P vs. NP conjecture. It will be discussed in more detail in a future episode. Given a solution to a particular problem, if it can be verified/checked in polynomial time, that problem might be in NP. If someone hands you a completed Sudoku puzzle, it\'s not difficult to see if they made any mistakes. The effort of developing the solution to the Sudoku game seems to be intrinsically more difficult. In fact, as far as anyone knows, in the general case of all possible examples of the game, it seems no strategy can do better on average than just random guessing. This notion of random guessing the solution is where the N in NP comes from: Non-deterministic. Imagine a machine with a random input already written in its memory. Given enough such machines, one of them will have the right answer. If they all ran in parallel, one of them could verify it\'s input in polynomial time. This guess / provided input is often called a witness string. NP is an important concept for many reasons. To me, the most reason to know about NP is a practical one. Depending on your goals or the goals of your employer, there are many challenging problems you may attempt to solve. If a problem you are trying to solve happens to be in NP, then you should consider the implications very carefully. Perhaps you\'ll be lucky and discover that your particular instance of the problem is easy. Sudoku is pretty easy if only 2 remaining squares need to be filled in. The traveling salesman problem is easy to solve if you live in a country where all roads for a ring with exactly one road in and out. If the problem you wish to solve is not trivial, or if you will face many instances of the problem and expect some will not be trivial, then it\'s unlikely you\'ll be able to find the exact solution. Sure, maybe you can grab a bunch of commodity servers and try to scale the heck out of your attempt. Depending on the problem you\'re solving, that might just work. If you can out-purchase your problem in computing power, then problems in NP will surrender to you. But if your input size ever grows, it\'s unlikely you\'ll be able to keep up. If your problem is intractable in this way, all is not lost. You might be able to find an approximate solution to your problem. Good enough is better than no solution at all, right? Most of the time, probably. However, some tremendous work has also been done studying topics like this. Are there problems which are not even approximable in polynomial time? What approximation techniques work best? Alas, those answers lie elsewhere. This episode avoids a discussion of a few key points in order to keep the material accessible. If you find this interesting, you should next familiarize yourself with the notions of NP-Complete, NP-Hard, and co-NP. These are topics we won\'t necessarily get to in future episodes. Michael Sipser\'s Introduction to the Theory of Computation is a good resource. ... [more]

In this episode, Professor Michael Kearns from the University of Pennsylvania joins host Kyle Polich to talk about the computational complexity of machine learning, complexity in game theory, and algorithmic fairness. Michael\'s doctoral thesis gave an early broad overview of computational learning theory, in which he emphasizes the mathematical study of efficient learning algorithms by machines or computational systems. When we look at machine learning algorithms they are almost like meta-algorithms in some sense. For example, given a machine learning algorithm, it will look at some data and build some model, and it’s going to behave presumably very differently under different inputs. But does that mean we need new analytical tools? Or is a machine learning algorithm just the same thing as any deterministic algorithm, but just a little bit more tricky to figure out anything complexity-wise? In other words, is there some overlap between the good old-fashioned analysis of algorithms with the analysis of machine learning algorithms from a complexity viewpoint? And what is the difference between strategies for determining the complexity bounds on samples versus algorithms? A big area of machine learning (and in the analysis of learning algorithms in general) Michael and Kyle discuss is the topic known as complexity regularization. Complexity regularization asks: How should one measure the goodness of fit and the complexity of a given model? And how should one balance those two, and how can one execute that in a scalable, efficient way algorithmically? From this, Michael and Kyle discuss the broader picture of why one should care whether a learning algorithm is efficiently learnable if it\'s learnable in polynomial time. Another interesting topic of discussion is the difference between sample complexity and computational complexity. An active area of research is how one should regularize their models so that they\'re balancing the complexity with the goodness of fit to fit their large training sample size. As mentioned, a good resource for getting started with correlated equilibria is: https://www.cs.cornell.edu/courses/cs684/2004sp/feb20.pdf Thanks to our sponsors: Mendoza College of Business - Get your Masters of Science in Business Analytics from Notre Dame. brilliant.org - A fun, affordable, online learning tool. Check out their Computer Science Algorithms course. ... [more]

How long an algorithm takes to run depends on many factors including implementation details and hardware. However, the formal analysis of algorithms focuses on how they will perform in the worst case as the input size grows. We refer to an algorithm\'s runtime as it\'s "O" which is a function of its input size "n". For example, O(n) represents a linear algorithm - one that takes roughly twice as long to run if you double the input size. In this episode, we discuss a few everyday examples of algorithmic analysis including sorting, search a shuffled deck of cards, and verifying if a grocery list was successfully completed. Thanks to our sponsor Brilliant.org, who right now is featuring a related problem as their Brilliant Problem of the Week. ... [more]

In this episode, Microsoft\'s Corporate Vice President for Cloud Artificial Intelligence, Joseph Sirosh, joins host Kyle Polich to share some of the Microsoft\'s latest and most exciting innovations in AI development platforms. Last month, Microsoft launched a set of three powerful new capabilities in Azure Machine Learning for advanced developers to exploit big data, GPUs, data wrangling and container-based model deployment. Extended show notes found here. Thanks to our sponsor Springboard. Check out Springboard\'s Data Science Career Track Bootcamp. ... [more]

Last year, the film development and production company End Cue produced a short film, called Sunspring, that was entirely written by an artificial intelligence using neural networks. More specifically, it was authored by a recurrent neural network (RNN) called long short-term memory (LSTM). According to End Cue’s Chief Technical Officer, Deb Ray, the company has come a long way in improving the generative AI aspect of the bot. In this episode, Deb Ray joins host Kyle Polich to discuss how generative AI models are being applied in creative processes, such as screenwriting. Their discussion also explores how data science for analyzing development projects, such as financing and selecting scripts, as well as optimizing the content production process. ... [more]

Over the past several years, we have seen many success stories in machine learning brought about by deep learning techniques. While the practical success of deep learning has been phenomenal, the formal guarantees have been lacking. Our current theoretical understanding of the many techniques that are central to the current ongoing big-data revolution is far from being sufficient for rigorous analysis, at best. In this episode of Data Skeptic, our host Kyle Polich welcomes guest John Wilmes, a mathematics post-doctoral researcher at Georgia Tech, to discuss the efficiency of neural network learning through complexity theory. ... [more]

TMs are a model of computation at the heart of algorithmic analysis. A Turing Machine has two components. An infinitely long piece of tape (memory) with re-writable squares and a read/write head which is programmed to change it\'s state as it processes the input. This exceptionally simple mechanical computer can compute anything that is intuitively computable, thus says the Church-Turing Thesis. Attempts to make a "better" Turing Machine by adding things like additional tapes can make the programs easier to describe, but it can\'t make the "better" machine more capable. It won\'t be able to solve any problems the basic Turing Machine can, even if it perhaps solves them faster. An important concept we didn\'t get to in this episode is that of a Universal Turing Machine. Without the prefix, a TM is a particular algorithm. A Universal TM is a machine that takes, as input, a description of a TM and an input to that machine, and subsequently, simulates the inputted machine running on the given input. Turing Machines are a central idea in computer science. They are central to algorithmic analysis and the theory of computation. ... [more]

Our guest Pranav Rajpurkar and his coauthored recently published Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks, a paper in which they demonstrate the use of Convolutional Neural Networks which outperform board certified cardiologists in detecting a wide range of heart arrhythmias from ECG data. ... [more]

Thanks to our sponsor brilliant.org/dataskeptics A Long Short Term Memory (LSTM) is a neural unit, often used in Recurrent Neural Network (RNN) which attempts to provide the network the capacity to store information for longer periods of time. An LSTM unit remembers values for either long or short time periods. The key to this ability is that it uses no activation function within its recurrent components. Thus, the stored value is not iteratively modified and the gradient does not tend to vanish when trained with backpropagation through time. ... [more]

One Shot Learning is the class of machine learning procedures that focuses learning something from a small number of examples. This is in contrast to "traditional" machine learning which typically requires a very large training set to build a reasonable model. In this episode, Kyle presents a coded message to Linhda who is able to recognize that many of these new symbols created are likely to be the same symbol, despite having extremely few examples of each. Why can the human brain recognize a new symbol with relative ease while most machine learning algorithms require large training data? We discuss some of the reasons why and approaches to One Shot Learning. ... [more]

Recommender systems play an important role in providing personalized content to online users. Yet, typical data mining techniques are not well suited for the unique challenges that recommender systems face. In this episode, host Kyle Polich joins Dr. Joseph Konstan from the University of Minnesota at a live recording at FARCON 2017 in Minneapolis to discuss recommender systems and how machine learning can create better user experiences. ... [more]

Zillow is a leading real estate information and home-related marketplace. We interviewed Andrew Martin, a data science Research Manager at Zillow, to learn more about how Zillow uses data science and big data to make real estate predictions. ... [more]

A Bayesian Belief Network is an acyclic directed graph composed of nodes that represent random variables and edges that imply a conditional dependence between them. It\'s an intuitive way of encoding your statistical knowledge about a system and is efficient to propagate belief updates throughout the network when new information is added. ... [more]

In this episode, Tony Beltramelli of UIzard Technologies joins our host, Kyle Polich, to talk about the ideas behind his latest app that can transform graphic design into functioning code, as well as his previous work on spying with wearables. ... [more]

Thanks to our sponsor Springboard. In this week\'s episode, guest Andre Natal from Mozilla joins our host, Kyle Polich, to discuss a couple exciting new developments in open source speech recognition systems, which include Project Common Voice. In June 2017, Mozilla launched a new open source project, Common Voice, a novel complementary project to the TensorFlow-based DeepSpeech implementation. DeepSpeech is a deep learning-based voice recognition system that was designed by Baidu, which they describe in greater detail in their research paper. DeepSpeech is a speech-to-text engine, and Mozilla hopes that, in the future, they can use Common Voice data to train their DeepSpeech engine. ... [more]

RNNs are a class of deep learning models designed to capture sequential behavior. An RNN trains a set of weights which depend not just on new input but also on the previous state of the neural network. This directed cycle allows the training phase to find solutions which rely on the state at a previous time, thus giving the network a form of memory. RNNs have been used effectively in language analysis, translation, speech recognition, and many other tasks. ... [more]

In statistics, two random variables might depend on one another (for example, interest rates and new home purchases). We call this conditional dependence. An important related concept exists called conditional independence. This phrase describes situations in which two variables are independent of one another given some other variable. For example, the probability that a vendor will pay their bill on time could depend on many factors such as the company\'s market cap. Thus, a statistical analysis would reveal many relationships between observable details about the company and their propensity for paying on time. However, if you know that the company has filed for bankruptcy, then we might assume their chances of paying on time have dropped to near 0, and the result is now independent of all other factors in light of this new information. We discuss a few real world analogies to this idea in the context of some chance meetings on our recent trip to New York City. ... [more]

This episode collects interviews from my recent trip to Microsoft Build where I had the opportunity to speak with Dharma Shukla and Syam Nair about the recently announced CosmosDB. CosmosDB is a globally consistent, distributed datastore that supports all the popular persistent storage formats (relational, key/value pair, document database, and graph) under a single streamlined API. The system provides tunable consistency, allowing the user to make choices about how consistency trade-offs are managed under the hood, if a consumer wants to go beyond the selected defaults. ... [more]

Animals can\'t tell us when they\'re experiencing pain, so we have to rely on other cues to help treat their discomfort. But it is often difficult to tell how much an animal is suffering. The sheep, for instance, is the most inscrutable of animals. However, scientists have figured out a way to understand sheep facial expressions using artificial intelligence. On this week\'s episode, Dr. Marwa Mahmoud from the University of Cambridge joins us to discuss her recent study, "Estimating Sheep Pain Level Using Facial Action Unit Detection." Marwa and her colleague\'s at Cambridge\'s Computer Laboratory developed an automated system using machine learning algorithms to detect and assess when a sheep is in pain. We discuss some details of her work, how she became interested in studying sheep facial expression to measure pain, and her future goals for this project. If you\'re able to be in Minneapolis, MN on August 23rd or 24th, consider attending Farcon. Get your tickets today via https://farcon2017.eventbrite.com. ... [more]

This episode discusses the vanishing gradient - a problem that arises when training deep neural networks in which nearly all the gradients are very close to zero by the time back-propagation has reached the first hidden layer. This makes learning virtually impossible without some clever trick or improved methodology to help earlier layers begin to learn. ... [more]

In a neural network, the output value of a neuron is almost always transformed in some way using a function. A trivial choice would be a linear transformation which can only scale the data. However, other transformations, like a step function allow for non-linear properties to be introduced. Activation functions can also help to standardize your data between layers. Some functions such as the sigmoid have the effect of "focusing" the area of interest on data. Extreme values are placed close together, while values near it\'s point of inflection change more quickly with respect to small changes in the input. Similarly, these functions can take any real number and map all of them to a finite range such as [0, 1] which can have many advantages for downstream calculation. In this episode, we overview the concept and discuss a few reasons why you might select one function verse another. ... [more]

hen faced with medical issues, would you want to be seen by a human or a machine? In this episode, guest Edward Choi, co-author of the study titled Doctor AI: Predicting Clinical Events via Recurrent Neural Network shares his thoughts. Edward presents his team’s efforts in developing a temporal model that can learn from human doctors based on their collective knowledge, i.e. the large amount of Electronic Health Record (EHR) data. ... [more]

Max-pooling is a procedure in a neural network which has several benefits. It performs dimensionality reduction by taking a collection of neurons and reducing them to a single value for future layers to receive as input. It can also prevent overfitting, since it takes a large set of inputs and admits only one value, making it harder to memorize the input. In this episode, we discuss the intuitive interpretation of max-pooling and why it\'s more common than mean-pooling or (theoretically) quartile-pooling. ... [more]

This episode recaps the Microsoft Build Conference. Kyle recently attended and shares some thoughts on cloud, databases, cognitive services, and artificial intelligence. The episode includes interviews with Rohan Kumar and David Carmona. ... [more]

CNNs are characterized by their use of a group of neurons typically referred to as a filter or kernel. In image recognition, this kernel is repeated over the entire image. In this way, CNNs may achieve the property of translational invariance - once trained to recognize certain things, changing the position of that thing in an image should not disrupt the CNN\'s ability to recognize it. In this episode, we discuss a few high-level details of this important architecture. ... [more]

GANs are an unsupervised learning method involving two neural networks iteratively competing. The discriminator is a typical learning system. It attempts to develop the ability to recognize members of a certain class, such as all photos which have birds in them. The generator attempts to create false examples which the discriminator incorrectly classifies. In successive training rounds, the networks examine each and play a mini-max game of trying to harm the performance of the other. In addition to being a useful way of training networks in the absence of a large body of labeled data, there are additional benefits. The discriminator may end up learning more about edge cases than it otherwise would be given typical examples. Also, the generator\'s false images can be novel and interesting on their own. The concept was first introduced in the paper Generative Adversarial Networks. ... [more]

Despite the success of GANs in imaging, one of its major drawbacks is the problem of \'mode collapse,\' where the generator learns to produce samples with extremely low variety. To address this issue, today\'s guests Arnab Ghosh and Viveka Kulharia proposed two different extensions. The first involves tweaking the generator\'s objective function with a diversity enforcing term that would assess similarities between the different samples generated by different generators. The second comprises modifying the discriminator objective function, pushing generations corresponding to different generators towards different identifiable modes. ... [more]

Recently, we\'ve seen opinion polls come under some skepticism. But is that skepticism truly justified? The recent Brexit referendum and US 2016 Presidential Election are examples where some claims the polls "got it wrong". This episode explores this idea. ... [more]

This episode is an interview with Tinghui Zhou. In the recent paper "Unsupervised Learning of Depth and Ego-motion from Video", Tinghui and collaborators propose a deep learning architecture which is able to learn depth and pose information from unlabeled videos. We discuss details of this project and its applications. ... [more]

Backpropagation is a common algorithm for training a neural network. It works by computing the gradient of each weight with respect to the overall error, and using stochastic gradient descent to iteratively fine tune the weights of the network. In this episode, we compare this concept to finding a location on a map, marble maze games, and golf. ... [more]

In this week\'s episode of Data Skeptic, host Kyle Polich talks with guest Maura Church, Patreon\'s data science manager. Patreon is a fast-growing crowdfunding platform that allows artists and creators of all kinds build their own subscription content service. The platform allows fans to become patrons of their favorite artists- an idea similar the Renaissance times, when musicians would rely on benefactors to become their patrons so they could make more art. At Patreon, Maura\'s data science team strives to provide creators with insight, information, and tools, so that creators can focus on what they do best-- making art. On the show, Maura talks about some of her projects with the data science team at Patreon. Among the several topics discussed during the episode include: optical music recognition (OMR) to translate musical scores to electronic format, network analysis to understand the connection between creators and patrons, growth forecasting and modeling in a new market, and churn modeling to determine predictors of long time support. A more detailed explanation of Patreon\'s A/B testing framework can be found here Other useful links to topics mentioned during the show: OMR research Patreon blog Patreon HQ blog Amanda Palmer Fran Meneses ... [more]

Feed Forward Neural Networks In a feed forward neural network, neurons cannot form a cycle. In this episode, we explore how such a network would be able to represent three common logical operators: OR, AND, and XOR. The XOR operation is the interesting case. Below are the truth tables that describe each of these functions. AND Truth Table Input 1 Input 2 Output 0 0 0 0 1 0 1 0 0 1 1 1 OR Truth Table Input 1 Input 2 Output 0 0 0 0 1 1 1 0 1 1 1 1 XOR Truth Table Input 1 Input 2 Output 0 0 0 0 1 1 1 0 1 1 1 0 The AND and OR functions should seem very intuitive. Exclusive or (XOR) if true if and only if exactly single input is 1. Could a neural network learn these mathematical functions? Let\'s consider the perceptron described below. First we see the visual representation, then the Activation function , followed by the formula for calculating the output. Can this perceptron learn the AND function? Sure. Let and What about OR? Yup. Let and An infinite number of possible solutions exist, I just picked values that hopefully seem intuitive. This is also a good example of why the bias term is important. Without it, the AND function could not be represented. How about XOR? No. It is not possible to represent XOR with a single layer. It requires two layers. The image below shows how it could be done with two laters. In the above example, the weights computed for the middle hidden node capture the essence of why this works. This node activates when recieving two positive inputs, thus contributing a heavy penalty to be summed by the output node. If a single input is 1, this node will not activate. Universal approximation theorem tells us that any continuous function can be tightly approximated using a neural network with only a single hidden layer and a finite number of neurons. With this in mind, a feed forward neural network should be adaquet for any applications. However, in practice, other network architectures and the allowance of more hidden layers are empirically motivated. Other types neural networks have less strict structal definitions. The various ways one might relax this constraint generate other classes of neural networks that often have interesting properties. We\'ll get into some of these in future mini-episodes. Check out our recent blog post on how we\'re using Periscope Data cohort charts. Thanks to Periscope Data for sponsoring this episode. More about them at periscopedata.com/skeptics ... [more]

There\'s more than one type of computer processor. The central processing unit (CPU) is typically what one means when they say "processor". GPUs were introduced to be highly optimized for doing floating point computations in parallel. These types of operations were very useful for high end video games, but as it turns out, those same processors are extremely useful for machine learning. In this mini-episode we discuss why. ... [more]

No reliable, complete database cataloging home sales data at a transaction level is available for the average person to access. To a data scientist interesting in studying this data, our hands are complete tied. Opportunities like testing sociological theories, exploring economic impacts, study market forces, or simply research the value of an investment when buying a home are all blocked by the lack of easy access to this dataset. OpenHouse seeks to correct that by centralizing and standardizing all publicly available home sales transactional data. In this episode, we discuss the achievements of OpenHouse to date, and what plans exist for the future. Check out the OpenHouse gallery. I also encourage everyone to check out the project Zareen mentioned which was her Harry Potter word2vec webapp and Joy\'s project doing data visualization on Jawbone data. Guests Thanks again to @iamzareenf, @blueplastic, and @joytafty for coming on the show. Thanks to the numerous other volunteers who have helped with the project as well! Announcements and details If you\'re interested in getting involved in OpenHouse, check out the OpenHouse contributor\'s quickstart page. Kyle is giving a machine learning talk in Los Angeles on May 25th, 2017 at Zehr. Sponsor Thanks to our sponsor for this episode Periscope Data. The blog post demoing their maps option is on our blog titled Periscope Data Maps. To start a free trial of their dashboarding too, visit http://periscopedata.com/skeptics Kyle recently did a youtube video exploring the Data Skeptic podcast download numbers using Periscope Data. Check it out at https://youtu.be/aglpJrMp0M4. Supplemental music is Lee Rosevere\'s Let\'s Start at the Beginning. ... [more]

If a CEO wants to know the state of their business, they ask their highest ranking executives. These executives, in turn, should know the state of the business through reports from their subordinates. This structure is roughly analogous to a process observed in deep learning, where each layer of the business reports up different types of observations, KPIs, and reports to be interpreted by the next layer of the business. In deep learning, this process can be thought of as automated feature engineering. DNNs built to recognize objects in images may learn structures that behave like edge detectors in the first hidden layer. Proceeding layers learn to compose more abstract features from lower level outputs. This episode explore that analogy in the context of automated feature engineering. Linh Da and Kyle discuss a particular image in this episode. The image included below in the show notes is drawn from the work of Lee, Grosse, Ranganath, and Ng in their paper Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. ... [more]

DataRefuge is a public collaborative, grassroots effort around the United States in which scientists, researchers, computer scientists, librarians and other volunteers are working to download, save, and re-upload government data. The DataRefuge Project, which is led by the UPenn Program in Environmental Humanities and the Penn Libraries group at University of Pennsylvania, aims to foster resilience in an era of anthropogenic global climate change and raise awareness of how social and political events affect transparency. ... [more]

In this Data Skeptic episode, Kyle is joined by guest Ruggiero Cavallo to discuss his latest efforts to mitigate the problems presented in this new world of online advertising. Working with his collaborators, Ruggiero reconsiders the search ad allocation and pricing problems from the ground up and redesigns a search ad selling system. He discusses a mechanism that optimizes an entire page of ads globally based on efficiency-maximizing search allocation and a novel technical approach to computing prices. ... [more]

Today\'s episode overviews the perceptron algorithm. This rather simple approach is characterized by a few particular features. It updates its weights after seeing every example, rather than as a batch. It uses a step function as an activation function. It\'s only appropriate for linearly separable data, and it will converge to a solution if the data meets these criteria. Being a fairly simple algorithm, it can run very efficiently. Although we don\'t discuss it in this episode, multi-layer perceptron networks are what makes this technique most attractive. ... [more]

In this episode, I speak with Raghu Ramakrishnan, CTO for Data at Microsoft. We discuss services, tools, and developments in the big data sphere as well as the underlying needs that drove these innovations. ... [more]

Versioning isn\'t just for source code. Being able to track changes to data is critical for answering questions about data provenance, quality, and reproducibility. Daniel Whitenack joins me this week to talk about these concepts and share his work on Pachyderm. Pachyderm is an open source containerized data lake. During the show, Daniel mentioned the Gopher Data Science github repo as a great resource for any data scientists interested in the Go language. Although we didn\'t mention it, Daniel also did an interesting analysis on the 2016 world chess championship that complements our recent episode on chess well. You can find that post here Supplemental music is Lee Rosevere\'s Let\'s Start at the Beginning. Thanks to Periscope Data for sponsoring this episode. More about them at periscopedata.com/skeptics ... [more]

In this episode, we talk about a high-level description of deep learning. Kyle presents a simple game (pictured below), which is more of a puzzle really, to try and give Linh Da the basic concept. Thanks to our sponsor for this week, the Data Science Association. Please check out their upcoming Dallas conference at dallasdatascience.eventbrite.com ... [more]

Logistic Regression is a popular classification algorithm. In this episode, we discuss how it can be used to determine if an audio clip represents one of two given speakers. It assumes an output variable (isLinhda) is a linear combination of available features, which are spectral bands in the discussion on this episode. Keep an eye on the dataskeptic.com blog this week as we post more details about this project. Thanks to our sponsor this week, the Data Science Association. Please check out their upcoming conference in Dallas on Saturday, February 18th, 2017 via the link below. dallasdatascience.eventbrite.com ... [more]

Deep learning can be prone to overfit a given problem. This is especially frustrating given how much time and computational resources are often required to converge. One technique for fighting overfitting is to use dropout. Dropout is the method of randomly selecting some neurons in one\'s network to set to zero during iterations of learning. The core idea is that each particular input in a given layer is not always available and therefore not a signal that can be relied on too heavily. ... [more]

In this episode I speak with Clarence Wardell and Kelly Jin about their mutual service as part of the White House\'s Police Data Initiative and Data Driven Justice Initiative respectively. The Police Data Initiative was organized to use open data to increase transparency and community trust as well as to help police agencies use data for internal accountability. The PDI emerged from recommendations made by the Task Force on 21st Century Policing. The Data Driven Justice Initiative was organized to help city, county, and state governments use data-driven strategies to help low-level offenders with mental illness get directed to the right services rather than into the criminal justice system. ... [more]

Prior work has shown that people\'s response to competition is in part predicted by their gender. Understanding why and when this occurs is important in areas such as labor market outcomes. A well structured study is challenging due to numerous confounding factors. Peter Backus and his colleagues have identified competitive chess as an ideal arena to study the topic. Find out why and what conclusions they reached. Our discussion centers around Gender, Competition and Performance: Evidence from Real Tournaments from Backus, Cubel, Guid, Sanchez-Pages, and Mañas. A summary of their paper can also be found here. ... [more]