Data Skeptic: Journal Club

A panel discussion about news and academic papers related to data science.


Chemical Space, AI Microscope, and Panda or Gibbon?


Encryption Keys, Connect Four, and Data Nutrition Labels

Taking inspiration and the gym environment from Kaggle's ConnectX competition, George shows off an attempt to design an interpretable Connect 4 agent with a DQN. Lan discusses the Dataset Nutrition Label. Kyle discusses encryption keys.

ML Cancer Diagnosis, Robot Assistants, and Watermarking Data

George talks about the use of machine learning to diagnose cancer from a blood test. By sampling cell-free DNA, this test is capable of identifying 50 different types of cancer and the localized tissue of origin with >90% accuracy. Lan leads a discussion of what robots and researchers in robotics may be able to contribute towards fighting the COVID-19 pandemic. Kyle talks about watermarking data.

Tools For Misusing GPT2, Tensorflow, and ML Unfairness

George discusses the Giant Language Model Test Room (GLTR). Lan presents a news item about setting fairness goals with the TensorFlow Constrained Optimization library. This library lets users configure and train machine learning problems based on multiple different metrics, making it easy to formulate and solve many problems of interest to the fairness community. Kyle discusses ML unfairness and juvenile recidivism in Catalonia.

Dark Secrets of Bert, Radioactive Data, and Vanishing Gradients

Lan presents a blog post revealing the dark secrets of BERT. This work uses telling visualizations of self-attention patterns before and after fine-tuning to probe what happens in the fine-tuned BERT. George brings a novel technique to the show, "radioactive data", a marriage of data and steganography. This work from Facebook AI Research gives us the ability to know exactly who's been training models on our data. Kyle discusses the paper Learning Important Features Through Propagating Activation Differences.

Dopamine, Deep Q Networks, and Hey Alexa!

Lan presents a blog post from Google DeepMind about dopamine and temporal difference learning. This is the story of a fruitful collaboration between neuroscience and AI researchers, who found that the activity of dopamine neurons in the mouse ventral tegmental area during a learnt probabilistic reward task was consistent with distributional temporal-difference reinforcement learning. That's a mouthful, go read it yourself! Kyle presents Hey Alexa! Sorry I fooled you ... George presents his first attempts at designing an auto-trading agent with Deep Q Networks.
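For listeners who want a concrete picture of temporal-difference learning, here is a minimal sketch. It is not code from the blog post or the episode. It assumes a single cue followed by a reward of 1 with some probability, and the function name, parameters, and learning rates are all hypothetical. With equal learning rates this is the classic TD update, and the estimate settles near the mean reward. With unequal rates for positive and negative prediction errors, the estimate becomes optimistic or pessimistic, which is one simplified way to think about the distributional idea.

```python
import random


def td_value_estimate(alpha_pos=0.01, alpha_neg=0.01, reward_prob=0.5,
                      num_trials=5000, seed=0):
    """Learn the value of one cue that pays reward 1 with probability reward_prob.

    alpha_pos scales updates after positive prediction errors,
    alpha_neg scales updates after negative ones (both hypothetical knobs).
    """
    rng = random.Random(seed)
    value = 0.0
    for _ in range(num_trials):
        reward = 1.0 if rng.random() < reward_prob else 0.0
        # The trial ends after the reward, so the next-state value is zero
        # and the TD error reduces to reward minus the current prediction.
        td_error = reward - value
        step = alpha_pos if td_error > 0 else alpha_neg
        value += step * td_error
    return value


if __name__ == "__main__":
    print("balanced   :", td_value_estimate())
    print("optimistic :", td_value_estimate(alpha_pos=0.03, alpha_neg=0.01))
    print("pessimistic:", td_value_estimate(alpha_pos=0.01, alpha_neg=0.03))
```

Running the script prints three estimates for the same 50/50 reward. A population of such predictors with different learning-rate ratios would together carry information about the spread of rewards, not just the average.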

Google's New Data Engine, Activation Atlas, and LIME

George discusses Google's Dataset Search leaving its closed beta program, and what potential applications it will have for businesses, scholars, and hobbyists. Alex brings an article about Activation Atlases and we discuss its applicability to machine learning interpretability. Lan leads a discussion about the paper Attention is not Explanation from Sarthak Jain and Byron C. Wallace. It explores the relationship between attention weights and feature importance scores (spoilers in the title). Kyle shamelessly promotes his blog post using LIME to explain a simple prediction model trained on Wikipedia data.

Albert, Seinfeld, and Explainable AI

Kyle discusses Google's recent open sourcing of ALBERT, a variant of the famous BERT model for natural language processing. ALBERT is more compact and uses fewer parameters. George leads a discussion about the paper Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models by Samek, Wiegand, and Müller. This work introduces two tools for generating local interpretability and a novel metric to objectively compare the quality of explanations. Lan talks about her experience generating new Seinfeld scripts using GPT-2.

Chess Transformer, Kaggle Scandal, and Interpretability Zoo

