Data Skeptic: Journal Club

A panel discussion about news and academic papers related to data science.

Recurring Panel


Kyle

Lan

George




Chip Design, Teaching Google, and Fooling LIME and SHAP

This week's episode has the regular panel back together! George brings us the blog post from Google AI, "Chip Design with Deep Reinforcement Learning." Kyle brings us a news item from CNET, "How People with Down Syndrome are Improving Google Assistant." Lan brings us the paper this week: "Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods." All works mentioned will be linked in the show notes.


Chemical Space, AI Microscope, and Panda or Gibbon?

George talks about OpenAI's Microscope, a collection of visualizations of the neurons and layers in six famous vision models. The library aims to make analysis of these models a community effort. Lan talks about "Exploring chemical space with AI" and how it may change pharmaceutical drug discovery and development. Kyle leads a discussion about the paper "Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions," which shows that an attacker can control not just the predicted class but the model's whole class probability distribution, making it easier to fool machine learning models.


Encryption Keys, Connect Four, and Data Nutrition Labels

Today George takes inspiration, and the gym environment, from Kaggle's ConnectX competition and shows off an attempt to design an interpretable Connect 4 agent with DQN! Lan discusses "The Dataset Nutrition Label" by Sarah Holland and co-authors from the Assembly program at the Berkman Klein Center at Harvard University & MIT Media Lab, a framework to facilitate higher data quality standards. Last but not least, Kyle leads the panel in a discussion about encryption keys!


ML Cancer Diagnosis, Robot Assistants, and Watermarking Data

Today George talks about using machine learning to diagnose cancer from a blood test. By sampling cell-free DNA, this test can identify 50 different types of cancer, along with the tissue of origin, with greater than 90% accuracy. Lan leads a discussion of what robots and robotics researchers may be able to contribute to fighting the COVID-19 pandemic. Last but not least, Kyle leads the panel in a discussion about watermarking data!


Tools For Misusing GPT2, TensorFlow, and ML Unfairness

Today on the show, George leads a discussion about the Giant Language Model Test Room (GLTR). Lan presents a news item about "Setting Fairness Goals with the TensorFlow Constrained Optimization Library." This library lets users configure and train machine learning problems against several different metrics at once, making it easy to formulate and solve many problems of interest to the fairness community. Last but not least, Kyle discusses ML unfairness in predicting juvenile recidivism in Catalonia.


Dark Secrets of BERT, Radioactive Data, and Vanishing Gradients

Today on the show, Lan presents a blog post revealing the dark secrets of BERT. This work uses telling visualizations of self-attention patterns before and after fine-tuning to probe what actually happens inside a fine-tuned BERT. George brings a novel technique to the show, "radioactive data," a marriage of data and steganography. This work from Facebook AI Research lets us detect, with statistical confidence, whether someone has trained a model on our data. Last but not least, Kyle discusses the paper "Learning Important Features Through Propagating Activation Differences."


Dopamine, Deep Q Networks, and Hey Alexa!

Today on the show, Lan presents a blog post from DeepMind about dopamine and temporal-difference learning. It tells the story of a fruitful collaboration between neuroscience and AI researchers, who found that the activity of dopamine neurons in the mouse ventral tegmental area during a learned probabilistic reward task was consistent with distributional temporal-difference reinforcement learning. That's a mouthful; go read it yourself! George presents his first attempts at designing an auto-trading agent with Deep Q-Networks. Last but not least, Kyle says "Hey Alexa! Sorry I fooled you ..."


Google's New Data Engine, Activation Atlas, and LIME

George discusses Google's Dataset Search leaving its closed beta program and the potential applications it will have for businesses, scholars, and hobbyists. Alex brings an article about Activation Atlases, and we discuss their applicability to machine learning interpretability. Lan leads a discussion about the paper "Attention is not Explanation" by Sarthak Jain and Byron C. Wallace. It explores the relationship between attention weights and feature importance scores (spoilers in the title). Kyle shamelessly promotes his blog post using LIME to explain a simple prediction model trained on Wikipedia data.


ALBERT, Seinfeld, and Explainable AI

Kyle discusses Google's recent open-sourcing of ALBERT, a variant of the famous BERT model for natural language processing. ALBERT is more compact and uses fewer parameters. George leads a discussion about the paper "Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models" by Samek, Wiegand, and Müller. This work presents two methods for generating local explanations of individual predictions and a novel metric to objectively compare the quality of explanations. Last but not least, Lan talks about her experience generating new Seinfeld scripts using GPT-2.


Chess Transformer, Kaggle Scandal, and Interpretability Zoo

Welcome to a brand new show from Data Skeptic entitled "Journal Club." Each episode will feature a regular panel and one rotating guest seat. The group will discuss a few topics related to data science and focus on one featured scholarly paper, which is discussed in detail. Lan tells the story of a transformer learning to play chess. The experiment was to fine-tune a GPT-2 transformer model on a corpus of 2.4M chess games in standard notation, then see whether it can 'play chess' by generating the next move. This is a thought-provoking way to take advantage of advances in NLP by 'transforming' a game into the 'language' of written text. This work was done by Shawn Presser. George gives a breakdown of a Kaggle cheating scandal in which a Grandmaster was caught training on the test set. The story follows Benjamin Minixhofer and the detective work that uncovered an obfuscation that artificially improved the winning team's accuracy. Kyle leads a discussion on the paper "Towards A Rigorous Science of Interpretable Machine Learning" by Finale Doshi-Velez and Been Kim. The paper is a great survey of the spectrum of interpretability techniques and also suggests how we might describe a "taxonomy" of the various methodologies.

