The creators of large language models impose restrictions on some of the types of requests one might make of them. LLMs commonly refuse to give advice on committing crimes, produce adult content, or provide details about a variety of sensitive subjects. As with any content filtering system, these restrictions produce both false positives and false negatives.
Today’s interview with Max Reuter and William Schulze discusses their paper “I’m Afraid I Can’t Do That: Predicting Prompt Refusal in Black-Box Generative Language Models.” In this work, they explore what types of prompts get refused and build a machine learning classifier that predicts whether a particular prompt will be refused.
Max Reuter is a Master’s student at Michigan State University, specializing in artificial intelligence and cognitive science. Prior to joining Michigan State, he worked as a research assistant at Brown University where he helped analyze the visual capabilities of deep reinforcement learning models. Before that, he worked in IBM's research divisions of artificial intelligence and quantum computing. He holds a degree in computer science from Michigan State.
William Schulze is a computer science researcher at Michigan State University. His research interests include the way that civil society is being formed by society-scale software systems. Previously, William was a systems engineer and software lead for NASA at the Jet Propulsion Laboratory, working in the field of spacecraft navigation.