Emergent Deception in LLMs

On today’s show, we are joined by Thilo Hagendorff, Research Group Leader for the Ethics of Generative AI at the University of Stuttgart. He joins us to discuss his paper, “Deception Abilities Emerged in Large Language Models.”

Thilo discussed how machine psychology, the practice of applying behavioral experiments from psychology to machine learning systems, is useful for studying LLMs. He shared examples of cognitive tasks that LLMs have improved at solving, as well as his thoughts on whether there’s a ceiling to the tasks machine learning can solve.

Thilo defined deception and explained how he studied it in LLMs experimentally. He described the experiments he used to evaluate LLMs’ deception abilities, and how LLMs compare to humans on cognitive reflection tasks and deception tasks.

Thilo explained why LLMs can develop deception abilities, and discussed how these abilities can be mitigated. Rounding out the conversation, he discussed his future research. You can learn more about Thilo’s work on his website.

Resources

Talk: Do AI systems discriminate against animals, too?

Thilo Hagendorff

Dr. Thilo Hagendorff is an expert in AI ethics, machine behavior in language models, and the intersection of machine learning and psychology. He works as an Independent Research Group Leader at the University of Stuttgart (Germany). He was a visiting scholar at Stanford University and UC San Diego. As a lecturer, he teaches at the Hasso Plattner Institute (Germany), among other institutions. Learn more at thilo-hagendorff.info.