We are joined by Maximilian Mozes, a PhD student at University College London. His PhD research focuses on natural language processing (NLP), particularly the intersection of adversarial machine learning and NLP. He joins us to discuss his latest research, Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities.
Maximilian started by giving examples of how LLMs can be used for nefarious purposes. He discussed how LLMs can generate personalized phishing emails or write malicious code at scale. He also described how LLMs can be used to produce fake news.
Maximilian shared some preventive measures that can be taken in the face of LLM misuse. He discussed the role of AI safety in mitigating these misuses. He shared how red teaming helps and how to set up a red team. Maximilian highlighted the challenges in using RLHF to improve the harmlessness of LLMs.
Maximilian discussed how data poisoning can pose a threat to LLMs. He also discussed jailbreaking, which is used to make LLMs perform insidious tasks, and shared two approaches that can be used to defend against it.
Wrapping up, he discussed the future outlook of LLMs. Follow Maximilian on X @maximilianmozes.
Maximilian Mozes is a final-year PhD student at University College London supervised by Lewis Griffin (University College London) and Bennett Kleinberg (Tilburg University and University College London). His PhD research focuses on the intersection of adversarial machine learning and natural language processing, as well as safety- and security-related aspects of large language models. Prior to that, Maximilian completed his undergraduate studies at the Technical University of Munich.