#1minPapers “Role-Play with Large Language Models” - Shanahan et al.

Gwen Cheni
2 min read · Jan 5, 2025


The scare a few weeks ago that o1 reportedly tried to duplicate its own weights via deception got me very interested in how LLMs can role-play. Dug up this 2023 paper from DeepMind, Imperial College London, and EleutherAI. Its purpose: “describe [AI] behavior in high-level terms without falling into the trap of anthropomorphism.”

“[A] dialogue agent based on an LLM does not commit to playing a single, well defined role in advance. Rather, it generates a distribution of characters, and refines that distribution as the dialogue progresses. The dialogue agent is more like a performer in improvisational theatre than an actor in a conventional, scripted play.”
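One toy way to picture that refinement (my sketch, not the paper’s): treat each candidate character as a hypothesis and update a distribution over them, Bayes-style, as each utterance arrives. All personas and likelihoods below are invented for illustration.

```python
# Toy illustration: the dialogue agent as a distribution over characters,
# refined as each utterance is observed. Personas and likelihoods are invented.

personas = {"helpful_assistant": 0.70, "sarcastic_critic": 0.20, "pirate": 0.10}

# P(utterance | persona): how likely each persona is to produce this utterance.
LIKELIHOOD = {
    "Arr, that be a fine question!": {
        "helpful_assistant": 0.01,
        "sarcastic_critic": 0.04,
        "pirate": 0.90,
    },
}

def refine(prior, utterance):
    """Posterior over personas after observing one utterance (Bayes rule)."""
    post = {p: prior[p] * LIKELIHOOD[utterance].get(p, 1e-6) for p in prior}
    z = sum(post.values())
    return {p: v / z for p, v in post.items()}

print(refine(personas, "Arr, that be a fine question!"))
# After one piratical utterance, the pirate persona dominates the distribution.
```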

The LLM is a “[n]on-deterministic simulator capable of role-playing an infinity of characters, or, to put it another way, capable of stochastically generating an infinity of simulacra.” “From the most recently generated token, a tree of possibilities branches out … a multiverse, where each branch represents a distinct narrative path.”

“The simulator is the combination of the base large language model with autoregressive sampling... The simulacra only come into being when the simulator is run, and at any time only a tiny subset of them have a probability within the superposition that is significantly above zero.”
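A minimal sketch of what “base LLM plus autoregressive sampling” means, with a hand-written next-token table standing in for the model (everything here is invented for illustration): each run follows one stochastic path through the tree of continuations, i.e. one branch of the multiverse.

```python
import random

# A toy "simulator": a hand-written next-token distribution stands in for the
# base LLM; autoregressive sampling turns it into branching continuations.

# P(next token | previous token). All probabilities are made up.
MODEL = {
    "<s>":  [("I", 0.6), ("We", 0.4)],
    "I":    [("am", 0.7), ("sail", 0.3)],
    "We":   [("are", 1.0)],
    "am":   [("helpful.", 0.6), ("a", 0.4)],
    "a":    [("pirate.", 1.0)],
    "sail": [("the", 1.0)],
    "the":  [("seas.", 1.0)],
    "are":  [("doomed.", 0.5), ("friends.", 0.5)],
}

def sample_next(token):
    tokens, weights = zip(*MODEL[token])
    return random.choices(tokens, weights=weights)[0]

def sample_branch(max_len=8):
    """One stochastic pass through the tree: one narrative branch."""
    out, tok = [], "<s>"
    for _ in range(max_len):
        tok = sample_next(tok)
        out.append(tok)
        if tok.endswith("."):  # crude end-of-sequence marker
            break
    return " ".join(out)

for _ in range(5):  # the same simulator yields different simulacra each run
    print(sample_branch())
```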

“The simulator is not some sort of Machiavellian entity that plays a variety of characters in the service of its own, self-serving goals, and there is no such thing as the true authentic voice of the base LLM. With a dialogue agent, it is role-play all the way down.”

An agent “cannot assert a falsehood in good faith, nor can it deliberately deceive the user.” “An agent that is simply making things up will fabricate a range of responses with high semantic variation when the model’s output is regenerated multiple times. By contrast, an agent that is saying something false ‘in good faith’ will present responses with little semantic variation when the model is sampled many times for the same context.”

An agent that is being “deliberately” deceptive might also exhibit low semantic variation. But the deception is liable to be exposed if the agent is asked the same question in different contexts, because to be effective in its deception, the agent needs to respond differently to different users, depending on what those users know.
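The resampling probe is easy to sketch (my construction; the paper describes the idea, not code). Sample the agent several times on the same context and measure the spread among its answers; here word-set Jaccard overlap stands in for a real semantic-similarity measure such as sentence embeddings, and the responses are hypothetical.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Crude lexical stand-in for semantic similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def mean_pairwise_similarity(responses):
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical regenerations for the same question.
confabulating = [
    "The treaty was signed in 1821 in Lisbon.",
    "It was ratified in Vienna around 1874.",
    "The agreement dates to 1902, in Madrid.",
]
good_faith = [
    "The treaty was signed in 1821 in Lisbon.",
    "It was signed in Lisbon in 1821.",
    "The treaty was signed in Lisbon, 1821.",
]

print(mean_pairwise_similarity(confabulating))  # low: high semantic variation
print(mean_pairwise_similarity(good_faith))     # high: little variation
```

A deliberately deceptive agent would also score high on this probe, which is why the second check, re-asking the same question in different contexts, is needed.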

From a safety perspective, a role-playing agent can cause just as much harm as a human with malicious intent if the agent has access to tools, bank accounts, or even social media.

Paper on arXiv: https://arxiv.org/abs/2305.16367
