Robust LLM-based AI

Like many AI researchers, I've been intrigued by recent advances in large language models (LLMs), and generative AI more broadly. At the same time – and also in common with many researchers – I'm concerned about their reliability, robustness, and ease of control. This cluster of projects looks into what we can do about that.

The strategy I've been pursuing, with a number of collaborators, is to put the LLM inside a larger AI system. A framing I picked up from games and use in thinking about AI is to ask: what's the "core AI loop"? In reinforcement learning it's something like act-observe-update. In chatbot-style AI it's prompt-generate-prompt-generate.
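As a rough illustration (a minimal sketch, not code from this project; the environment and the model call are hypothetical stand-ins), the two loops look something like this:

```python
import random

def rl_loop(steps=5):
    """Reinforcement-learning-style core loop: act -> observe -> update."""
    state, value = 0, 0.0
    for _ in range(steps):
        action = random.choice([-1, +1])                 # act
        state = state + action                           # observe new state
        reward = -abs(state)                             # observe reward
        value += 0.1 * (reward - value)                  # update (crude running estimate)
    return value

def chatbot_loop(user_turns):
    """Chatbot-style core loop: prompt -> generate -> prompt -> generate."""
    transcript = []
    for prompt in user_turns:
        transcript.append(("user", prompt))
        reply = f"[model reply to: {prompt}]"            # stand-in for an LLM call
        transcript.append(("assistant", reply))
    return transcript

if __name__ == "__main__":
    print(rl_loop())
    print(chatbot_loop(["hello", "what's a core AI loop?"]))
```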

What's a good core AI loop? In my opinion, a lot of "agentic AI" work as of 2025 is too ad-hoc on that question, essentially wrapping an LLM in a hand-crafted loop. I think we can do better by starting with a classical AI algorithm as the core AI loop, then looking for the brittle parts that can be LLM-ified.
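To make the idea concrete, here is a minimal sketch (my illustration, not this project's actual code): a classical best-first search drives the loop, and the brittle part, the successor function, is the piece delegated to an LLM. The `llm_propose` stub stands in for a model call.

```python
import heapq

def llm_propose(state, k=3):
    """Placeholder for an LLM call proposing k candidate continuations of a state."""
    return [state + suffix for suffix in ("a", "b", "done")[:k]]

def heuristic(state, goal):
    """Cheap hand-coded score: 0 once the goal marker appears, else prefer short states."""
    return 0 if goal in state else len(state)

def best_first_search(start, goal, budget=20):
    """Classical core loop (best-first search) with an LLM-ified successor function."""
    frontier = [(heuristic(start, goal), start)]
    seen = {start}
    while frontier and budget > 0:
        score, state = heapq.heappop(frontier)
        if score == 0:
            return state                                  # goal reached
        for nxt in llm_propose(state):                    # LLM proposes successors
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt, goal), nxt))
        budget -= 1
    return None

if __name__ == "__main__":
    print(best_first_search("start", "done"))
```

The classical algorithm keeps the overall behavior predictable and analyzable, while the LLM is confined to the subroutine where hand-crafted code tends to be brittle.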

Separately, I've grown concerned about the reproducibility of published results that use generative AI systems, especially those that rely on closed-weight, gated models like ChatGPT, so I've been doing some work on that too.

Publications:

Funding provided by:

Collaborators (current): Adam Gaier, Amy K. Hoover, Ioannis Koutis, Joel Lehman, Elliot Meyerson, Arash Moradi Karkaj, Ben Samuel, Mike Treanor

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.