Prof. Mark Nelson, Spring 2025
Official catalog description:
Applications of Generative AI. Introduces generative text, image, and audio AI models. Students use models, gain an understanding of their operation, and learn how to select, integrate, and evaluate them.
Generative AI is the field that studies and builds AI systems to generate new "content" – art, text, code, music, etc. Although just one part of the larger field of AI, it has exploded in the last 2-4 years and is now probably the biggest subfield.
There are common concepts and technologies underpinning much of the current progress in generative AI: large neural networks, content embedded in a latent space, etc. There are also a number of modality-specific differences in how generators work, e.g. between text generation and image generation. Even within a modality, there are differences between different applications. Text is a modality, but text for an essay, text that is computer code, and text for a personal message are not the same!
The area is fast-moving, so the class will try to balance coverage of emerging technologies (some of which might emerge during the semester) with more durable general principles. If this all feels a bit shaky and too fast-moving, rest assured that many established researchers in the field feel the same way.
Besides the technology, generative AI naturally leads to many questions that aren't strictly technical, which we'll also discuss as my expertise and your interest permit – computational creativity, copyright law, employment and labor issues, open-source models, AI regulation, etc.
There is no required textbook. I'll link a variety of resources like book chapters, tutorials, videos, blog posts, and demos under the Modules tab on Canvas.
We will cover each generative AI method at three levels:
Week 1: Introduction
Course overview, formalities, etc. Intro to generative AI as a field, and its relationship both to AI and to generative methods more broadly.
Week 2: Neural network models
Crash course on the minimum you need to know about neural networks for the rest of the class. Neural networks as a generic representation for any function (a "model"). Brief discussion of all the non-generative things you can do with them too (covered in more depth in other classes). Practical use of PyTorch and NumPy.
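To give a flavor of the week, here's a minimal PyTorch sketch of a neural network as a generic, trainable function (the layer sizes and the target function are arbitrary illustrative choices, not anything we'll use verbatim):

    import torch
    import torch.nn as nn

    # A tiny feedforward network: a generic, trainable function from 1 input to 1 output.
    model = nn.Sequential(
        nn.Linear(1, 32),   # input -> hidden
        nn.ReLU(),
        nn.Linear(32, 1),   # hidden -> output
    )

    # Fit it to y = sin(x) on random points: "training a model" is just
    # adjusting the function's parameters to reduce a loss.
    x = torch.rand(256, 1) * 6.28
    y = torch.sin(x)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for step in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()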
Week 3: Embeddings
Modern generative methods deeply rely on the concept of embedding content (e.g. words, sentences, images) in a vector space where the "location" of content in space has some kind of meaning. Intuition around embeddings, how they're learned from data, and what the resulting latent space represents.
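The core intuition fits in a few lines of NumPy, using tiny hand-made vectors (real embeddings are learned from data and have hundreds or thousands of dimensions; the numbers below are made up for illustration):

    import numpy as np

    def cosine_similarity(a, b):
        # Compares direction, ignoring length: close to 1.0 means "nearby" in the space.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Toy hand-made "embeddings" in a 3-dimensional space.
    cat = np.array([0.9, 0.8, 0.1])
    dog = np.array([0.8, 0.9, 0.2])
    car = np.array([0.1, 0.2, 0.9])

    print(cosine_similarity(cat, dog))  # high: similar content sits close together
    print(cosine_similarity(cat, car))  # low: dissimilar content sits far apart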
Week 4: Generating new content
Once we have content embedded in a latent space, generation consists of somehow sampling new points from that latent space (and with sequential content like text, doing it repeatedly in a coherent way). We can try things of various levels of sophistication and strategy: simple interpolation, generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, transformers. (We will briefly introduce these and return in more detail later.)
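As one concrete example, the simplest of these strategies, interpolation, is just a straight-line walk between two latent points. A sketch with a stand-in decoder (a real decoder would be a trained network, e.g. a VAE's decoder):

    import numpy as np

    def decode(z):
        # Stand-in for a trained decoder that maps a latent vector to content;
        # here it just echoes the vector back so the sketch runs.
        return z

    def interpolate(z_a, z_b, steps=8):
        # Sample evenly spaced points on the line from z_a to z_b, decoding each.
        return [decode(z_a + t * (z_b - z_a)) for t in np.linspace(0.0, 1.0, steps)]

    z_a, z_b = np.random.randn(64), np.random.randn(64)  # two random latent points
    frames = interpolate(z_a, z_b)  # a smooth "morph" between whatever they decode to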
Week 5: Integrating models into a project
Practical introduction to using models in your own projects. There are generally three options, depending on the model: 1) call it through an API hosted by someone else, 2) load and use it from Python code with a library like HuggingFace Transformers, 3) run it through generative-AI software installed on your own computer, which can be command-line or graphical. (Options 2 and 3 require the model weights to be publicly available.)
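For a sense of what option 2 looks like in practice, here's a minimal sketch using HuggingFace Transformers (gpt2 is just one example of a small model with publicly available weights):

    from transformers import pipeline

    # Option 2: download a model's public weights and run it locally via a library.
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Generative AI is", max_new_tokens=20)[0]["generated_text"])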
Week 6: Prototype/project week
First prototype demos due. These don't have to be huge, but should try out something you will actually want to use! I'll meet with everyone to discuss the prototype and what you want to build on it next.
Week 7: Large language models
Training base models with Transformers. Tokens, context windows, decoding. Perplexity metric, prompting strategies (zero-shot, few-shot), hallucinations.
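For instance, perplexity is just the exponential of the model's average per-token loss on a text. A sketch with HuggingFace Transformers (gpt2 again standing in as an example model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The cat sat on the mat.", return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return its average cross-entropy loss.
        loss = model(ids, labels=ids).loss
    print(torch.exp(loss).item())  # perplexity: lower = less "surprising" to the model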
Week 8: Chatbots
Turning a base LLM into something that can actively "chat" with you. Supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), direct preference optimization (DPO). How to evaluate how "intelligent" the result is. Leaderboards and benchmarks, and some caution around them (this is where some of what you've probably seen in the news comes in: "AI aces law exam", etc.).
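Concretely, the "chat" format that SFT teaches a base model is a list of role-tagged messages. Each model defines its own special tokens, and the tokenizer's chat template renders the abstract messages into that model's exact prompt string; a sketch (the model name is just one example of an open chat-tuned model):

    from transformers import AutoTokenizer

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain RLHF in one sentence."},
    ]

    # Render the messages into the model's own prompt format, with its special tokens.
    tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
    print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))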
[spring break]
Week 9: Image generators
What's unique about images as data? Another look at diffusion models. Modes of interaction: image-to-image, text-to-image, inpainting, etc. Introduction to some of the main current models, both commercial and open source.
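As a preview of the kind of code involved, here's a minimal text-to-image sketch with the diffusers library (the model identifier is one example of an open-weights model, and this assumes a machine with a GPU):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Text-to-image: the prompt conditions the diffusion process.
    image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
    image.save("lighthouse.png")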
Week 10: Video generators
These didn't really work yet the last time I taught the class, but now do! An introduction to what's going on in video generation (which will probably have changed between when I post this syllabus and when we reach this week in the semester).
Week 11: Music generators
What's different about music? You can think of it as a sequence of notes, somewhat like the text a language model handles. Or as audio signals, which exist in time but are also perceived in frequency space.
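That time-vs-frequency duality is easy to see in a few lines of NumPy: a pure tone is a wiggly line in the time domain, but a single sharp peak in frequency space:

    import numpy as np

    sr = 44100                              # audio sample rate (samples per second)
    t = np.arange(sr) / sr                  # one second of timestamps
    signal = np.sin(2 * np.pi * 440 * t)    # a pure 440 Hz tone, in the time domain

    # The Fourier transform re-describes the same sound in frequency space.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    print(freqs[np.argmax(spectrum)])       # ~440.0: the spectrum peaks at the tone's pitch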
Week 12: Second prototype/project week
Demo your second project for me, and propose/discuss final projects.
Week 13: Putting GenAI into bigger AI systems
Retrieval-augmented generation (RAG), FunSearch, LMX, agentic AI, LLM-enhanced [x].
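To make RAG concrete, here's a minimal sketch of the retrieve-then-prompt loop. The embed function is a stand-in so the sketch runs end to end; a real system would use a trained embedding model that actually places similar texts close together:

    import numpy as np

    def embed(text):
        # Stand-in for a real text-embedding model: a unit vector that is
        # deterministic per text within a run, but carries no real meaning.
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        v = rng.normal(size=64)
        return v / np.linalg.norm(v)

    docs = ["The course meets on Fridays.",
            "Final project presentations are May 2.",
            "There is no required textbook."]
    doc_vecs = np.stack([embed(d) for d in docs])

    def rag_prompt(question, k=1):
        # Retrieve the k nearest documents, then prepend them as context so the
        # LLM can ground its answer in retrieved text rather than memory alone.
        scores = doc_vecs @ embed(question)
        context = "\n".join(docs[i] for i in np.argsort(scores)[::-1][:k])
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    print(rag_prompt("When are final project presentations?"))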
Week 14: Final project meetings
Individual/group meetings and work sessions for your final projects.
Additional for the grad section (CSC-696):
Weekly exercises should be submitted individually, but you can consult with other people while working on them. For the final project, I recommend doing it in a group, but that is not required.
No exams! Final project presentations will be during the scheduled final exam slot: Friday, May 2, 2:30pm-5:00pm.