Prof. Mark Nelson, Spring 2025
Official catalog description:
Applications of Generative AI. Introduces generative text, image, and audio AI models. Students use models, gain an understanding of their operation, and learn how to select, integrate, and evaluate them.
Generative AI is the field that studies and builds AI systems to generate new "content" – art, text, code, music, etc. Although just one part of the larger field of AI, it has exploded in the last 2-4 years and is now probably the biggest subfield.
There are common concepts and technologies underpinning much of the current progress in generative AI: large neural networks, content embedded in a latent space, etc. There are also a number of modality-specific differences in how generators work, e.g. between text generation and image generation. Even within a modality, there are differences between different applications. Text is a modality, but text for an essay, text that is computer code, and text for a personal message are not the same!
The area is fast-moving, so the class will try to balance coverage of emerging technologies (some of which might emerge during the semester) with more durable general principles. If this all feels a bit shaky and too fast-moving, rest assured that many established researchers in the field feel the same way.
Besides the technology, generative AI naturally leads to many questions that aren't strictly technical, which we'll also discuss as my expertise and your interest permit – computational creativity, copyright law, employment and labor issues, open-source models, AI regulation, etc.
There is no required textbook. I'll link a variety of resources like book chapters, tutorials, videos, blog posts, and demos under the Modules tab on Canvas.
We will cover each generative AI method at three levels:
Week 1: Introduction
Course overview, formalities, etc. Intro to generative AI as a field, and its relationship both to AI and to generative methods more broadly.
Week 2: Neural network models
Crash course on the minimum you need to know about neural networks for the rest of the class. Neural networks as a generic representation for any function (a "model"). Brief discussion of all the non-generative things you can do with them too (covered in more depth in other classes). Practical use of PyTorch and NumPy.
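To give a flavor of the week, here's a minimal PyTorch sketch of a neural network as a generic, trainable function (the layer sizes and the target function are arbitrary illustrative choices, not anything we'll use verbatim):

    import torch
    import torch.nn as nn

    # A tiny feedforward network: a generic, trainable function from 1 input to 1 output.
    model = nn.Sequential(
        nn.Linear(1, 32),   # input -> hidden
        nn.ReLU(),
        nn.Linear(32, 1),   # hidden -> output
    )

    # Fit it to y = sin(x) on random points: "training a model" is just
    # adjusting the function's parameters to reduce a loss.
    x = torch.rand(256, 1) * 6.28
    y = torch.sin(x)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for step in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()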
Week 3: Embeddings
Modern generative methods deeply rely on the concept of embedding content (e.g. words, sentences, images) in a vector space where the "location" of content in space has some kind of meaning. Intuition around embeddings, how they're learned from data, and what the resulting latent space represents.
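The core intuition fits in a few lines of NumPy, using tiny hand-made vectors (real embeddings are learned from data and have hundreds or thousands of dimensions; the numbers below are made up for illustration):

    import numpy as np

    def cosine_similarity(a, b):
        # Compares direction, ignoring length: close to 1.0 means "nearby" in the space.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Toy hand-made "embeddings" in a 3-dimensional space.
    cat = np.array([0.9, 0.8, 0.1])
    dog = np.array([0.8, 0.9, 0.2])
    car = np.array([0.1, 0.2, 0.9])

    print(cosine_similarity(cat, dog))  # high: similar content sits close together
    print(cosine_similarity(cat, car))  # low: dissimilar content sits far apart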
Week 4: Generating new content
Once we have content embedded in a latent space, generation consists of somehow sampling new points from that latent space (and with sequential content like text, doing it repeatedly in a coherent way). We can try things of various levels of sophistication and strategy: simple interpolation, generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, transformers. (We will briefly introduce these and return in more detail later.)
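As one concrete example, the simplest of these strategies, interpolation, is just a straight-line walk between two latent points. A sketch with a stand-in decoder (a real decoder would be a trained network, e.g. a VAE's decoder):

    import numpy as np

    def decode(z):
        # Stand-in for a trained decoder that maps a latent vector to content;
        # here it just echoes the vector back so the sketch runs.
        return z

    def interpolate(z_a, z_b, steps=8):
        # Sample evenly spaced points on the line from z_a to z_b, decoding each.
        return [decode(z_a + t * (z_b - z_a)) for t in np.linspace(0.0, 1.0, steps)]

    z_a, z_b = np.random.randn(64), np.random.randn(64)  # two random latent points
    frames = interpolate(z_a, z_b)  # a smooth "morph" between whatever they decode to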
Week 5: Integrating models into a project
Practical introduction to using models in your own projects. There are generally three options, depending on the model: 1) call it through an API hosted by someone else, 2) load and use it from Python code with a library like HuggingFace Transformers, 3) run it through generative-AI software installed on your own computer, which can be command-line or graphical. (Options 2 and 3 require the model weights to be publicly available.)
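For a sense of what option 2 looks like in practice, here's a minimal sketch using HuggingFace Transformers (gpt2 is just one example of a small model with publicly available weights):

    from transformers import pipeline

    # Option 2: download a model's public weights and run it locally via a library.
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Generative AI is", max_new_tokens=20)[0]["generated_text"])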
Week 6: Prototype/project week
First prototype demos due. These don't have to be huge, but should try out something you will actually want to use! I'll meet with everyone to discuss the prototype and what you want to build on it next.
Week 7: Large language models
Training base models with Transformers. Tokens, context windows, decoding. Perplexity metric, prompting strategies (zero-shot, few-shot), hallucinations.
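For instance, perplexity is just the exponential of the model's average per-token loss on a text. A sketch with HuggingFace Transformers (gpt2 again standing in as an example model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The cat sat on the mat.", return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return its average cross-entropy loss.
        loss = model(ids, labels=ids).loss
    print(torch.exp(loss).item())  # perplexity: lower = less "surprising" to the model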
Week 8: Chatbots
Turning a base LLM into something that can actively "chat" with you. Supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), direct preference optimization (DPO). How to evaluate how "intelligent" the result is. Leaderboards and benchmarks, and some caution around them (this is where some of what you've probably seen in the news comes in: "AI aces law exam", etc.).
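Concretely, the "chat" format that SFT teaches a base model is a list of role-tagged messages. Each model defines its own special tokens, and the tokenizer's chat template renders the abstract messages into that model's exact prompt string; a sketch (the model name is just one example of an open chat-tuned model):

    from transformers import AutoTokenizer

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain RLHF in one sentence."},
    ]

    # Render the messages into the model's own prompt format, with its special tokens.
    tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
    print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))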
[spring break]
Week 9: Image generators
What's unique about images as data? Another look at diffusion models. Modes of interaction: image-to-image, text-to-image, inpainting, etc. Introduction to some of the main current models, both commercial and open source.
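As a preview of the kind of code involved, here's a minimal text-to-image sketch with the diffusers library (the model identifier is one example of an open-weights model, and this assumes a machine with a GPU):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Text-to-image: the prompt conditions the diffusion process.
    image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
    image.save("lighthouse.png")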
Week 10: Video generators
These didn't really work yet the last time I taught the class, but now do! An introduction to what's going on in video generation (which will probably have changed between when I post this syllabus and when we reach this week in the semester).
Week 11: Music generators
What's different about music? You can think of it as a sequence of notes, somewhat like the text a language model handles. Or as audio signals, which exist in time but are also perceived in frequency space.
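That time-vs-frequency duality is easy to see in a few lines of NumPy: a pure tone is a wiggly line in the time domain, but a single sharp peak in frequency space:

    import numpy as np

    sr = 44100                              # audio sample rate (samples per second)
    t = np.arange(sr) / sr                  # one second of timestamps
    signal = np.sin(2 * np.pi * 440 * t)    # a pure 440 Hz tone, in the time domain

    # The Fourier transform re-describes the same sound in frequency space.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    print(freqs[np.argmax(spectrum)])       # ~440.0: the spectrum peaks at the tone's pitch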
Week 12: Second prototype/project week
Demo your second project for me, and propose/discuss final projects.
Week 13: Putting GenAI into bigger AI systems
Retrieval-augmented generation (RAG), FunSearch, LMX, agentic AI, LLM-enhanced [x].
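To make RAG concrete, here's a minimal sketch of the retrieve-then-prompt loop. The embed function is a stand-in so the sketch runs end to end; a real system would use a trained embedding model that actually places similar texts close together:

    import numpy as np

    def embed(text):
        # Stand-in for a real text-embedding model: a unit vector that is
        # deterministic per text within a run, but carries no real meaning.
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        v = rng.normal(size=64)
        return v / np.linalg.norm(v)

    docs = ["The course meets on Fridays.",
            "Final project presentations are May 2.",
            "There is no required textbook."]
    doc_vecs = np.stack([embed(d) for d in docs])

    def rag_prompt(question, k=1):
        # Retrieve the k nearest documents, then prepend them as context so the
        # LLM can ground its answer in retrieved text rather than memory alone.
        scores = doc_vecs @ embed(question)
        context = "\n".join(docs[i] for i in np.argsort(scores)[::-1][:k])
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    print(rag_prompt("When are final project presentations?"))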
Week 14: Final project meetings
Individual/group meetings and work sessions for your final projects.
Additional for the grad section (CSC-696):
Weekly exercises should be submitted individually, but you can consult with other people while working on them. For the final project, I recommend doing it in a group, but that is not required.
No exams! Final project presentations will be during the scheduled final exam slot: Friday, May 2, 2:30pm-5:00pm.