LING 539

Statistical Natural Language Processing

This 3-credit course introduces the key concepts underlying statistical natural language processing, a domain that includes “traditional” statistical approaches like Bayesian classifiers as well as more modern neural approaches whose design is nonetheless grounded in statistics. Students will learn a variety of techniques for the computational modeling of natural language, including n-gram models, smoothing, Hidden Markov models, Bayesian inference, expectation maximization, the Viterbi algorithm, the Inside-Outside algorithm for probabilistic context-free grammars, and higher-order language models. This course complements the introductory course in symbolic and analytic computational approaches to language, LING/CSC/PSY 538 Computational Linguistics.
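To give a flavor of these techniques, the sketch below shows a bigram language model with add-one (Laplace) smoothing, one of the simplest n-gram models covered in the course; the toy corpus, function names, and boundary markers are illustrative assumptions, not course code.

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences, adding boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

# Toy corpus of pre-tokenized sentences (illustrative only).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_counts(corpus)
print(bigram_prob("the", "cat", uni, bi, vocab_size=len(uni)))
```

Add-one smoothing simply pretends every bigram was seen once more than it actually was, so unseen word pairs receive a small nonzero probability rather than zero.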

Given the performance improvements (and massive hype!) around transformer-based LLMs since 2022, many students are eager to learn how to work with models like OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, and many others. However, rather than rushing straight into simply using such models, our graduate program gives students a solid foundation in statistical approaches so that they understand more deeply what problems these models are designed to solve, how they address those problems, and what their limitations are.

This is the first course in a two-course series (LING 539 and LING 582) that takes students from a basic introduction to statistical approaches up to working with LLMs. In this first course, students come to understand a range of foundational natural language processing (NLP) topics, such as the special properties of language as data, text preprocessing and why it is needed, and the importance of choosing effective representations for language-based objects. We introduce basic concepts in machine learning and text classification, such as representing objects using features and selecting effective features for the NLP task at hand, as students learn the ideas underlying classification with naive Bayes and logistic regression. Students learn the utility of a variety of word representations (from document-term vectors and the “bag of words” approach up to static embeddings), then apply these principles to realistic NLP tasks like sequence labeling (part-of-speech tagging, shallow parsing/chunking, etc.) and structured prediction (chart-based parsing, transition-based dependency parsing).
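As an illustration of the bag-of-words plus naive Bayes pipeline mentioned above, here is a minimal sketch using scikit-learn; the tiny labeled dataset is invented for illustration and is not part of the course materials.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented dataset: sentiment labels for short texts (illustrative only).
texts = ["great movie, loved it", "terrible plot and bad acting",
         "loved the acting", "bad movie, a terrible waste"]
labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words representation: each text becomes a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Multinomial naive Bayes classifier trained on those count vectors.
clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["loved the plot"])))
```

Here CountVectorizer builds the document-term matrix (the “bag of words” representation), and MultinomialNB estimates per-class word probabilities from those counts.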

Students will practice applying new concepts (including programming) first in low-stakes TopHat review activities, and will then demonstrate their ability to use these concepts, while gaining experience with git and GitHub, in a series of realistic programming assignments administered through GitHub Classroom. Students enrolled in graduate-level sections will also participate in a private class competition on Kaggle, where they can apply an approach of their choice to an open-ended text classification task.

Most recent syllabus

All past syllabi for this course