How to prepare for the M.S. in Human Language Technology program

Many students who are interested in the M.S. in Human Language Technology program at the University of Arizona have a strong interest in language, and many have academic backgrounds that include some formal study of language but may not include formal coursework in computer science or programming. To start our program, however, students are required to have at least basic proficiency in programming, preferably in Python. Many students who successfully complete our program reach this required level solely through self-study, using the many high-quality, free resources available online. This is the advice that I typically give about this kind of preparation.

If you still need a foundation in basic Python, there are a number of free options available. One of our faculty has written a book, “Python for Linguists,” which you can use as a self-study reference. You can find an online, self-paced Introduction to Python on OpenClass.ai. For a more formal (but still free and somewhat self-paced) option, there is Harvard/edX’s Introduction to Programming with Python. (The companion course, CS50x Introduction to Computer Science, provides a more general introduction covering Python as well as C, HTML, JavaScript, and SQL.) Once you have a basic grasp of Python, you could pursue a slightly more advanced foundation with something like UCSD/edX’s Python for Data Science. As well as introducing important Python libraries like NumPy, Pandas, and Scikit-Learn, this course introduces tools like Jupyter Notebooks, which are not restricted to Python but are often used in NLP and data science.

If, on the other hand, you’re mainly hoping to get better prepared for other courses in NLP, you should know that the textbook for many of our courses, Speech and Language Processing, has a draft version available online for free at https://web.stanford.edu/~jurafsky/slp3/. You’re not required to read or understand any of it before starting our courses, but if you have time, you could work through at least the first few chapters to get a better understanding of the foundational NLP concepts that we cover in the first-semester courses of the program.

Whatever you end up doing to become more comfortable with programming, I encourage you to simply program: try to write programs that solve problems you currently face. Using Python to solve a problem you already understand will make you more comfortable with the language itself. I’d also encourage you to create a free account on GitHub.com and begin to use Git and GitHub to manage versions of and improvements to your code. You can find guidelines for best practices with Git and GitHub here.

Once you start writing in Python, I also encourage you to read through a lot of clean, model code. In teaching our two introductory online courses, LING 529 and LING 531, I see many students writing code that passes the automated tests I provide (so the code minimally does what it’s supposed to do) but is unclear to read, hard to maintain, or does more processing work than is really needed. For Python specifically, a good source of tips for writing clean, idiomatic code is the website realpython.com. The principles of “Clean Coding” can be found on Uncle Bob’s Clean Coder website and in his books, though they’re not specific to Python.
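To make the difference concrete, here is a small illustration of my own (a hypothetical word-counting task, not taken from any course assignment). Both functions below would pass the same automated test, but the first uses names that reveal nothing about its intent and rescans the whole token list for every token, while the second says what it does and makes a single pass. The function names and the sample sentence are assumptions chosen only for this sketch.

```python
from collections import Counter


def count_words_unclear(text):
    # Works, but the names carry no meaning and the nested loop
    # rescans the entire list once per token (quadratic time).
    l = text.lower().split()
    d = {}
    for i in range(len(l)):
        c = 0
        for j in range(len(l)):
            if l[j] == l[i]:
                c = c + 1
        d[l[i]] = c
    return d


def count_words(text):
    """Map each lowercased, whitespace-separated token to its count."""
    tokens = text.lower().split()
    # Counter tallies the tokens in a single pass over the list.
    return Counter(tokens)


# Hypothetical sample input; both versions produce the same counts.
sentence = "the cat saw the other cat"
assert count_words_unclear(sentence) == count_words(sentence)
print(count_words(sentence))
```

A test that only checks the returned counts cannot tell these two apart, which is why reading well-written code is worth the time even when your own code already passes.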

For many of the courses in our program, we recommend setting up your development environment in Linux (such as the latest Long-Term Support, or LTS, version of Ubuntu) or a Linux-like operating system (including macOS). If you’re accustomed to working with a Windows system, you can create a Linux environment in several ways, including creating a virtual machine with VirtualBox or a similar tool, establishing an Ubuntu cloud desktop with a provider like Amazon Web Services, or adding Ubuntu to your computer in a single-boot or dual-boot configuration. Your first HLT course will help you with this process, but becoming familiar with Ubuntu before the course will help you get up to speed more quickly when that time comes. If you plan to use an Ubuntu virtual machine for your development environment, we recommend a computer with at least 8 GB of RAM.



