Data Preparation for Large Language Models
Transforming and cleaning data for LLMs.
What you'll learn—and how you can apply it
By the end of this hands-on course, you’ll understand:
- The significance of curating and processing textual data for LLMs
- How to clean and prepare textual data
- How different vectorization models apply to different language model problems
And you’ll be able to:
Description
Large language models (LLMs) have taken the world by surprise in recent years. From ChatGPT to Google Bard, it is hard to ignore the advances machine learning has made in producing human-like text from large corpora of text documents.
However, textual data can come from diverse sources, including books, online articles, social media, or internal documents. Natural language is messy and not readily understood by LLMs. In this hands-on course, you'll learn fundamental techniques for cleaning and vectorizing text data so it can be used by LLMs. We will cover many code examples using Python and scikit-learn, and work our way from bag-of-words models to word embeddings.
This training is for you because...
Prerequisites
Thomas is the founder of Nield Consulting Group and Yawman Flight, and an instructor at the University of Southern California. He has authored bestselling books, including Essential Math for Data Science (O’Reilly).