Regular Expressions in Python

Powerful pattern matching for text data.

About this course

Regular expressions (aka “RegEx”) are a powerful way to match patterns in text, letting you compare, find, replace, and split text data. While intimidating at first glance, they have a surprisingly short learning curve and yield powerful results. They apply across diverse domains, from natural language processing (NLP) and pattern matching in documents to tokenization for large language models (LLMs), which makes them invaluable in day-to-day tasks involving text. Many Python libraries, including pandas and Polars, support regular expressions. Beyond Python, they are supported in numerous text editors (e.g., Notepad++, Visual Studio Code, Jupyter, PyCharm), other programming languages (e.g., Java, Go), and data querying tools (e.g., SQL, NoSQL).

This interactive course will teach you regular expressions in Python. You’ll start with basic character ranges and quantifiers, then progress to more advanced operators, such as groups and wildcards. You will also learn how to apply this knowledge to search and process documents, harnessing the power of regular expressions in pandas, SQL, and NLP libraries.
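
As a rough illustration of the four operations named above (compare, find, replace, and split), here is a minimal sketch using Python's built-in re module. The sample strings and patterns are made up for this example and are not taken from the course materials.

```python
import re

# Hypothetical sample text, invented for illustration
text = "Order A-1023 shipped 2024-05-17; order B-77 shipped 2024-06-02."

# Compare: fullmatch succeeds only if the entire string fits the pattern
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2024-05-17") is not None

# Find: collect every date-like substring
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# Replace: mask order codes (one uppercase letter, a hyphen, then digits)
masked = re.sub(r"[A-Z]-\d+", "<ID>", text)

# Split: break on a comma or semicolon followed by optional whitespace
parts = re.split(r"[;,]\s*", "red, green;blue")

print(is_date)  # True
print(dates)    # ['2024-05-17', '2024-06-02']
print(masked)   # Order <ID> shipped 2024-05-17; order <ID> shipped 2024-06-02.
print(parts)    # ['red', 'green', 'blue']
```

Note that the date pattern only checks the shape of the text (four digits, two digits, two digits); it would also accept an impossible date such as 2024-99-99.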

What you'll learn—and how you can apply it

By the end of this hands-on course, you’ll understand:

  • What regular expressions are and how they help with text processing.
  • Different operators that regular expressions support, from wildcards to quantifiers.
  • How regular expressions apply to hundreds of platforms within and beyond Python.

And you’ll be able to:

  • Perform text matching tasks that previously seemed infeasible, such as identifying IP addresses or proprietary product names in a document.
  • Split, parse, wrangle, and match text based on specific patterns.
  • Perform the most tedious text cleaning tasks elegantly with minimal code.
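
To make the IP address task mentioned above concrete, the sketch below shows one possible approach, not necessarily the one taught in the course: a regular expression finds dotted-quad candidates, and plain Python then rejects any octet above 255. The names IPV4_CANDIDATE and find_ipv4 and the sample log line are hypothetical.

```python
import re

# Four groups of 1-3 digits separated by dots, bounded by word boundaries
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ipv4(text):
    """Return dotted-quad strings whose four octets are all in 0-255."""
    found = []
    for match in IPV4_CANDIDATE.finditer(text):
        candidate = match.group()
        # The regex checks shape only; the numeric range is checked here
        if all(int(octet) <= 255 for octet in candidate.split(".")):
            found.append(candidate)
    return found


# Hypothetical log line, invented for illustration
log = "Login from 192.168.1.10 failed; retry from 10.0.0.256 and 8.8.8.8."
print(find_ipv4(log))  # ['192.168.1.10', '8.8.8.8']
```

Splitting the work this way keeps the pattern short; encoding the 0-255 range purely in a regular expression is possible but much harder to read.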

This training is for you because…

  • You are a data science professional who works with text data.
  • You are looking for ways to search, replace, and parse text with more automation and minimal development. 
  • You are a Python programmer needing to match and validate text data.

Prerequisites

  • Basic proficiency with Python (particularly variables, functions, and strings) as well as library usage (pip/conda install, import keyword).
  • Proficiency with Jupyter Notebooks.

Setup

To open Anaconda Notebooks:

  1. Go to https://anaconda.cloud
  2. Click on 'Notebooks' from the top navigation menu
  3. Create an account, or log in if you already have one

Facilitator bio

Thomas Nield is the founder of Nield Consulting Group and Yawman Flight, as well as an instructor at the University of Southern California. He enjoys making technical content relatable and relevant to those unfamiliar or intimidated by it. Thomas regularly teaches classes on data analysis, machine learning, mathematical optimization, and practical artificial intelligence. At USC, he teaches AI System Safety, developing systematic approaches for identifying AI-related hazards in aviation and ground vehicles. He has authored three books, including Essential Math for Data Science (O’Reilly) and Getting Started with SQL (O’Reilly).

Yawman Flight, which he founded, develops universal handheld flight controls for flight simulation and unmanned aerial vehicles. You can find him on Twitter, LinkedIn, GitHub, and YouTube.

Questions? Issues? Contact [email protected].

Curriculum (02:24:00)

  • Getting Started
      • How to use Anaconda Notebooks 00:01:02
      • Course overview and learning objectives 00:01:23
  • Regular Expressions (RegEx) Overview
      • What are regular expressions? 00:04:45
      • Matching text with regular expressions example 00:05:46
      • Use cases 00:07:49
      • Exercise: Matching text 00:01:38
  • Matching Literals and Character Ranges
      • Literals 00:04:00
      • Metacharacters 00:04:00
      • Character ranges 00:10:19
      • Character ranges continued 00:03:18
      • Digit, word, and whitespace characters 00:05:47
      • Exercise: Matching pattern 00:01:44
  • Matching Start and End of Lines and Strings
      • Full matches versus partial matches 00:06:10
      • Start of string and line 00:02:45
      • End of string and line 00:01:52
      • Forcing full matches 00:03:54
      • Exercise: Lines 00:02:04
  • Repeating Patterns with Quantifiers
      • Fixed quantifiers 00:03:08
      • Min/max quantifiers 00:05:58
      • Shorthand quantifiers 00:03:21
      • Greedy versus lazy quantifiers 00:04:30
      • Exercise: Quantifiers 00:02:07
  • Matching Anything with Wildcards
      • Wildcards 00:05:32
      • Combining wildcards with quantifiers 00:07:47
      • Exercise: Wildcards with quantifiers 00:01:39
  • Grouping and Alternating Patterns
      • Grouping 00:04:38
      • Alternators 00:01:20
      • Prefix and suffix 00:04:10
      • Exercise: Grouping 00:01:10
  • Searching, Splitting, and Replacing Text
      • Compiling a RegEx 00:02:42
      • Scanning a document 00:02:43
      • Splitting data 00:03:37
      • Replacing data 00:04:42
      • Exercise: Splitting text 00:01:16
  • RegEx in pandas, SQL, and NLP
      • pandas 00:05:38
      • SQL 00:03:14
      • Natural Language Processing (NLP) 00:02:53
      • Exercise: Filtering records 00:01:53
  • Conclusion
      • Summary 00:01:46
      • End of course survey
      • Course Completion
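
For a taste of a few curriculum topics listed above (greedy versus lazy quantifiers, start and end of line, and grouping with alternation on a compiled pattern), here is a small sketch using Python's re module. The inputs are invented for illustration, and the course's own examples may differ.

```python
import re

# Greedy versus lazy quantifiers
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.*>", html)   # .* takes as much as it can
lazy = re.findall(r"<.*?>", html)    # .*? stops at the first ">"

# Start and end of line: with re.MULTILINE, ^ and $ apply to every line
notes = "alpha 1\nbeta 2\ngamma 3"
first_words = re.findall(r"^\w+", notes, flags=re.MULTILINE)
last_digits = re.findall(r"\d$", notes, flags=re.MULTILINE)
string_start_only = re.findall(r"^\w+", notes)  # without the flag

# Grouping and alternation on a compiled, reusable pattern
image_ext = re.compile(r"\.(jpe?g|png)$")
jpeg_match = image_ext.search("photo.jpeg")
txt_match = image_ext.search("notes.txt")

print(greedy)               # ['<b>bold</b> and <i>italic</i>']
print(lazy)                 # ['<b>', '</b>', '<i>', '</i>']
print(first_words)          # ['alpha', 'beta', 'gamma']
print(last_digits)          # ['1', '2', '3']
print(string_start_only)    # ['alpha']
print(jpeg_match.group(1))  # jpeg
print(txt_match)            # None
```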
