
Statistics and Hypothesis Testing

Learn fundamental skills to describe and analyze data with Python.


About this course

Statistics is widely used to extract insights from data, and it affects how we consume information every day. However, it is often misunderstood, and as big data trends raise new concerns, it is an increasingly critical part of data literacy. In this course, we will explore the relationship between a sample and a population and learn how to infer population parameters accurately through hands-on code examples. Along the way, you’ll learn about descriptive statistics, p-values, confidence intervals, and big data concerns, all using the power of Python.

By the end of this course, you’ll understand:

  • The relationship between a sample and a population
  • How hypothesis testing lets you draw reliable conclusions about population traits from a sample
  • How the central limit theorem unlocks techniques for working with samples

And you'll be able to:

  • Use Python to effectively describe data
  • Analyze the uncertainty of a population parameter based on a sample
  • Calculate p-values and use them in hypothesis testing

This training is for you because:

  • You’re a data science professional who wants to understand how samples can effectively describe populations in data analysis.
  • You’re an analyst who wants to scrutinize scientific studies more effectively.
  • You’re a project manager working with data science teams who wants to ask the right questions.

Prerequisites

  • Basic Python proficiency (if-then logic, loops, functions, variables)

Setup

To open Anaconda Notebooks:

  1. Go to https://anaconda.cloud
  2. Click on 'Notebooks' from the top navigation menu
  3. Create an account, or log in if you already have one

Facilitator bio

Thomas Nield is the founder of Nield Consulting Group and Yawman Flight, as well as an instructor at the University of Southern California. He enjoys making technical content relatable and relevant to those unfamiliar with or intimidated by it. Thomas regularly teaches classes on data analysis, machine learning, mathematical optimization, and practical artificial intelligence. At USC, he teaches AI System Safety, developing systematic approaches for identifying AI-related hazards in aviation and ground vehicles. He has authored three books, including Essential Math for Data Science (O’Reilly) and Getting Started with SQL (O’Reilly).

He is also the founder and inventor of Yawman Flight, a company developing universal handheld flight controls for flight simulation and unmanned aerial vehicles. You can find him on Twitter | LinkedIn | GitHub | YouTube.

Questions? Issues? Join our Community page to get help.

Curriculum (02:16:15)

  • Getting Started
      ◦ How to use Anaconda Notebooks 00:01:02
      ◦ Course Overview and Learning Objectives 00:02:01
  • Descriptive Statistics
      ◦ Samples, Populations, and Parameters 00:07:24
      ◦ Mean, Median, and Mode 00:11:44
      ◦ Variance and Standard Deviation 00:06:18
      ◦ Exercise: Describe the Provided Data 00:01:52
  • Normal Distribution
      ◦ Probability Density Function and Cumulative Density Function 00:10:32
      ◦ Inverse Cumulative Density Function 00:03:11
      ◦ Standard Normal Distribution and Z-Scores 00:05:05
      ◦ Exercise: Calculate the Life of a Laptop Battery 00:01:57
  • Central Limit Theorem and Confidence Intervals
      ◦ The Central Limit Theorem 00:05:17
      ◦ Critical Z Values 00:04:37
      ◦ Confidence Intervals 00:06:18
      ◦ T-Distribution and Smaller Samples 00:02:47
      ◦ Exercise: Confidence Interval Calculation 00:03:39
  • Hypothesis Testing
      ◦ Tea Party Experiment and P-Values 00:04:36
      ◦ Two-Tailed Testing 00:12:02
      ◦ One-Tailed Testing 00:03:19
      ◦ Dealing with Smaller Samples 00:02:03
      ◦ Exercise: Help an Online Gaming Platform 00:03:37
  • P-Hacking and Big Data Concerns
      ◦ Texas Sharpshooter Fallacy 00:08:29
      ◦ Data Mining and Simpson’s Paradox 00:10:23
      ◦ P-Hacking 00:05:20
      ◦ Data Bias 00:05:20
      ◦ Exercise: Data Mining Gone Wild 00:04:31
  • Conclusion
      ◦ Summary and Further Reading Resources 00:02:51
      ◦ End of Course Survey