Statistics and Hypothesis Testing

About this course

Statistics is widely used to extract insights from data, and it shapes how we consume information every day. However, it is often misunderstood, and as big data trends bring new concerns, it is an increasingly critical part of data literacy. In this course, we will explore the relationship between a sample and a population and learn to infer population parameters through hands-on code examples. Along the way, you’ll learn about descriptive statistics, p-values, confidence intervals, and big data concerns, all using Python.

By the end of this course, you’ll understand:

The relationship between a sample and a population
How hypothesis testing supports inferences about population traits
How the central limit theorem unlocks techniques for working with samples

And you'll be able to:

Use Python to effectively describe data
Analyze the uncertainty of a population parameter based on a sample
Leverage hypothesis testing to calculate p-values
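
As a rough illustration of these three skills, here is a minimal sketch that uses only the Python standard library. It is not taken from the course materials: the simulated battery-life sample, the null-hypothesis mean of 4.0 hours, and all variable names are assumptions made for this example. It uses a normal (z) approximation, which is reasonable for a sample of 50; smaller samples would call for the t-distribution covered later in the curriculum.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical sample: 50 simulated battery-life readings in hours.
# The generating mean (4.2) and spread (0.5) are made up for illustration.
rng = random.Random(42)
sample = [rng.gauss(4.2, 0.5) for _ in range(50)]

# 1. Describe the data.
n = len(sample)
mean = statistics.mean(sample)
median = statistics.median(sample)
std = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# 2. Quantify uncertainty: 95% confidence interval for the population mean,
#    using the standard normal distribution as an approximation.
z_crit = NormalDist().inv_cdf(0.975)  # critical z-value, roughly 1.96
std_error = std / math.sqrt(n)
ci_low = mean - z_crit * std_error
ci_high = mean + z_crit * std_error

# 3. Two-tailed test of H0: the population mean equals 4.0 hours (assumed value).
null_mean = 4.0
z_stat = (mean - null_mean) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"n={n}, mean={mean:.3f}, median={median:.3f}, std={std:.3f}")
print(f"95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
print(f"z={z_stat:.3f}, two-tailed p-value={p_value:.4f}")
```

A small p-value here would suggest that the sample is unlikely under the assumed null mean. It does not prove that the null hypothesis is false.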

This training is for you because:

You are a data science professional who wants to understand how samples can effectively describe populations in data analysis
You are an analyst who wants to scrutinize scientific studies more effectively
You are a project manager who works with data science teams and wants to ask the right questions

Prerequisites

Basic Python proficiency (if-then logic, loops, functions, variables)

Setup

To open Anaconda Notebooks:

Go to https://anaconda.cloud
Click 'Notebooks' in the top navigation menu
Create an account, or log in if you already have one

Recommended preparation

Introduction to Python Programming learning path

Recommended follow-up

Probability Fundamentals on-demand course
Linear Algebra on-demand course
Introduction to Machine Learning on-demand course

Facilitator bio

Thomas Nield is the founder of Nield Consulting Group and Yawman Flight, as well as an instructor at the University of Southern California. He enjoys making technical content relatable and relevant to those unfamiliar with or intimidated by it. Thomas regularly teaches classes on data analysis, machine learning, mathematical optimization, and practical artificial intelligence. At USC, he teaches AI System Safety, developing systematic approaches for identifying AI-related hazards in aviation and ground vehicles. He has authored three books, including Essential Math for Data Science (O’Reilly) and Getting Started with SQL (O’Reilly).

Yawman Flight, where he is founder and inventor, develops universal handheld flight controls for flight simulation and unmanned aerial vehicles. You can find him on Twitter, LinkedIn, GitHub, and YouTube.

Questions? Issues? Join our Community page to get help.

Curriculum 02:16:15

Getting Started
How to use Anaconda Notebooks 00:01:02
Course Overview and Learning Objectives 00:02:01
Descriptive Statistics
Samples, Populations, and Parameters 00:07:24
Mean, Median, and Mode 00:11:44
Variance and Standard Deviation 00:06:18
Exercise: Describe the Provided Data 00:01:52
Normal Distribution
Probability Density Function and Cumulative Density Function 00:10:32
Inverse Cumulative Density Function 00:03:11
Standard Normal Distribution and Z-Scores 00:05:05
Exercise: Calculate the Life of a Laptop Battery 00:01:57
Central Limit Theorem and Confidence Intervals
The Central Limit Theorem 00:05:17
Critical Z Values 00:04:37
Confidence Intervals 00:06:18
T-Distribution and Smaller Samples 00:02:47
Exercise: Confidence Interval Calculation 00:03:39
Hypothesis Testing
Tea Party Experiment and P-Values 00:04:36
Two-Tailed Testing 00:12:02
One-Tailed Testing 00:03:19
Dealing with Smaller Samples 00:02:03
Exercise: Help an Online Gaming Platform 00:03:37
P-Hacking and Big Data Concerns
Texas Sharpshooter Fallacy 00:08:29
Data Mining and Simpson’s Paradox 00:10:23
P-Hacking 00:05:20
Data Bias 00:05:20
Exercise: Data Mining Gone Wild 00:04:31
Conclusion
Summary and Further Reading Resources 00:02:51
End of Course Survey
