Filtering Data in Pandas: A Complete Guide


YouTube video ID: Lw2rlcxScZY

Source: YouTube video by Corey Schafer


Introduction

In this article we walk through the essential techniques for filtering rows and columns in pandas DataFrame and Series objects. Whether you need to isolate respondents who know Python, select a specific salary range, or limit results to certain countries, the methods described here cover the most common scenarios.

Understanding Boolean Masks

  • A comparison such as df['last_name'] == 'Doe' returns a Series of True/False values.
  • This series acts as a mask: True marks rows that satisfy the condition, False marks those that do not.
  • Example mask (index on the left, boolean value on the right):
        0    False
        1     True
        2     True
        dtype: bool
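To reproduce a mask like the one above, here is a minimal sketch built on a small hypothetical DataFrame. The first_name, last_name, and email columns and all of their values are invented for illustration; they are not taken from the original video.

```python
import pandas as pd

# Hypothetical sample data; column names follow the examples in this article
df = pd.DataFrame({
    'first_name': ['Corey', 'Jane', 'John'],
    'last_name': ['Schafer', 'Doe', 'Doe'],
    'email': ['corey@example.com', 'jane@example.com', 'john@example.com'],
})

# Comparing a column to a scalar yields a boolean Series aligned to df's index
mask = df['last_name'] == 'Doe'
print(mask)
```

Printing the mask shows df's index on the left, with False for row 0 and True for rows 1 and 2, followed by the Series name (last_name) and dtype: bool.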

Applying a Filter Directly

mask = df['last_name'] == 'Doe'
filtered_df = df[mask]

The result is a new DataFrame containing only the rows where the mask is True.
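A runnable version of the two lines above, again on hypothetical sample data. The second half goes slightly beyond the original: in pandas versions without Copy-on-Write enabled, modifying a boolean-filtered result can trigger SettingWithCopyWarning, and calling .copy() first is a common way to avoid that.

```python
import pandas as pd

# Hypothetical sample data (invented for illustration)
df = pd.DataFrame({
    'first_name': ['Corey', 'Jane', 'John'],
    'last_name': ['Schafer', 'Doe', 'Doe'],
    'email': ['corey@example.com', 'jane@example.com', 'john@example.com'],
})

mask = df['last_name'] == 'Doe'
filtered_df = df[mask]  # keeps only the rows where mask is True
print(filtered_df)

# If you intend to modify the filtered rows, take an explicit copy first
# so the change clearly applies to the new DataFrame and not to df
filtered_df = df[mask].copy()
filtered_df['email'] = filtered_df['email'].str.upper()
```

filtered_df keeps the original index labels 1 and 2 rather than renumbering from 0; call reset_index(drop=True) if you need a fresh 0-based index. The original df is left unchanged.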

Using the .loc Indexer

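  • .loc accepts a boolean mask as the row selector and a list of column labels (or a single label) as the column selector.
  • Syntax: df.loc[mask, ['email']] returns the email column, as a one-column DataFrame, for the rows that match the mask; df.loc[mask, 'email'] (no list) returns a Series instead.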
  • Benefits: you can filter rows and pick specific columns in a single, readable statement.
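The sketch below contrasts a list of labels with a single label in the .loc column selector. The DataFrame is the same kind of hypothetical sample data used earlier, invented for illustration.

```python
import pandas as pd

# Hypothetical sample data (invented for illustration)
df = pd.DataFrame({
    'first_name': ['Corey', 'Jane', 'John'],
    'last_name': ['Schafer', 'Doe', 'Doe'],
    'email': ['corey@example.com', 'jane@example.com', 'john@example.com'],
})

mask = df['last_name'] == 'Doe'

# A list of labels returns a DataFrame, even when it holds a single column
emails_df = df.loc[mask, ['email']]

# A single label (no list) returns a Series
emails = df.loc[mask, 'email']

# Several columns can be selected in the same statement
names = df.loc[mask, ['first_name', 'last_name']]
```

Whether you want a DataFrame or a Series usually depends on what comes next: a Series is convenient for string methods and value counts, while a DataFrame keeps the column header and combines easily with other frames.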

Combining Conditions

  • AND: use & and wrap each condition in parentheses:
      mask = (df['last_name'] == 'Doe') & (df['first_name'] == 'John')
  • OR: use |, again with each condition in parentheses:
      mask = (df['last_name'] == 'Schafer') | (df['first_name'] == 'John')
  • NOT: prepend ~ to a mask to invert it:
      mask = ~((df['last_name'] == 'Schafer') & (df['first_name'] == 'John'))
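A minimal sketch of all three operators on hypothetical sample data. The parentheses matter because & and | bind more tightly than ==. Python's and/or keywords cannot be used here: they raise a ValueError, because a Series has no single truth value.

```python
import pandas as pd

# Hypothetical sample data (invented for illustration)
df = pd.DataFrame({
    'first_name': ['Corey', 'Jane', 'John'],
    'last_name': ['Schafer', 'Doe', 'Doe'],
    'email': ['corey@example.com', 'jane@example.com', 'john@example.com'],
})

# AND: both conditions must hold for a row to be kept
both = (df['last_name'] == 'Doe') & (df['first_name'] == 'John')

# OR: a row is kept if either condition holds
either = (df['last_name'] == 'Schafer') | (df['first_name'] == 'John')

# NOT: ~ flips every value of the combined mask
neither = ~((df['last_name'] == 'Schafer') & (df['first_name'] == 'John'))

print(df.loc[both, 'email'])     # John Doe only
print(df.loc[either, 'email'])   # Corey Schafer and John Doe
print(df.loc[neither, 'email'])  # every row, since nobody is "John Schafer"
```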

Real‑World Example: Survey Data

1. Filtering by Salary

high_salary = df['ConvertedComp'] > 70000
result = df.loc[high_salary, ['Country', 'LanguageWorkedWith', 'ConvertedComp']]

Shows respondents whose converted compensation (ConvertedComp) exceeds 70,000, together with their country and the languages they have worked with.
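A minimal sketch of the salary filter, assuming a tiny hypothetical stand-in for the survey data. The rows are invented; only the column names match the ones used above. It also shows Series.between() for selecting a salary range; that method is an addition here, not something taken from the original, and the bounds are arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey DataFrame (rows invented)
df = pd.DataFrame({
    'Country': ['United States', 'India', 'Germany', 'Brazil'],
    'LanguageWorkedWith': ['Python;SQL', 'Java;JavaScript', np.nan, 'C#;SQL'],
    'ConvertedComp': [120000.0, 15000.0, 85000.0, np.nan],
})

# Missing compensation (NaN) compares as False, so those rows are excluded
high_salary = df['ConvertedComp'] > 70000
result = df.loc[high_salary, ['Country', 'LanguageWorkedWith', 'ConvertedComp']]

# A salary range: between() includes both bounds by default
mid_range = df['ConvertedComp'].between(50000, 100000)
print(df.loc[mid_range, ['Country', 'ConvertedComp']])
```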

2. Filtering by a List of Countries

countries = ['United States', 'India', 'United Kingdom', 'Germany', 'Canada']
mask = df['Country'].isin(countries)
result = df.loc[mask, 'Country']

Returns only rows whose Country value appears in the predefined list.
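A short sketch of isin() on hypothetical data, including the inverted form with ~ to keep everything outside the list. Note that isin() matches whole values exactly, so the comparison is case-sensitive and does not match substrings.

```python
import pandas as pd

# Hypothetical Country column (values invented for illustration)
df = pd.DataFrame({'Country': ['United States', 'India', 'Brazil', 'Germany', 'Japan']})

countries = ['United States', 'India', 'United Kingdom', 'Germany', 'Canada']

# True where the value appears in the list
mask = df['Country'].isin(countries)
result = df.loc[mask, 'Country']

# ~ keeps the rows whose country is NOT in the list
others = df.loc[~mask, 'Country']
```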

3. Filtering with String Methods

When a column stores multiple values as a semicolon‑separated string (e.g., LanguageWorkedWith), use the str.contains method:

mask = df['LanguageWorkedWith'].str.contains('Python', na=False)
result = df.loc[mask, 'LanguageWorkedWith']

Selects all respondents who listed Python among the languages they have worked with.
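The sketch below runs str.contains() on a hypothetical LanguageWorkedWith column with a missing value, then illustrates a substring caveat: searching for 'Java' also matches 'JavaScript'. The split-based exact match is one possible workaround, not the method from the original. Also keep in mind that str.contains() treats its pattern as a regular expression by default, so a pattern such as C++ needs regex=False.

```python
import numpy as np
import pandas as pd

# Hypothetical semicolon-separated column with a missing value
df = pd.DataFrame({
    'LanguageWorkedWith': ['Python;SQL', 'JavaScript;HTML/CSS', np.nan, 'C;Java;Python'],
})

# na=False turns missing values into False instead of NaN in the mask
mask = df['LanguageWorkedWith'].str.contains('Python', na=False)
result = df.loc[mask, 'LanguageWorkedWith']

# Substring search: 'Java' also matches 'JavaScript'
java_substring = df['LanguageWorkedWith'].str.contains('Java', na=False)

# Exact-entry match: split on ';' and test list membership (NaN -> False)
java_exact = df['LanguageWorkedWith'].str.split(';').apply(
    lambda langs: isinstance(langs, list) and 'Java' in langs
)
print(df.loc[java_exact, 'LanguageWorkedWith'])
```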

Key Takeaways

  • Boolean masks are the foundation of pandas filtering.
  • .loc provides a clean way to apply masks and select columns simultaneously.
  • Combine conditions with &, |, and ~ for complex queries.
  • Use Series.isin() for membership tests and Series.str.contains() for substring searches.
  • Filtering is often one of the first steps in a pandas analysis, letting you work only with the data that matters.

Filtering with boolean masks and the .loc indexer is a fundamental pandas skill: it lets you quickly isolate the exact rows and columns you need, and much of the subsequent analysis builds on that selection.
