Understanding Pandas DataFrames and Series: A Practical Guide


YouTube video ID: zmdjNSmRXF4

Source: YouTube video by Corey Schafer


Introduction

In this article we walk through the core pandas data structures—DataFrames and Series—and show how to think about them, create them, and extract information without needing to watch the original video.

What Is a DataFrame?

  • A DataFrame is a two‑dimensional table of data, essentially rows and columns, similar to a spreadsheet.
  • Each row usually represents an observation (e.g., one survey respondent) and each column represents a variable (e.g., "hobbyist", "email").
  • In Jupyter you can view the first few rows with df.head().

Visualizing a DataFrame in Jupyter

import pandas as pd
survey_df = pd.read_csv('survey_results_public.csv')
schema_df = pd.read_csv('survey_results_schema.csv')
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)
print(survey_df.head())

head() returns the first five rows by default, so the output is a short preview of a table that, in this example, has 85 columns and tens of thousands of rows.
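If the survey CSV files are not at hand, the same loading and preview steps can be tried on a tiny in-memory CSV. This is only a sketch: the column names and values below are made up for illustration and are not taken from the survey data.

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for a real file on disk.
csv_text = (
    "respondent,hobbyist,country\n"
    "1,Yes,Canada\n"
    "2,No,India\n"
    "3,Yes,Brazil\n"
)

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))

# Allow up to 85 columns to be displayed before pandas truncates the output.
pd.set_option('display.max_columns', 85)

# head(n) returns the first n rows (five when n is omitted).
preview = sample_df.head(2)
print(preview)
print(sample_df.shape)
```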

Thinking About DataFrames with Native Python

  1. Dictionary for a single record:
     person = {'first': 'Corey', 'last': 'Schafer', 'email': '[email protected]'}
  2. Dictionary of lists for multiple records:
     people = {
         'first': ['Corey', 'Jane', 'John'],
         'last': ['Schafer', 'Doe', 'Doe'],
         'email': ['[email protected]', '[email protected]', '[email protected]']
     }
  3. Keys act as column names; each list holds that column's values, one item per row.
  4. This mental model helps bridge plain Python structures to pandas.
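The mental model above can be sketched in plain Python, with no pandas involved. The names and email addresses here are hypothetical placeholders chosen for illustration.

```python
# Hypothetical sample data; the email addresses are placeholders.
people = {
    'first': ['Jane', 'John', 'Alex'],
    'last': ['Doe', 'Doe', 'Smith'],
    'email': ['jane@example.com', 'john@example.com', 'alex@example.com'],
}

# A "column" is simply the list stored under one key.
email_column = people['email']

# A "row" is the item at the same position in every list.
row_index = 1
second_row = {key: values[row_index] for key, values in people.items()}

print(email_column)
print(second_row)
```

Pulling out a row takes a comprehension here, which is one reason a DataFrame is more convenient than a raw dictionary of lists.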

Creating a DataFrame from a Dictionary

import pandas as pd
people_df = pd.DataFrame(people)
print(people_df)

The result is a neatly formatted table with an automatic integer index (0, 1, 2, …). The index labels each row, and with this default index every label is unique.
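A self-contained version of this step might look like the sketch below. The sample records are hypothetical placeholders; the point is how the dictionary keys become column labels and how pandas supplies the default index.

```python
import pandas as pd

# Hypothetical sample data; the email addresses are placeholders.
people = {
    'first': ['Jane', 'John', 'Alex'],
    'last': ['Doe', 'Doe', 'Smith'],
    'email': ['jane@example.com', 'john@example.com', 'alex@example.com'],
}

people_df = pd.DataFrame(people)

# Dictionary keys become the column labels, in insertion order.
print(people_df.columns.tolist())

# With no index supplied, pandas numbers the rows 0, 1, 2, ...
print(people_df.index.tolist())

# shape is (number of rows, number of columns).
print(people_df.shape)
```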

Accessing Columns

  • Bracket notation (recommended): df['email']
  • Dot notation (convenient but risky): df.email
  • Use brackets when a column name might clash with a pandas method (e.g., a column called count).
  • Both return a Series, a one‑dimensional array that still carries an index.
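The name clash mentioned above can be seen in a small sketch. The DataFrame below is hypothetical and includes a column deliberately named count to show where dot notation goes wrong.

```python
import pandas as pd

# Hypothetical data with a column whose name matches a DataFrame method.
df = pd.DataFrame({
    'email': ['jane@example.com', 'john@example.com'],
    'count': [3, 5],
})

# Bracket and dot notation return the same Series for an ordinary column name.
emails = df['email']
same_series = df.email.equals(df['email'])

# df.count resolves to the DataFrame.count method, not the 'count' column.
dot_gives_method = callable(df.count)

# Brackets always reach the column.
count_values = df['count'].tolist()

print(type(emails).__name__, same_series, dot_gives_method, count_values)
```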

Accessing Multiple Columns

df[['last', 'email']]
  • Passing a list inside the outer brackets returns a new DataFrame containing only the selected columns.
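As a quick check of the return types, the sketch below compares single and double brackets on a small hypothetical DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the email addresses are placeholders.
people_df = pd.DataFrame({
    'first': ['Jane', 'John'],
    'last': ['Doe', 'Doe'],
    'email': ['jane@example.com', 'john@example.com'],
})

# A list of labels inside the brackets returns a DataFrame.
subset = people_df[['last', 'email']]

# A single label returns a Series.
one_column = people_df['email']

# A one-item list still returns a DataFrame, not a Series.
one_column_frame = people_df[['email']]

print(type(subset).__name__, type(one_column).__name__, type(one_column_frame).__name__)
```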

Row Selection with Indexers

| Indexer | What it uses | Example |
| --- | --- | --- |
| iloc | Integer location | df.iloc[0] → first row |
| loc | Label (index) | df.loc[0] → first row (default integer labels) |

  • Both return a Series for a single row or a DataFrame for multiple rows.
  • You can combine row and column selection: df.iloc[0, 2] (first row, third column) or df.loc[0, 'email'].
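With the default integer index, iloc and loc look interchangeable. The sketch below makes the difference visible by giving the rows string labels. That custom index is an assumption added for illustration and is not part of the original example.

```python
import pandas as pd

# Hypothetical data; the string index labels are an illustrative assumption.
df = pd.DataFrame(
    {
        'first': ['Jane', 'John', 'Alex'],
        'last': ['Doe', 'Doe', 'Smith'],
        'email': ['jane@example.com', 'john@example.com', 'alex@example.com'],
    },
    index=['a', 'b', 'c'],
)

# iloc selects by integer position; loc selects by index label.
first_by_position = df.iloc[0]
first_by_label = df.loc['a']

# Row and column together: position 2 is the 'email' column.
email_by_position = df.iloc[0, 2]
email_by_label = df.loc['a', 'email']

# A list of positions returns a DataFrame rather than a Series.
two_rows = df.iloc[[0, 1]]

print(first_by_position.equals(first_by_label), email_by_position, email_by_label)
```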

Slicing Rows and Columns

  • Rows: df.iloc[0:3] returns rows 0, 1, 2 (inclusive of start, exclusive of stop).
  • Columns: df.loc[:, 'hobbyist':'employment'] returns all columns from hobbyist through employment (inclusive).
  • Slicing with loc is inclusive on both ends, which is handy for column ranges.
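The difference between the two slicing rules is easy to verify on a small frame. The data below is hypothetical, and the extra city column exists only so the column slice leaves something out.

```python
import pandas as pd

# Hypothetical four-row frame with the default integer index.
df = pd.DataFrame({
    'first': ['Jane', 'John', 'Alex', 'Sam'],
    'last': ['Doe', 'Doe', 'Smith', 'Lee'],
    'email': ['jane@example.com', 'john@example.com',
              'alex@example.com', 'sam@example.com'],
    'city': ['Oslo', 'Lima', 'Pune', 'Kyiv'],
})

# iloc follows Python slicing: the stop position is excluded.
by_position = df.iloc[0:3]

# loc includes the stop label, so the same bounds give one more row.
by_label = df.loc[0:3]

# Label slices work on columns too, again inclusive on both ends.
name_and_email = df.loc[:, 'first':'email']

print(len(by_position), len(by_label), name_and_email.columns.tolist())
```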

Real‑World Example: Stack Overflow Survey Data

print(df.shape)               # (rows, columns), about (88000, 85) here
print(df['hobbyist'].value_counts())
  • Here df is the survey DataFrame loaded earlier as survey_df.
  • df.shape gives the dataset size as a (rows, columns) tuple.
  • df['hobbyist'] extracts the hobbyist column as a Series.
  • .value_counts() counts how often each answer occurs (roughly 71k yes and 18k no).
  • You can also retrieve a specific respondent's answer: df.loc[0, 'hobbyist']
  • Or grab the first three answers: df.loc[0:2, 'hobbyist']
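The same operations can be tried without downloading the survey. The five answers below are invented for illustration and have nothing to do with the real counts.

```python
import pandas as pd

# Hypothetical stand-in for the survey: five made-up answers.
answers = pd.DataFrame({'hobbyist': ['Yes', 'No', 'Yes', 'Yes', 'No']})

# value_counts tallies each distinct value, most frequent first.
counts = answers['hobbyist'].value_counts()

# A single cell by row label and column label.
first_answer = answers.loc[0, 'hobbyist']

# loc slices include the stop label, so 0:2 covers three rows.
first_three = answers.loc[0:2, 'hobbyist'].tolist()

print(answers.shape)
print(counts)
print(first_answer, first_three)
```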

Summary of Key Operations

  • Create: pd.DataFrame(dict_of_lists)
  • Inspect: .head(), .shape, .columns
  • Select Column: df['col'] → Series
  • Select Multiple Columns: df[['col1', 'col2']]
  • Select Row(s): df.iloc[row_index] or df.loc[label]
  • Slice: df.iloc[start:stop], df.loc[start_label:stop_label, col_start:col_end]
  • Aggregate: Series.value_counts(), Series.mean(), etc.

What’s Next?

The next video will dive deeper into indexes—how to set a meaningful column as the index, why it matters, and how it simplifies row selection.


Feel free to experiment with the commands above on your own CSV files. Mastering DataFrames and Series is the foundation for all later pandas work such as filtering, grouping, and visualisation.

DataFrames are two‑dimensional containers of Series objects; a Series is a one‑dimensional labeled array. Pandas gives you powerful, intuitive tools to create, index, slice, and summarize tabular data, turning raw CSV files into actionable insights with just a few lines of code.
