Getting Started with Pandas: Installation, Data Setup, and First Exploration in Jupyter

YouTube video ID: ZyhVh-qRZPA

Source: YouTube video by Corey Schafer

Introduction

In this article we walk through the very first steps of using the pandas library for data analysis with Python. The guide covers installing pandas and Jupyter, downloading a real‑world dataset (the Stack Overflow Developer Survey), setting up a project folder, loading the CSV into a DataFrame, and performing basic inspections such as shape, info, and column display settings.

Installing pandas and Jupyter

  • (Optional) Create a clean virtual environment.
  • Run pip install pandas to install the library.
  • Install Jupyter Lab with pip install jupyterlab.
  • Although Jupyter is not mandatory, it provides an interactive browser‑based interface that renders DataFrames as nicely formatted tables.
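Once both installs finish, a quick check from Python can confirm that the packages are visible to the interpreter you plan to use. The sketch below is only one way to do this; the helper name installed_version is made up for illustration, and the versions printed depend entirely on your environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing.

    installed_version is a hypothetical helper name used only in this sketch.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages installed in this section.
for name in ("pandas", "jupyterlab"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line reports "not installed", the most likely cause is that pip installed into a different environment than the one running the check.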

Downloading the Dataset

  • The tutorial uses the 2019 Stack Overflow Developer Survey CSV, a realistic dataset with 88,000+ responses and 85 columns.
  • The CSV can be downloaded from the Stack Overflow survey results page (link provided in the video description).
  • After downloading, unzip the archive; it contains survey_results_public.csv and survey_results_schema.csv, which go into a project directory, e.g., ~/Desktop/pandas_demo.

Setting Up the Project Folder

  • Create an empty folder (e.g., pandas_demo).
  • Move the unzipped survey_results_public.csv and the accompanying survey_results_schema.csv into this folder.
  • Rename the main CSV to a short name like data.csv for easier reference.
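The folder setup above is easy to do by hand, but it can also be scripted. The following is a minimal sketch, assuming the two CSVs sit in an unzipped download folder; the function name set_up_project and the example paths in the comment are illustrative, not part of the tutorial.

```python
import shutil
from pathlib import Path


def set_up_project(download_dir: Path, project_dir: Path) -> Path:
    """Move the survey CSVs into project_dir and return the path to data.csv.

    set_up_project is a hypothetical helper written for this sketch.
    """
    # Create the project folder if it does not exist yet.
    project_dir.mkdir(parents=True, exist_ok=True)

    # Move the main CSV and give it the short name used later in the article.
    data_csv = project_dir / "data.csv"
    shutil.move(str(download_dir / "survey_results_public.csv"), str(data_csv))

    # Move the schema file, keeping its original name.
    shutil.move(
        str(download_dir / "survey_results_schema.csv"),
        str(project_dir / "survey_results_schema.csv"),
    )
    return data_csv


# Example call (paths are assumptions; adjust them to your machine):
# set_up_project(Path.home() / "Downloads" / "developer_survey_2019",
#                Path.home() / "Desktop" / "pandas_demo")
```

The call is left commented out so that nothing is moved until you have checked the paths.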

Launching Jupyter Notebook

  1. Open a terminal and navigate to the project folder (cd ~/Desktop/pandas_demo).
  2. Start Jupyter with jupyter lab (or jupyter notebook, if the classic Notebook interface is installed).
  3. In the browser, create a new Python 3 notebook and rename it to pandas_demo.

Loading Data into pandas

import pandas as pd
df = pd.read_csv('data.csv')
  • The read_csv function loads the entire CSV with a single line of code and returns a DataFrame, pandas' core data structure: a two-dimensional table of rows and columns.
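To see what read_csv returns without downloading anything, the same call can be tried on a few lines of CSV text held in memory. This is a small sketch; the three rows and their column names are made-up stand-in data, not values taken from the survey.

```python
import io

import pandas as pd

# Made-up stand-in data; the real survey file is far larger.
csv_text = """Respondent,Country,YearsCode
1,United Kingdom,4
2,Bosnia and Herzegovina,
3,Thailand,3
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df).__name__)  # DataFrame
print(sample_df.shape)           # (3, 3)
print(sample_df)
```

The empty YearsCode value in the second row is read as a missing value (NaN), which is how pandas represents blank cells in numeric columns.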

Inspecting the DataFrame

  • Shape: df.shape is an attribute (no parentheses) that returns a tuple (rows, columns), e.g., (88883, 85).
  • Info: df.info() prints the number of entries, column names, and data types (object, int64, float64, etc.).
  • By default pandas displays at most 20 columns in Jupyter and elides the middle ones with "..."; to display all 85 columns, raise the limit:
pd.set_option('display.max_columns', 85)
  • To view the schema (a description of each column), load the second CSV:
schema_df = pd.read_csv('survey_results_schema.csv')
  • The schema has one row per survey column, so raise the row display limit as well to see all of it:
pd.set_option('display.max_rows', 85)
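The inspection and display settings above can be tried on a small stand-in table before running them on the full survey. The sketch below uses a made-up 3 × 30 DataFrame called toy_df; it also shows pd.option_context, which is not covered in the video but is a standard pandas way to change a display option only temporarily.

```python
import pandas as pd

# Made-up stand-in table: 3 rows, 30 columns named col_0 ... col_29.
toy_df = pd.DataFrame({f"col_{i}": [i, i + 1, i + 2] for i in range(30)})

# shape is an attribute returning (rows, columns).
n_rows, n_cols = toy_df.shape
print(n_rows, n_cols)  # 3 30

# info() prints the entry count, column names, non-null counts, and dtypes.
toy_df.info()

# Raise the display limits for the rest of the session.
pd.set_option("display.max_columns", 85)
pd.set_option("display.max_rows", 85)
raised = pd.get_option("display.max_columns")

# Temporarily lower the column limit; it reverts when the block exits.
with pd.option_context("display.max_columns", 10):
    temporary = pd.get_option("display.max_columns")
restored = pd.get_option("display.max_columns")

print(raised, temporary, restored)  # 85 10 85
```

Setting options with set_option affects every later display in the session, so the context-manager form can be handy when only one wide table needs the larger limit.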

Viewing Subsets of Data

  • First rows: df.head() (default 5) or df.head(10) for ten rows.
  • Last rows: df.tail() or df.tail(10). These methods are handy for quick sanity checks while developing filters and analyses.
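As a quick illustration of head and tail, here is a sketch on a made-up 100-row DataFrame; the column name n is arbitrary.

```python
import pandas as pd

# Made-up stand-in table with values 0 through 99.
toy_df = pd.DataFrame({"n": range(100)})

first_five = toy_df.head()    # default: first 5 rows
first_ten = toy_df.head(10)   # first 10 rows
last_five = toy_df.tail()     # default: last 5 rows
last_ten = toy_df.tail(10)    # last 10 rows

print(len(first_five), len(first_ten))  # 5 10
print(last_five["n"].tolist())          # [95, 96, 97, 98, 99]
```

Both methods return new DataFrames, so the results can be stored or inspected further without affecting the original table.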

Sponsor Mention

The tutorial is sponsored by Brilliant.org, which offers interactive courses on statistics, data analysis, and machine learning. A special link provides a free trial and a discount for the first 200 users.

What Comes Next?

The next video will dive deeper into DataFrames, Series, and how to select specific rows or columns using filtering techniques. Stay tuned for hands‑on examples that build on the foundation laid here.

Final Thoughts

You now have a working pandas environment, a Jupyter notebook, and a real‑world CSV loaded into a DataFrame, along with the basic commands to inspect its shape, data types, and sample rows. This setup is the foundation for more advanced data‑wrangling, visualization, and statistical analysis.
