Getting Started with Pandas: Installation, Data Setup, and First Exploration in Jupyter
Introduction
In this article we walk through the very first steps of using the pandas library for data analysis with Python. The guide covers installing pandas and Jupyter, downloading a real‑world dataset (the Stack Overflow Developer Survey), setting up a project folder, loading the CSV into a DataFrame, and performing basic inspections such as shape, info, and column display settings.
Installing pandas and Jupyter
- Optionally, create a clean virtual environment.
- Run `pip install pandas` to install the library.
- Install JupyterLab with `pip install jupyterlab`.
- Jupyter is not mandatory, but it provides an interactive browser‑based interface that renders DataFrames as nicely formatted tables.
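One quick way to confirm that both installs succeeded is to ask Python whether it can locate the packages. The sketch below is an illustrative check rather than part of the original tutorial, and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the two packages installed above.
for name in ("pandas", "jupyterlab"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either package is reported as missing, the most likely cause is that `pip` installed into a different environment than the one running this code.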
Downloading the Dataset
- The tutorial uses the 2019 Stack Overflow Developer Survey CSV, a realistic dataset with 88,000+ responses and 85 columns.
- The CSV can be downloaded from the Stack Overflow survey results page (link provided in the video description).
- After downloading, unzip the file and place the folder (renamed to `data`) inside a project directory, e.g., `~/Desktop/pandas_demo`.
Setting Up the Project Folder
- Create an empty folder (e.g., `pandas_demo`).
- Move the unzipped `survey_results_public.csv` and the accompanying `survey_results_schema.csv` into this folder.
- Rename the main CSV to a short name like `data.csv` for easier reference.
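The folder setup can also be scripted. The following is a minimal sketch using `pathlib`; the function name `prepare_project` is hypothetical, and the demo runs in a temporary directory with a one-line stand-in file so that nothing on your Desktop is touched.

```python
import tempfile
from pathlib import Path


def prepare_project(project_dir: Path) -> Path:
    """Create the project folder and rename the main survey CSV to data.csv.

    Returns the path where data.csv is expected to live.
    """
    project_dir.mkdir(parents=True, exist_ok=True)
    original = project_dir / "survey_results_public.csv"
    renamed = project_dir / "data.csv"
    # Rename only when the downloaded file is present and data.csv is not.
    if original.exists() and not renamed.exists():
        original.rename(renamed)
    return renamed


# Demo in a throwaway directory with a placeholder CSV (made-up content).
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "pandas_demo"
    demo_dir.mkdir()
    (demo_dir / "survey_results_public.csv").write_text("Respondent\n1\n")
    data_path = prepare_project(demo_dir)
    print(data_path.name, data_path.exists())
```

For the real project you would pass something like `Path.home() / "Desktop" / "pandas_demo"` after moving the unzipped files there.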
Launching Jupyter Notebook
- Open a terminal and navigate to the project folder (`cd ~/Desktop/pandas_demo`).
- Start Jupyter with `jupyter notebook` (or `jupyter lab` for the JupyterLab interface).
- In the browser, create a new Python 3 notebook and rename it to pandas_demo.
Loading Data into pandas
import pandas as pd
df = pd.read_csv('data.csv')
- The `read_csv` function reads the CSV in a single line and returns a DataFrame, pandas' core data structure (rows × columns).
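To try `read_csv` without the survey file on hand, the sketch below reads a tiny in-memory CSV instead. The column names and values are illustrative and are not taken from the survey file.

```python
import io

import pandas as pd

# A tiny stand-in for data.csv; the columns and values are made up.
csv_text = """Respondent,Hobbyist,YearsCode
1,Yes,4
2,No,10
3,Yes,2
"""

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df).__name__)  # DataFrame
print(sample_df)
```

Swapping `io.StringIO(csv_text)` for a path such as `'data.csv'` gives the same kind of object, just with the full dataset.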
Inspecting the DataFrame
- Shape: `df.shape` (an attribute, so no parentheses) returns a tuple `(rows, columns)`; for this dataset that is 88,000+ rows and 85 columns.
- Info: `df.info()` prints the number of entries, column names, and data types (object, int64, float64, etc.).
- By default, pandas displays at most 20 columns in Jupyter and elides the rest; you can display all 85 columns with:
pd.set_option('display.max_columns', 85)
- To view the schema (column descriptions) load the second CSV:
schema_df = pd.read_csv('survey_results_schema.csv')
- Adjust both column and row display limits:
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)
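The inspection and display steps above can be tried on a small, self-contained DataFrame. This is a sketch with made-up data; only `shape`, `info()`, `set_option`, and `get_option` are the actual pandas calls being demonstrated.

```python
import pandas as pd

# Hypothetical miniature DataFrame; the real survey has 85 columns.
small_df = pd.DataFrame(
    {
        "Respondent": [1, 2, 3],
        "Hobbyist": ["Yes", "No", "Yes"],
        "YearsCode": [4.0, 10.0, None],
    }
)

rows, columns = small_df.shape  # shape is an attribute, so no parentheses
print(rows, columns)  # 3 3

small_df.info()  # prints the index range, per-column non-null counts, and dtypes

# Raise the display limits, then read them back to confirm the change.
pd.set_option("display.max_columns", 85)
pd.set_option("display.max_rows", 85)
print(pd.get_option("display.max_columns"))  # 85
```

The options stay in effect for the rest of the session, so setting them once near the top of a notebook is usually enough.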
Viewing Subsets of Data
- First rows: `df.head()` (default 5) or `df.head(10)` for ten rows.
- Last rows: `df.tail()` or `df.tail(10)`.
These methods are handy for quick sanity checks while developing filters and analyses.
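A short sketch of `head` and `tail` on a made-up twelve-row DataFrame shows which rows each call returns; the column name `n` is arbitrary.

```python
import pandas as pd

# Twelve made-up rows so that head() and tail() return different slices.
numbers_df = pd.DataFrame({"n": range(1, 13)})

first_five = numbers_df.head()    # default: first 5 rows
first_ten = numbers_df.head(10)   # first 10 rows
last_five = numbers_df.tail()     # default: last 5 rows

print(first_five["n"].tolist())  # [1, 2, 3, 4, 5]
print(len(first_ten))            # 10
print(last_five["n"].tolist())   # [8, 9, 10, 11, 12]
```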
What Comes Next?
The next video will dive deeper into DataFrames, Series, and how to select specific rows or columns using filtering techniques. Stay tuned for hands‑on examples that build on the foundation laid here.
Final Thoughts
You now have a working pandas environment, a real dataset, and the basic commands to explore it. This setup prepares you for more advanced data‑wrangling, visualization, and statistical analysis.
With pandas installed, a Jupyter notebook ready, and a real‑world CSV loaded, you can immediately inspect the shape, data types, and sample rows of your dataset, which gives you a solid foundation for further data‑analysis work.