Aug 28, 2023

Python Pandas Cheat Sheet


Recently I’ve been working a lot with Python Pandas to do some financial analysis, and wanted to share a cheat sheet for anyone new to Pandas. If you’re not familiar, Pandas is an open source data analysis library that makes it easy to manipulate and mine datasets. Specifically I was using it to compare various financial transactions against a budget, but there are a wide range of possibilities. You can even connect it to Jupyter Notebooks for reports as well. The following sections are a cheat sheet of common things that you can do with Pandas. There is a ton of things you can do with Pandas, but the following are examples I found useful.

Creating a Dataframe from a Local Object

import pandas as pd

# define data in a local variable
data = {'Name': ['Luke Skywalker', 'Han Solo', 'Leia Organa', "Obi-Wan Kenobi", "Darth Vader", "Lando Calrissian"],
        'Age': [28, 32, 28, 65, 55, 42]}

# read in local data to dataframe
df = pd.DataFrame(data)

# show output
print(df)
#                Name  Age
# 0    Luke Skywalker   28
# 1          Han Solo   32
# 2       Leia Organa   28
# 3    Obi-Wan Kenobi   65
# 4       Darth Vader   55
# 5  Lando Calrissian   42

Saving a Dataframe to a CSV File

import pandas as pd

# define data in a local variable
data = {'Name': ['Luke Skywalker', 'Han Solo', 'Leia Organa', "Obi-Wan Kenobi", "Darth Vader", "Lando Calrissian"],
        'Age': [28, 32, 28, 65, 55, 42]}

# read in local data to dataframe
df = pd.DataFrame(data)

# create file for output
df.to_csv("output.csv",index=False)
# file should look something like this
# Name,Age
# Luke Skywalker,28
# Han Solo,32
# Leia Organa,28
# Obi-Wan Kenobi,65
# Darth Vader,55
# Lando Calrissian,42

Saving a Dataframe as an Excel Spreadsheet

import pandas as pd

# define data in a local variable
data = {'Name': ['Luke Skywalker', 'Han Solo', 'Leia Organa', "Obi-Wan Kenobi", "Darth Vader", "Lando Calrissian"],
        'Age': [28, 32, 28, 65, 55, 42]}

# read in local data to dataframe
df = pd.DataFrame(data)

# create files for output
df.to_excel("spreadsheet.xlsx",index=False)

Reading in from a CSV

import pandas as pd

# read in file from above examples with Star Wars characters
df = pd.read_csv("output.csv")

# show output
print(df)
#                Name  Age
# 0    Luke Skywalker   28
# 1          Han Solo   32
# 2       Leia Organa   28
# 3    Obi-Wan Kenobi   65
# 4       Darth Vader   55
# 5  Lando Calrissian   42

Creating an Empty Dataframe and Adding Records

import pandas as pd

# create dataframe for output
created_df = pd.DataFrame([], columns=["name", "actual", "budget", "diff", "status"])

# create a new dataframe with values
new_df = pd.DataFrame([["lightsaber", 5, 10, 5, "equal"]], columns=["name", "actual", "budget", "diff", "status"])

# concatenate the two dataframes together
created_df = pd.concat([created_df, new_df])

# print the result
print(created_df)
# output
#                   name actual budget diff status
# 0  lightsaber purchase      5     10    5  equal

Iterating Over All Values in a Dataframe

import pandas as pd

# define data in a local variable
data = {'Name': ['Luke Skywalker', 'Han Solo', 'Leia Organa', "Obi-Wan Kenobi", "Darth Vader", "Lando Calrissian"],
        'Age': [28, 32, 28, 65, 55, 42]}

# read in local data to dataframe
df = pd.DataFrame(data)

# update the replace json with
for index, row in df.iterrows():
    # show the value by referencing the row
    print(f'access the value at the row as an object \n {row["Name"]} is {row["Age"]}')
    # show the value by referencing the index
    print(f"access the value at the index \n {df.at[index, 'Name']} is {df.at[index, 'Age']}")

Adding a Column to a Dataframe

import pandas as pd

# define data in a local variable
data = {'Name': ['Luke Skywalker', 'Han Solo', 'Leia Organa', "Obi-Wan Kenobi", "Darth Vader", "Lando Calrissian"],
        'Age': [28, 32, 28, 65, 55, 42]}

# read in local data to dataframe
df = pd.DataFrame(data)

# Add a new column
df['group'] = ["rebellion", "rebellion", "rebellion", "rebellion", "empire", "rebellion"]

# Display the DataFrame
print(df)
# output
#                Name  Age      group
# 0    Luke Skywalker   28  rebellion
# 1          Han Solo   32  rebellion
# 2       Leia Organa   28  rebellion
# 3    Obi-Wan Kenobi   65  rebellion
# 4       Darth Vader   55     empire
# 5  Lando Calrissian   42  rebellion

Removing Specific Columns from a Dataframe

import pandas as pd

# define data in a local variable
data = {'Name': ['Luke Skywalker', 'Han Solo', 'Leia Organa', "Obi-Wan Kenobi", "Darth Vader", "Lando Calrissian"],
        'Age': [28, 32, 28, 65, 55, 42]}

# read in local data to dataframe
df = pd.DataFrame(data)

# show output
print(df)
#                Name  Age
# 0    Luke Skywalker   28
# 1          Han Solo   32
# 2       Leia Organa   28
# 3    Obi-Wan Kenobi   65
# 4       Darth Vader   55
# 5  Lando Calrissian   42

df = df.drop(columns="Age")

# show output
print(df)
# output
#                Name
# 0    Luke Skywalker
# 1          Han Solo
# 2       Leia Organa
# 3    Obi-Wan Kenobi
# 4       Darth Vader
# 5  Lando Calrissian

Removing Specific Rows from a Dataframe

import pandas as pd

# define data in a local variable
data = {'Name': ['Luke Skywalker', 'Han Solo', 'Leia Organa', "Obi-Wan Kenobi", "Darth Vader", "Lando Calrissian"],
        'Age': [28, 32, 28, 65, 55, 42]}

# read in local data to dataframe
df = pd.DataFrame(data)

# show output
print(df)
#                Name  Age
# 0    Luke Skywalker   28
# 1          Han Solo   32
# 2       Leia Organa   28
# 3    Obi-Wan Kenobi   65
# 4       Darth Vader   55
# 5  Lando Calrissian   42

df = df[df["Name"].str.contains("Luke Skywalker") == False]

# show output
print(df)
#                Name  Age
# 1          Han Solo   32
# 2       Leia Organa   28
# 3    Obi-Wan Kenobi   65
# 4       Darth Vader   55
# 5  Lando Calrissian   42

Applying a Function to All Values in a Dataframe Column

import pandas as pd

# define data in a local variable
data = {'Name': ['Luke Skywalker', 'Han Solo', 'Leia Organa', "Obi-Wan Kenobi", "Darth Vader", "Lando Calrissian"],
        'Age': [28, 32, 28, 65, 55, 42]}

# read in local data to dataframe
df = pd.DataFrame(data)

# show output before applying value
print("values original")
print(df)

# increase the age by 1
df['Age'] = df['Age'].apply(lambda x: x + 1)

# show output after applying value
print("values incremented")
print(df)

Conclusion

I hope this cheat sheet helps you as a reference. There are a lot more things you can do with Python Pandas, but these should get you started.