Pandas¶
A personal diary of DataFrame munging over the years.
Data Types and Conversion¶
Convert Series datatype to numeric (will error if column has non-numeric values)
1 | |
Convert Series datatype to numeric, getting rid of any non-numeric values
1 | |
Convert Series datatype to numeric, changing non-numeric values to NaN
1 | |
Convert Django queryset to DataFrame
1 2 3 | |
Change data type of DataFrame column
1 | |
Exploring and Finding Data¶
Get a report of all duplicate records in a DataFrame, based on specific columns
1 2 | |
List unique values in a DataFrame column
1 | |
For each unique value in a DataFrame column, get a frequency count
1 | |
Grab DataFrame rows where column = a specific value
1 | |
Grab DataFrame rows where column value is present in a list
1 2 3 4 | |
Grab DataFrame rows where column value is not present in a list
1 2 3 4 | |
Select from DataFrame using criteria from multiple columns
(use | instead of & to do an OR)
1 | |
Loop through rows in a DataFrame
(if you must)
1 2 | |
Much faster way to loop through DataFrame rows if you can work with tuples
1 2 | |
Get top n for each group of columns in a sorted DataFrame
(make sure DataFrame is sorted first)
1 2 3 | |
Grab DataFrame rows where specific column is null/notnull
1 | |
Select from DataFrame using multiple keys of a hierarchical index
1 2 3 | |
Slice values in a DataFrame column (aka Series)
1 | |
Get quick count of rows in a DataFrame
1 | |
Get length of data in a DataFrame column
1 | |
Updating and Cleaning Data¶
Delete column from DataFrame
1 | |
Rename several DataFrame columns
1 2 3 4 5 | |
Lower-case all DataFrame column names
1 | |
Even more fancy DataFrame column re-naming
lower-case all DataFrame column names (for example)
1 | |
Lower-case everything in a DataFrame column
1 | |
Sort DataFrame by multiple columns
1 2 | |
Change all NaNs to None (useful before loading to a db)
1 | |
More pre-db insert cleanup…make a pass through the DataFrame, stripping whitespace from strings and changing any empty values to None
(not especially recommended but including here b/c I had to do this in real life once)
1 | |
Get rid of non-numeric values throughout a DataFrame:
1 2 3 4 5 6 7 8 9 | |
Clean up missing values in multiple DataFrame columns
1 2 3 4 5 6 7 8 | |
Doing calculations with DataFrame columns that have missing values. In example below, swap in 0 for df[‘col1’] cells that contain null.
1 2 | |
Split delimited values in a DataFrame column into two new columns
1 2 3 | |
Collapse hierarchical column indexes
1 | |
Reshaping, Concatenating, and Merging Data¶
Pivot data (with flexibility about what what becomes a column and what stays a row).
1 2 3 4 | |
Concatenate two DataFrame columns into a new, single column
(useful when dealing with composite keys, for example)
1 | |
Display and formatting¶
Set up formatting so larger numbers aren’t displayed in scientific notation
1 | |
To display with commas and no decimals
1 | |
Creating DataFrames¶
Create a DataFrame from a Python dictionary
1 | |
Convert Django queryset to DataFrame
1 2 3 | |
List unique values in a DataFrame column
1 | |
Grab DataFrame rows where column has certain values
1 2 | |
Grab DataFrame rows where column doesn’t have certain values
1 2 | |
Delete column from DataFrame
1 | |
Select from DataFrame using criteria from multiple columns
(use | instead of & to do an OR)
1 | |
Rename several DataFrame columns
1 2 3 4 5 | |
Lower-case all DataFrame column names
1 | |
Even more fancy DataFrame column re-naming lower-case all DataFrame column names (for example)
1 | |
Loop through rows in a DataFrame (if you must)
1 2 | |
Much faster way to loop through DataFrame rows if you can work with tuples (h/t hughamacmullaniv)
1 2 | |
Next few examples show how to work with text data in Pandas. Full list of .str functions Slice values in a DataFrame column (aka Series)
1 | |
Lower-case everything in a DataFrame column
1 | |
Get length of data in a DataFrame column
1 | |
Sort dataframe by multiple columns
1 | |
Get top n for each group of columns in a sorted dataframe (make sure dataframe is sorted first)
1 | |
Grab DataFrame rows where specific column is null/notnull
1 | |
Select from DataFrame using multiple keys of a hierarchical index
1 | |
Change all NaNs to None (useful before loading to a db)
1 | |
More pre-db insert cleanup…make a pass through the dataframe, stripping whitespace from strings and changing any empty values to None (not especially recommended but including here b/c I had to do this in real life one time)
1 | |
Get quick count of rows in a DataFrame
1 | |
Pivot data (with flexibility about what what becomes a column and what stays a row). Syntax works on Pandas >= .14
1 2 3 4 | |
Change data type of DataFrame column
1 | |
Get rid of non-numeric values throughout a DataFrame:
1 2 | |
Set DataFrame column values based on other column values (h/t: @mlevkov)
1 | |
Clean up missing values in multiple DataFrame columns
1 2 3 4 5 6 7 8 | |
Concatenate two DataFrame columns into a new, single column (useful when dealing with composite keys, for example)
1 | |
Doing calculations with DataFrame columns that have missing values In example below, swap in 0 for df[‘col1’] cells that contain null
1 | |
Split delimited values in a DataFrame column into two new columns
1 | |
Collapse hierarchical column indexes
1 | |
Create a DataFrame from a Python dictionary
1 | |
Get a report of all duplicate records in a dataframe, based on specific columns
1 | |
Set up formatting so larger numbers aren’t displayed in scientific notation (h/t @thecapacity)
1 | |
Pandas Code example¶
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | |