Pandas¶
A personal diary of DataFrame munging over the years.
Data Types and Conversion¶
Convert Series datatype to numeric (will error if column has non-numeric values)
1 |
|
Convert Series datatype to numeric, getting rid of any non-numeric values
1 |
|
Convert Series datatype to numeric, changing non-numeric values to NaN
1 |
|
Convert Django queryset to DataFrame
1 2 3 |
|
Change data type of DataFrame column
1 |
|
Exploring and Finding Data¶
Get a report of all duplicate records in a DataFrame, based on specific columns
1 2 |
|
List unique values in a DataFrame column
1 |
|
For each unique value in a DataFrame column, get a frequency count
1 |
|
Grab DataFrame rows where column = a specific value
1 |
|
Grab DataFrame rows where column value is present in a list
1 2 3 4 |
|
Grab DataFrame rows where column value is not present in a list
1 2 3 4 |
|
Select from DataFrame using criteria from multiple columns
(use |
instead of &
to do an OR)
1 |
|
Loop through rows in a DataFrame
(if you must)
1 2 |
|
Much faster way to loop through DataFrame rows if you can work with tuples
1 2 |
|
Get top n for each group of columns in a sorted DataFrame
(make sure DataFrame is sorted first)
1 2 3 |
|
Grab DataFrame rows where specific column is null/notnull
1 |
|
Select from DataFrame using multiple keys of a hierarchical index
1 2 3 |
|
Slice values in a DataFrame column (aka Series)
1 |
|
Get quick count of rows in a DataFrame
1 |
|
Get length of data in a DataFrame column
1 |
|
Updating and Cleaning Data¶
Delete column from DataFrame
1 |
|
Rename several DataFrame columns
1 2 3 4 5 |
|
Lower-case all DataFrame column names
1 |
|
Even more fancy DataFrame column re-naming
lower-case all DataFrame column names (for example)
1 |
|
Lower-case everything in a DataFrame column
1 |
|
Sort DataFrame by multiple columns
1 2 |
|
Change all NaNs to None (useful before loading to a db)
1 |
|
More pre-db insert cleanup…make a pass through the DataFrame, stripping whitespace from strings and changing any empty values to None
(not especially recommended but including here b/c I had to do this in real life once)
1 |
|
Get rid of non-numeric values throughout a DataFrame:
1 2 3 4 5 6 7 8 9 |
|
Clean up missing values in multiple DataFrame columns
1 2 3 4 5 6 7 8 |
|
Doing calculations with DataFrame columns that have missing values. In example below, swap in 0 for df[‘col1’] cells that contain null.
1 2 |
|
Split delimited values in a DataFrame column into two new columns
1 2 3 |
|
Collapse hierarchical column indexes
1 |
|
Reshaping, Concatenating, and Merging Data¶
Pivot data (with flexibility about what what becomes a column and what stays a row).
1 2 3 4 |
|
Concatenate two DataFrame columns into a new, single column
(useful when dealing with composite keys, for example)
1 |
|
Display and formatting¶
Set up formatting so larger numbers aren’t displayed in scientific notation
1 |
|
To display with commas and no decimals
1 |
|
Creating DataFrames¶
Create a DataFrame from a Python dictionary
1 |
|
Convert Django queryset to DataFrame
1 2 3 |
|
List unique values in a DataFrame column
1 |
|
Grab DataFrame rows where column has certain values
1 2 |
|
Grab DataFrame rows where column doesn’t have certain values
1 2 |
|
Delete column from DataFrame
1 |
|
Select from DataFrame using criteria from multiple columns
(use |
instead of &
to do an OR)
1 |
|
Rename several DataFrame columns
1 2 3 4 5 |
|
Lower-case all DataFrame column names
1 |
|
Even more fancy DataFrame column re-naming lower-case all DataFrame column names (for example)
1 |
|
Loop through rows in a DataFrame (if you must)
1 2 |
|
Much faster way to loop through DataFrame rows if you can work with tuples (h/t hughamacmullaniv)
1 2 |
|
Next few examples show how to work with text data in Pandas. Full list of .str functions Slice values in a DataFrame column (aka Series)
1 |
|
Lower-case everything in a DataFrame column
1 |
|
Get length of data in a DataFrame column
1 |
|
Sort dataframe by multiple columns
1 |
|
Get top n for each group of columns in a sorted dataframe (make sure dataframe is sorted first)
1 |
|
Grab DataFrame rows where specific column is null/notnull
1 |
|
Select from DataFrame using multiple keys of a hierarchical index
1 |
|
Change all NaNs to None (useful before loading to a db)
1 |
|
More pre-db insert cleanup…make a pass through the dataframe, stripping whitespace from strings and changing any empty values to None (not especially recommended but including here b/c I had to do this in real life one time)
1 |
|
Get quick count of rows in a DataFrame
1 |
|
Pivot data (with flexibility about what what becomes a column and what stays a row). Syntax works on Pandas >= .14
1 2 3 4 |
|
Change data type of DataFrame column
1 |
|
Get rid of non-numeric values throughout a DataFrame:
1 2 |
|
Set DataFrame column values based on other column values (h/t: @mlevkov)
1 |
|
Clean up missing values in multiple DataFrame columns
1 2 3 4 5 6 7 8 |
|
Concatenate two DataFrame columns into a new, single column (useful when dealing with composite keys, for example)
1 |
|
Doing calculations with DataFrame columns that have missing values In example below, swap in 0 for df[‘col1’] cells that contain null
1 |
|
Split delimited values in a DataFrame column into two new columns
1 |
|
Collapse hierarchical column indexes
1 |
|
Create a DataFrame from a Python dictionary
1 |
|
Get a report of all duplicate records in a dataframe, based on specific columns
1 |
|
Set up formatting so larger numbers aren’t displayed in scientific notation (h/t @thecapacity)
1 |
|
Pandas Code example¶
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
|