A compilation of all my canonical posts and answers to old questions on Stack Overflow. Topic arrangement mirrors the User Guide.
If you find any bugs, or need clarification, or see something that can be improved, please feel free to leave a comment under the answer and I'll typically respond within a day.
If you found any of my content here helpful and wish to thank me, you can upvote my answer (please don't serial upvote :-)). If you'd like to do more, and have more than 75 reputation on Stack Overflow, please consider awarding me a bounty.
Pandas Gotchas
- Don't iterate over a DataFrame!
- Never grow a DataFrame!
- Good habits to build to avoid that dreaded SettingWithCopyWarning
- Don't use inplace=True!
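As a rough illustration of the first two gotchas, here is a minimal sketch that assumes a small hypothetical list of (name, score) records; the helper name `build_frame` is made up for this example and does not come from the linked answers. The idea is to accumulate rows in a plain Python list and construct the DataFrame once, then use a vectorized method instead of looping over rows.

```python
import pandas as pd


def build_frame(records):
    """Collect rows in a list, then build the DataFrame in a single call.

    `records` is a hypothetical iterable of (name, score) tuples.
    """
    rows = []
    for name, score in records:
        # Appending to a list is cheap; appending to a DataFrame inside a
        # loop copies the data on every iteration.
        rows.append({"name": name, "score": score})
    return pd.DataFrame(rows, columns=["name", "score"])


df = build_frame([("a", 1), ("b", 2), ("c", 3)])

# Vectorized aggregation instead of iterating over the rows.
total = df["score"].sum()
```

The linked answers cover when exceptions to these rules are reasonable; this sketch only shows the general pattern.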
IO tools (text, CSV, HDF5, …)
- How can I effectively load data on Stack Overflow questions using pandas read_clipboard?
- Writing a pandas DataFrame to CSV file
- Import CSV file as a pandas DataFrame
- How do I save multi-indexed pandas dataframes to parquet?
Indexing and selecting data
- How to implement 'in' and 'not in' for Pandas dataframe
- Combine duplicated columns within a DataFrame
- Deleting all columns except a few
- Right way to reverse a pandas DataFrame?
MultiIndex / advanced indexing
- How do I slice or filter MultiIndex DataFrame levels?
- Selecting columns from pandas MultiIndex
- Setting DataFrame column headers to a MultiIndex
- Reorder levels of MultiIndex in a pandas DataFrame
Merge, join, and concatenate
- Pandas Merging 101
Reshaping and pivot tables
- Split (explode) pandas dataframe string entry to separate rows
- Pandas column of lists, create a row for each list element
Working with text data
- Convert Columns to String in Pandas - introduces the "string" dtype for pandas >= 1.0
- How to lowercase a pandas dataframe string column if it has missing values?
Working with missing data
- How to drop rows of Pandas DataFrame whose value in a certain column is NaN?
- GroupBy columns with NaN (missing) values
- How to check if any value is NaN in a Pandas DataFrame
- Convert pandas.Series from dtype object to float, and errors to nans
- How to replace values with None in Pandas data frame in Python?
- Locate first and last non NaN values in a Pandas DataFrame - a discussion on first_valid_index and last_valid_index
Categorical data
Nullable integer data type
Nullable Boolean Data Type
Visualization
Computational tools
Group By: split-apply-combine
- Multiple aggregations of the same column using pandas GroupBy.agg()
- Pandas GroupBy.apply method duplicates first group
- Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
- GroupBy pandas DataFrame and select most common value
- Python Pandas Create New Column with Groupby().Sum()
- How to get number of groups in a groupby object in pandas?
Time series / date functionality
Time deltas
Styling
Options and settings
Performance
- For loops with pandas - When should I care?
- Dynamic Expression Evaluation in pandas using pd.eval()
- When should I (not) want to use pandas apply() in my code?
- Performant cartesian product (CROSS JOIN) with pandas
- What is the performance impact of non-unique indexes in pandas?
Scaling to large datasets
Sparse data structures
Frequently Asked Questions (FAQ)
- How to iterate over rows in a DataFrame in Pandas?
- What is the difference between Series.replace and Series.str.replace?
- Add column of empty lists to DataFrame
- Change data type of columns in Pandas
- Drop rows containing empty cells from a pandas DataFrame
- How to get rid of “Unnamed: 0” column in a pandas DataFrame?
- Difference between map, applymap and apply methods in Pandas
- Convert list of dictionaries to a pandas DataFrame
- Find the max of two or more columns with pandas
- Sorting by absolute value without changing the data
- How do I convert a pandas column or index to a Numpy array?
- Convert pandas dataframe to NumPy array
- Logical operators for boolean indexing in Pandas
- What is the difference between size and count in pandas?
- 'DataFrame' object has no attribute 'sort'
- Python pandas insert list into a cell
- How to add a suffix (or prefix) to each column name?
- How do I get the row count of a Pandas dataframe?
- Rename a specific column in pandas
- Get list from pandas DataFrame column headers