🐼
PandasAI
PandasAI is a Python library that adds Generative AI capabilities to pandas, the popular data analysis and manipulation tool. It is designed to be used in conjunction with pandas, and is not a replacement for it.
🔧 Quick install
pip install pandasai
🔍 Demo
Try out PandasAI in your browser:
📖 Documentation
The documentation for PandasAI can be found here.
💻 Usage
Disclaimer: The GDP data was collected from this source, published by World Development Indicators - World Bank (2022.05.26) and collected at National accounts data - World Bank / OECD. It relates to the year 2020. The happiness indexes were extracted from the World Happiness Report. Another useful link.
PandasAI makes pandas conversational, allowing you to ask questions of your data in natural language.
Queries
For example, you can ask PandasAI to find all the rows in a DataFrame where the value of a column is greater than 5, and it will return a DataFrame containing only those rows:
import pandas as pd
from pandasai import PandasAI
# Sample DataFrame
df = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
"happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
# Instantiate an LLM
from pandasai.llm.openai import OpenAI
llm = OpenAI(api_token="YOUR_API_TOKEN")
pandas_ai = PandasAI(llm)
pandas_ai(df, prompt='Which are the 5 happiest countries?')
The above code will return the following:
6 Canada
7 Australia
1 United Kingdom
3 Germany
0 United States
Name: country, dtype: object
Of course, you can also ask PandasAI to perform more complex queries. For example, you can ask PandasAI to find the sum of the GDPs of the 2 unhappiest countries:
pandas_ai(df, prompt='What is the sum of the GDPs of the 2 unhappiest countries?')
The above code will return the following:
19012600725504
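Both answers can be cross-checked with plain pandas. The sketch below is not the code PandasAI generates (that depends on the LLM and may differ between runs); it is one hand-written equivalent, using nlargest and nsmallest on the same sample data.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# The 5 happiest countries: the rows with the largest happiness_index
top5 = df.nlargest(5, "happiness_index")["country"]

# Sum of the GDPs of the 2 unhappiest countries
gdp_sum = int(df.nsmallest(2, "happiness_index")["gdp"].sum())

print(top5)
print(gdp_sum)
```

On this data the two results agree with the outputs shown above.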
Charts
You can also ask PandasAI to draw a graph:
pandas_ai(
df,
"Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)
You can save any charts generated by PandasAI by setting the save_charts parameter to True in the PandasAI constructor, for example PandasAI(llm, save_charts=True). Charts are saved in ./pandasai/exports/charts.
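For comparison, here is a hand-written matplotlib sketch of a chart similar to the one requested in the prompt, saved to disk. It assumes matplotlib is installed. The helper name save_gdp_bar_chart, the file name, and the temporary output directory are all hypothetical choices for illustration; this is not how PandasAI itself draws or saves charts.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "country": ["France", "Germany", "Italy"],
    "gdp": [2411255037952, 3435817336832, 1745433788416],
})

def save_gdp_bar_chart(frame, out_dir):
    """Draw one bar per country, each in a different color, and save as PNG."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fig, ax = plt.subplots()
    # "C0".."C9" are matplotlib's default color-cycle names
    colors = [f"C{i % 10}" for i in range(len(frame))]
    ax.bar(frame["country"], frame["gdp"], color=colors)
    ax.set_xlabel("country")
    ax.set_ylabel("gdp")
    path = out_dir / "gdp_by_country.png"  # hypothetical file name
    fig.savefig(path)
    plt.close(fig)
    return path

chart_path = save_gdp_bar_chart(df, tempfile.mkdtemp())
print(chart_path)
```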
Multiple DataFrames
You can also pass multiple dataframes to PandasAI and ask questions that relate them.
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
employees_data = {
'EmployeeID': [1, 2, 3, 4, 5],
'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}
salaries_data = {
'EmployeeID': [1, 2, 3, 4, 5],
'Salary': [5000, 6000, 4500, 7000, 5500]
}
employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)
llm = OpenAI()
pandas_ai = PandasAI(llm)
pandas_ai([employees_df, salaries_df], "Who gets paid the most?")
The above code will return the following:
Oh, Olivia gets paid the most.
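To check that answer by hand, the two DataFrames can be joined on their shared EmployeeID column. This is a plain-pandas sketch of one way to relate the tables, not the code PandasAI generates.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on the shared key, then pick the row with the top salary
merged = employees_df.merge(salaries_df, on="EmployeeID")
top_earner = merged.loc[merged["Salary"].idxmax(), "Name"]

print(top_earner)
```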
You can find more examples in the examples directory.
⚡️ Shortcuts
PandasAI also provides a number of shortcuts (beta) that make it easier to ask questions of your data. For example, you can ask PandasAI to clean_data, impute_missing_values, generate_features, plot_histogram, and many more.
# Clean data
pandas_ai.clean_data(df)
# Impute missing values
pandas_ai.impute_missing_values(df)
# Generate features
pandas_ai.generate_features(df)
# Plot histogram
pandas_ai.plot_histogram(df, column="gdp")
Learn more about the shortcuts here.
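This README does not say what each shortcut does internally. As a rough point of comparison only, and assuming that "impute missing values" means something like mean imputation for numeric columns, a hand-written pandas version could look like the sketch below. It is not the implementation of impute_missing_values.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["France", "Germany", "Italy"],
    "gdp": [2411255037952, float("nan"), 1745433788416],
})

# Assumption: fill gaps in numeric columns with the column mean
numeric_cols = df.select_dtypes(include="number").columns
imputed = df.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].mean())

print(imputed)
```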
🔒 Privacy & Security
To generate the Python code to run, PandasAI takes the head of the dataframe, randomizes it (using random generation for sensitive data and shuffling for non-sensitive data), and sends only that randomized head to the LLM.
To further enforce your privacy, you can instantiate PandasAI with enforce_privacy=True, which sends only the column names, not the head, to the LLM.
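The idea can be illustrated with a minimal sketch. The helpers shuffled_head and columns_only are hypothetical names, not part of PandasAI, and the sketch covers only the shuffling case (it does not attempt the random generation used for sensitive data).

```python
import pandas as pd

def shuffled_head(frame, n=5, seed=0):
    """Return the first n rows with each column shuffled independently,
    so no output row is guaranteed to match a real row."""
    head = frame.head(n).copy()
    for i, col in enumerate(head.columns):
        # A different seed per column breaks the link between columns
        head[col] = head[col].sample(frac=1, random_state=seed + i).to_numpy()
    return head

def columns_only(frame):
    """What a stricter privacy mode would expose: just the column names."""
    return list(frame.columns)

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain"],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4],
})

preview = shuffled_head(df)
names = columns_only(df)
print(preview)
print(names)
```

The shuffled preview keeps the column names, types, and value ranges that help with code generation, while the column-names-only variant gives up that context in exchange for sending no values at all.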
⚙️ Command-Line Tool
Pai is the command-line tool that provides a convenient way to interact with PandasAI through a command-line interface (CLI). To access the CLI tool, create a virtualenv for testing purposes and install the project dependencies into it using pip.
Read more about how to use the CLI here.
🤝 Contributing
Contributions are welcome! Please check out the todos below, and feel free to open a pull request. For more information, please see the contributing guidelines.
After setting up the virtual environment, please remember to install pre-commit to comply with our standards:
pre-commit install
📜 License
PandasAI is licensed under the MIT License. See the LICENSE file for more details.
Acknowledgements
- This project is based on the pandas library by independent contributors, but it's in no way affiliated with the pandas project.
- This project is meant to be used as a tool for data exploration and analysis, and it's not meant to be used for production purposes. Please use it responsibly.