  • Stars: 237
  • Rank: 169,885 (Top 4%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created over 3 years ago
  • Updated almost 2 years ago

Repository Details

Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package, and its key data structure is the DataFrame. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.

10_Python_Pandas_Module

Introduction 👋

What is Pandas in Python?

Pandas is one of the most widely used Python libraries, providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis and manipulation tool available in any language. It is already well on its way towards this goal.

Data prepared in Pandas is commonly passed on to statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in scikit-learn.

Main Features

Here are just a few of the things that pandas does well:

  • Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
  • Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
  • Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
  • Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
  • Easy conversion of ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
  • Intuitive merging and joining of datasets
  • Flexible reshaping and pivoting of datasets
  • Hierarchical labeling of axes (possible to have multiple labels per tick)
  • Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving/loading data from the ultrafast HDF5 format
  • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging, etc.
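Three of the features listed above (missing-data handling, automatic label alignment, and group by) can be sketched in a few lines. This is a minimal illustration, not code from the tutorial notebooks; the sample values and the names `temps`, `a`, `b`, and `sales` are made up for the example.

```python
import numpy as np
import pandas as pd

# Missing data: NaN marks the absent value and is skipped by default in reductions.
temps = pd.Series([21.5, np.nan, 19.0], index=["mon", "tue", "wed"])
mean_temp = temps.mean()            # computed from the two known values only
filled = temps.fillna(mean_temp)    # replace the gap with that mean

# Automatic alignment: values are matched by label, not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])
aligned_sum = a + b                 # labels found in only one Series give NaN

# Group by: split rows by region, then aggregate each group.
sales = pd.DataFrame(
    {"region": ["north", "south", "north", "south"], "units": [5, 3, 7, 1]}
)
units_per_region = sales.groupby("region")["units"].sum()

print(filled)
print(aligned_sum)
print(units_per_region)
```

Note that `a + b` keeps every label from both inputs, so the result has four entries even though each input has three.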

Core Components of Pandas Data Structure

Pandas has two core data structures, and all operations are based on these two objects. A data structure is a particular way of organizing data. Here are the two pandas data structures:

  • Series
  • DataFrame
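As a rough sketch of how the two structures relate: a Series is a one-dimensional labeled array, and a DataFrame is a two-dimensional table whose columns are Series sharing one index. The names and values below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
ages = pd.Series([31, 45, 27], index=["ann", "bob", "cy"], name="age")

# A DataFrame: a table of columns that all share the same row index.
people = pd.DataFrame(
    {"age": [31, 45, 27], "city": ["Oslo", "Lima", "Pune"]},
    index=["ann", "bob", "cy"],
)

# Selecting a single column from a DataFrame returns a Series.
age_column = people["age"]

print(type(age_column).__name__)
print(people.shape)   # (rows, columns)
```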

Table of contents 📋

No. Name
01 Python_Pandas_DataFrame
1.1 001_Python_Pandas_DataFrame_from_Dictionary
1.2 Python_Pandas_DataFrame_from_List
1.3 Python_Pandas_DataFrame_head()_and_tail()
1.4 004_Python_Pandas_DataFrame_drop_columns
1.5 Python_Pandas_DataFrame_drop_duplicates
1.6 Python_Pandas_DataFrame_drop_columns_with_NA
1.7 Python_Pandas_DataFrame_rename_columns
1.8 Python_Pandas_DataFrame_to_Python_dictionary
1.9 Python_Pandas_DataFrame_set_index
1.10 Python_Pandas_DataFrame_reset_index
02 Python_Pandas_Exercise_1
03 Python_Pandas_Exercise_2
automobile_data.csv
pokemon_data.csv
04 Pandas Cheat Sheet Data Wrangling in Python.pdf
05 Pandas Cheat Sheet for Data Science in Python.pdf
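The DataFrame operations named in the notebook titles above can be previewed with a short sketch. This is not the notebooks' own code; the sample data and variable names are hypothetical, and only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical sample data, not taken from the notebooks.
data = {
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "score": [88, 92, 92, 75],
    "notes": [None, None, None, None],
}
df = pd.DataFrame(data)                                  # DataFrame from a dictionary
df_from_list = pd.DataFrame(
    [["Ann", 88], ["Bob", 92]], columns=["name", "score"]
)                                                        # DataFrame from a list of rows

first_two = df.head(2)                                   # first rows
last_one = df.tail(1)                                    # last rows
without_score = df.drop(columns=["score"])               # drop a column by name

df = df.dropna(axis="columns", how="all")                # drop columns that are entirely NA
df = df.drop_duplicates()                                # drop repeated rows
df = df.rename(columns={"score": "points"})              # rename a column

indexed = df.set_index("name")                           # use a column as the row index
restored = indexed.reset_index()                         # move the index back to a column
as_dict = restored.to_dict(orient="list")                # back to a plain Python dictionary

print(as_dict)
```

Each call returns a new object rather than modifying `df` in place, which is why the results are assigned back or stored under a new name.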

These are online, read-only versions. However, you can run ▶ all the code online by clicking here ➞ binder


Install Pandas Module:

Open your Anaconda Prompt, then type and run the following command:

  •   pip install pandas  
    

Once installed, we can import it in our Python code.
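A quick way to confirm the installation worked is to import the package and build a tiny DataFrame. The alias `pd` is a widely used convention rather than a requirement, and the column name `a` is just a placeholder.

```python
import pandas as pd  # "pd" is the conventional alias

# Printing the version confirms which installation Python picked up.
print(pd.__version__)

# Building a small DataFrame confirms the library is usable.
check = pd.DataFrame({"a": [1, 2]})
print(check.shape)
```

If the import raises `ModuleNotFoundError`, the package was most likely installed into a different Python environment than the one running the code.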


Frequently asked questions ❔

How can I thank you for writing and sharing this tutorial? 🌷

You can star and fork this repository. Starring and forking is free for you, but it tells me and other people that it was helpful and that you like this tutorial.

Go here if you aren't here already and click the ➞ ✰ Star and ⵖ Fork buttons in the top right corner. You'll be asked to create a GitHub account if you don't already have one.


How can I read this tutorial without an Internet connection?

  1. Go here and click the big green ➞ Code button in the top right of the page, then click ➞ Download ZIP.

  2. Extract the ZIP and open it. Unfortunately I don't have any more specific instructions because how exactly this is done depends on which operating system you run.

  3. Launch ipython notebook from the folder that contains the notebooks. Open each notebook and choose:

    Kernel > Restart & Clear Output

This clears all the outputs, so you can run each statement yourself and learn interactively.

If you have git and know how to use it, you can also clone the repository instead of downloading a ZIP and extracting it. An advantage of doing it this way is that you don't need to download the whole tutorial again to get the latest version. All you need to do is pull with git and run ipython notebook again.


Authors ✍️

I'm Dr. Milaan Parmar and I have written this tutorial. If you think you can add to, correct, or enhance this tutorial, you are most welcome 🙏

See GitHub's contributors page for details.

If you have trouble with this tutorial, please tell me about it by creating an issue on GitHub, and I'll make this tutorial better. This is probably the best choice if you had trouble following the tutorial and something in it should be explained better. You will be asked to create a GitHub account if you don't already have one.

If you like this tutorial, please give it a star.


Licence 📜

You may use this tutorial freely at your own risk. See LICENSE.
