  • Language: Jupyter Notebook
  • License: MIT License


A tutorial on the Julia DataFrames package

An Introduction to DataFrames.jl

Bogumił Kamiński, February 13, 2023

The tutorial is for DataFrames.jl 1.5.0

A brief introduction to basic usage of DataFrames.

The tutorial specifies the project environment (the exact package versions) under which it should be run. To prepare this environment before using the tutorial notebooks, run the following command in the command line from the project folder:

julia -e 'using Pkg; Pkg.activate("."); Pkg.instantiate()'

Tested under Julia 1.9.0. The project dependencies are the following:

  [69666777] Arrow v2.4.3
  [6e4b80f9] BenchmarkTools v1.3.2
  [336ed68f] CSV v0.10.9
  [324d7699] CategoricalArrays v0.10.7
  [8be319e6] Chain v0.5.0
  [944b1d66] CodecZlib v0.7.1
  [a93c6f00] DataFrames v1.5.0
  [1313f7d8] DataFramesMeta v0.13.0
  [5789e2e9] FileIO v1.16.0
  [da1fdf0e] FreqTables v0.4.5
  [7073ff75] IJulia v1.24.0
  [babc3d20] JDF v0.5.1
  [9da8a3cd] JLSO v2.7.0
  [b9914132] JSONTables v1.0.3
  [86f7a689] NamedArrays v0.9.6
  [2dfb63ee] PooledArrays v1.4.2
  [f3b207a7] StatsPlots v0.15.4
  [bd369af6] Tables v1.10.0
  [a5390f91] ZipFile v0.10.1
  [9a3f8284] Random
  [10745b16] Statistics v1.9.0
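To double-check that the activated environment matches the versions stated above, a minimal sketch like the following can be run from the project folder. It assumes Julia 1.9 or later, because `pkgversion` was added in that release; the expected version numbers are simply the ones given in this README, and the warning text is illustrative.

```julia
# Run from the tutorial's project folder after `Pkg.instantiate()`.
using Pkg
Pkg.activate(".")   # use the tutorial's Project.toml and Manifest.toml

using DataFrames

# Versions this README says the tutorial targets.
expected_julia = v"1.9.0"
expected_dataframes = v"1.5.0"

println("Julia:      ", VERSION, " (tutorial tested with ", expected_julia, ")")
println("DataFrames: ", pkgversion(DataFrames),
        " (tutorial written for ", expected_dataframes, ")")

# Warn (but do not fail) if the installed DataFrames.jl differs.
if pkgversion(DataFrames) != expected_dataframes
    @warn "DataFrames.jl version differs from the tutorial's; some outputs may differ."
end
```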

I will try to keep the material up to date as the packages evolve.

This tutorial covers DataFrames.jl and CategoricalArrays.jl, which together form the core of working with data frames in Julia, along with selected packages for reading and writing files.

The last part (extras) covers selected functionality of packages that I find useful for data manipulation; currently these are FreqTables.jl, DataFramesMeta.jl, and StatsPlots.jl.
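As a rough illustration of what the extras look like, the sketch below chains DataFramesMeta.jl macros with `@chain` (which DataFramesMeta.jl re-exports from Chain.jl) and then uses FreqTables.jl to count group sizes. The data frame and its `group`, `x`, and `y` columns are made up for illustration, and details may vary across package versions; it assumes the versions listed above.

```julia
using DataFrames, DataFramesMeta, FreqTables, Statistics

# Hypothetical example data.
df = DataFrame(group = ["a", "b", "a", "b", "a"],
               x = [1, 2, 3, 4, 5])

# Each step receives the result of the previous one as its first argument.
result = @chain df begin
    @subset(:x .> 1)                 # keep rows with x > 1 (condition on whole column)
    @transform(:y = :x .* 10)        # add a derived column y
    @by(:group, :mean_y = mean(:y))  # mean of y within each group
end

# Frequency table of :group, then the same counts as proportions.
tbl = freqtable(df, :group)
shares = prop(tbl)
```

Because `@subset` applies its condition to whole columns, the comparison is broadcast with `.>`; DataFramesMeta.jl also offers `@rsubset` for row-wise conditions.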

TOC

File Topic
01_constructors.ipynb Creating DataFrame and conversion
02_basicinfo.ipynb Getting summary information
03_missingvalues.ipynb Handling missing values
04_loadsave.ipynb Loading and saving DataFrames
05_columns.ipynb Working with columns of DataFrame
06_rows.ipynb Working with rows of DataFrame
07_factors.ipynb Working with categorical data
08_joins.ipynb Joining DataFrames
09_reshaping.ipynb Reshaping DataFrames
10_transforms.ipynb Transforming DataFrames
11_performance.ipynb Performance tips
12_pitfalls.ipynb Possible pitfalls
13_extras.ipynb Additional interesting packages

Changelog:

Date Changes
2017-12-05 Initial release
2017-12-06 Added description of insert!, merge!, empty!, categorical!, delete!, DataFrames.index
2017-12-09 Added performance tips
2017-12-10 Added pitfalls
2017-12-18 Added additional worthwhile packages: FreqTables and DataFramesMeta
2017-12-29 Added description of filter and filter!
2017-12-31 Added description of conversion to Matrix
2018-04-06 Added example of extracting a row from a DataFrame
2018-04-21 Major update of the whole tutorial
2018-05-01 Added byrow! example
2018-05-13 Added StatPlots package to extras
2018-05-23 Improved comments in sections 1 to 5 by Jane Herriman
2018-07-25 Update to 0.11.7 release
2018-08-25 Update to Julia 1.0 release: sections 1 to 10
2018-08-29 Update to Julia 1.0 release: sections 11, 12 and 13
2018-09-05 Update to Julia 1.0 release: FreqTables section
2018-09-10 Added CSVFiles section to chapter on load/save
2018-09-26 Updated to DataFrames 0.14.0
2018-10-04 Updated to DataFrames 0.14.1, added haskey and repeat
2018-12-08 Updated to DataFrames 0.15.2
2019-01-03 Updated to DataFrames 0.16.0, added serialization instructions
2019-01-18 Updated to DataFrames 0.17.0, added passmissing
2019-01-27 Added Feather.jl file read/write
2019-01-30 Renamed StatPlots.jl to StatsPlots.jl and added Tables.jl
2019-02-08 Added groupvars and groupindices functions
2019-04-27 Updated to DataFrames 0.18.0, dropped JLD2.jl
2019-04-30 Updated handling of missing values description
2019-07-16 Updated to DataFrames 0.19.0
2019-08-14 Added JSONTables.jl and Tables.columnindex
2019-08-16 Added Project.toml and Manifest.toml
2019-08-26 Updated to Julia 1.2 and DataFrames 0.19.3
2019-08-29 Added an example of how to compress/decompress a CSV file using CodecZlib
2019-08-30 Added examples of JLSO.jl and ZipFile.jl by xiaodaigh
2019-11-03 Added examples of JDF.jl by xiaodaigh
2019-12-08 Updated to DataFrames 0.20.0
2020-05-06 Updated to DataFrames 0.21.0 (except load/save and extras)
2020-11-20 Updated to DataFrames 0.22.0 (except DataFramesMeta.jl which does not work yet)
2020-11-26 Updated to DataFramesMeta.jl 0.6; update by @pdeffebach
2021-05-15 Updated to DataFrames.jl 1.1.1
2021-05-15 Updated to DataFrames.jl 1.2 and DataFramesMeta.jl 0.8, added Chain.jl instead of Pipe.jl
2021-12-12 Updated to DataFrames.jl 1.3
2022-10-05 Updated to DataFrames.jl 1.4
2023-02-13 Updated to DataFrames.jl 1.5

Core functions summary

  1. Constructors: DataFrame, DataFrame!, Tables.rowtable, Tables.columntable, Matrix, eachcol, eachrow, Tables.namedtupleiterator, empty, empty!
  2. Getting summary: size, nrow, ncol, describe, names, eltypes, first, last, getindex, setindex!, @view, isapprox, metadata, metadata!, colmetadata, colmetadata!
  3. Handling missing: missing (singleton instance of Missing), ismissing, nonmissingtype, skipmissing, replace, replace!, coalesce, allowmissing, disallowmissing, allowmissing!, disallowmissing!, completecases, dropmissing, dropmissing!, passmissing
  4. Loading and saving: CSV (package), CSVFiles (package), Serialization (module), CSV.read, CSV.write, save, load, serialize, deserialize, Arrow.write, Arrow.Table (from Arrow.jl package), JSONTables (package), arraytable, objecttable, jsontable, CodecZlib (module), GzipCompressorStream, GzipDecompressorStream, JDF.jl (package), JDF.save, JDF.load, JLSO.jl (package), JLSO.save, JLSO.load, ZipFile.jl (package), ZipFile.Reader, ZipFile.Writer, ZipFile.addfile
  5. Working with columns: rename, rename!, hcat, insertcols!, categorical!, columnindex, hasproperty, select, select!, transform, transform!, combine, Not, All, Between, ByRow, AsTable
  6. Working with rows: sort!, sort, issorted, append!, vcat, push!, view, filter, filter!, deleteat!, unique, nonunique, unique!, allunique, repeat, parent, parentindices, flatten, @chain (from Chain.jl package), only, subset, subset!, shuffle, prepend!, pushfirst!, insert!, keepat!
  7. Working with categorical: categorical, cut, isordered, ordered!, levels, unique, levels!, droplevels!, unwrap, recode, recode!
  8. Joining: innerjoin, leftjoin, leftjoin!, rightjoin, outerjoin, semijoin, antijoin, crossjoin
  9. Reshaping: stack, unstack
  10. Transforming: groupby, mapcols, parent, groupcols, valuecols, groupindices, keys (for GroupedDataFrame), combine, select, select!, transform, transform!, @chain (from Chain.jl package)
  11. Extras:
    • FreqTables: freqtable, prop, Name
    • DataFramesMeta: @with, @subset, @select, @transform, @orderby, @by, @combine, @eachrow, @newcol, ^, $
    • StatsPlots: @df, plot, density, histogram, boxplot, violin
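For quick orientation, the sketch below strings together a handful of the core functions listed above on a small data frame. The `id`, `name`, `score`, and `team` columns are hypothetical, and the numbered comments refer to the groups in the summary; it is meant as a taste of the API, not a substitute for the notebooks.

```julia
using DataFrames, Statistics

# 1. Constructors: build a DataFrame from column vectors.
df = DataFrame(id = 1:4,
               name = ["Ann", "Bob", "Cid", "Dee"],
               score = [90, missing, 75, 82])

# 2. Summary information.
size(df)            # (4, 3)
names(df)           # ["id", "name", "score"]
describe(df)        # per-column summary statistics

# 3. Missing values: drop rows with a missing score, or fill them in.
complete = dropmissing(df, :score)
filled = coalesce.(df.score, 0)

# 5. Columns: add a derived column in place, then select a subset.
transform!(df, :score => ByRow(s -> ismissing(s) ? missing : s >= 80) => :passed)
sel = select(df, :name, :score)

# 6. Rows: sort and subset.
ranked = sort(complete, :score, rev = true)
high = subset(complete, :score => ByRow(>(80)))

# 8. Joining: only ids present in both tables are kept.
teams = DataFrame(id = [1, 2, 4, 5], team = ["x", "y", "y", "z"])
joined = innerjoin(df, teams, on = :id)

# 9. Reshaping: to long format and back to wide.
long = stack(select(df, :id, :score), :score)
wide = unstack(long, :id, :variable, :value)

# 10. Split-apply-combine: mean score per team, ignoring missing values.
per_team = combine(groupby(joined, :team),
                   :score => (s -> mean(skipmissing(s))) => :mean_score)
```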
