• Stars
    star
    107
  • Rank 322,599 (Top 7 %)
  • Language
    Python
  • License
    MIT License
  • Created over 7 years ago
  • Updated about 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A python implementation of C4.5 algorithm by R. Quinlan

C4.5

An implementation of C4.5 machine learning algorithm in python

C4.5 Algorithm

C4.5 is an algorithm developed by John Ross Quinlan that creates decision tress. A decision tree is a tool that is used for classification in machine learning, which uses a tree structure where internal nodes represent tests and leaves represent decisions. C4.5 makes use of information theoretic concepts such as entropy to classify the data.

alt text

Data Format

For each dataset there should be two files, one that describes the classes and attributes and one that consists of the actual data. The file for attributes and classes should contain all the classes in first line and after that, line by line the attributes and their possible values if the attribute is discrete. For continuos(numerical) attributes, possible values would be "continuos". Check the iris dataset folder for actual data and more specific syntax.

Usage

Create a C4.5 object like this

c1 = C45("path_to_data_file", "path_to_description_file")

After this, you can fetch and preprocess the data, generate the tree and print it to screen.

Running Tests

Navigate to the directory "C4.5" and type python -m unittest discover to run all the test modules under "C4.5/tests" folder. (the names of the modules should start with "test" and end with ".py")

Relevant Links