  • Stars: 314
  • Rank: 133,353 (Top 3%)
  • Language: Python
  • License: MIT License
  • Created over 8 years ago
  • Updated over 5 years ago

Repository Details

Python tools for healthcare machine learning

healthcareai

The aim of healthcareai is to streamline machine learning in healthcare. The package has two main goals:

  • Make it easy to create models from tabular data and to deploy the best model, which pushes predictions to a database such as MSSQL, MySQL, or SQLite, or to a CSV flat file.
  • Provide tools related to data cleaning, manipulation, and imputation.
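
As a rough illustration of the first goal, the sketch below pushes a handful of predictions to SQLite and to CSV using only the Python standard library. It does not use the healthcareai API; the `predictions` data, the `predictions` table, and the `row_id` and `prediction` column names are all assumptions made up for this example.

```python
import csv
import io
import sqlite3

# Hypothetical predictions: (row identifier, predicted probability).
predictions = [(1, 0.82), (2, 0.17), (3, 0.55)]


def save_to_sqlite(rows, db_path):
    """Insert rows into a 'predictions' table and return the table's row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS predictions "
                "(row_id INTEGER, prediction REAL)"
            )
            conn.executemany("INSERT INTO predictions VALUES (?, ?)", rows)
        return conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
    finally:
        conn.close()


def save_to_csv(rows, stream):
    """Write rows with a header line to any text stream (e.g. an open file)."""
    writer = csv.writer(stream)
    writer.writerow(["row_id", "prediction"])
    writer.writerows(rows)


# ":memory:" keeps the demo self-contained; pass a file path for a real database.
row_count = save_to_sqlite(predictions, ":memory:")

buffer = io.StringIO()
save_to_csv(predictions, buffer)
csv_text = buffer.getvalue()
```

To write to a real CSV file instead of the in-memory buffer, open it with `open(path, "w", newline="")` and pass the file object as `stream`.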

Installation

Windows

  • If you haven't already, install 64-bit Python 3.5 via the Anaconda distribution
    • Important: When prompted for the Installation Type, select Just Me (recommended). This makes permissions later in the process much simpler.
  • Open a terminal (CMD or PowerShell)
  • Run conda install pyodbc
  • Upgrade to the latest scipy by removing and reinstalling it (the conda upgrade command is very slow):
    • Run conda remove scipy
    • Run conda install scipy
  • Run conda install scikit-learn
  • Install healthcareai using one and only one of these methods (ordered from easiest to hardest):
    1. Recommended: Install the latest release with pip by running pip install healthcareai
    2. If you know what you're doing and instead want the bleeding-edge version direct from our GitHub repo, run pip install https://github.com/HealthCatalyst/healthcareai-py/zipball/master

Why Anaconda?

We recommend using the Anaconda python distribution when working on Windows. There are a number of reasons:

  • When running anaconda and installing packages using the conda command, you don't need to worry about dependency hell, particularly because packages aren't compiled on your machine; conda installs pre-compiled binaries.
  • A great example of the pain that conda saves you is the Python package scipy, which, by its maintainers' own admission, "is difficult".

Linux

You may need to install the following dependencies:

  • sudo apt-get install python-tk
  • sudo pip install pyodbc
    • Note: you might run into trouble with the pyodbc dependency. If so, first run sudo apt-get install unixodbc-dev, then retry sudo pip install pyodbc. Credit: Stack Overflow

Once the dependencies are satisfied, run pip install healthcareai or sudo pip install healthcareai

macOS

  • pip install healthcareai or sudo pip install healthcareai

Linux and macOS (via docker)

  • Install docker
  • Clone this repo (look for the green button on the repo main page)
  • cd into the cloned directory
  • run docker build -t healthcareai .
  • run the docker instance with docker run -p 8888:8888 healthcareai
  • You should then have a jupyter notebook available on http://localhost:8888.
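
If the notebook does not appear in the browser, a quick check like the one below can tell you whether anything is answering on the mapped port. This is only a sketch: `is_reachable` is a hypothetical helper written for this example, and it assumes the container was started with the `-p 8888:8888` mapping shown above. It bypasses any configured HTTP proxy because the target is local.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if the URL answers with any HTTP response, False otherwise."""
    # An empty ProxyHandler ignores proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (e.g. a login page).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar errors.
        return False


if __name__ == "__main__":
    print(is_reachable("http://localhost:8888"))
```

A `False` result usually means the container is not running or the port mapping is missing.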

Verify Installation

To verify that healthcareai installed correctly, open a terminal and run python. This opens an interactive python console (also known as a REPL). Then enter this command: from healthcareai import SupervisedModelTrainer and hit enter. If no error is thrown, you are ready to rock.
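
The same check can be scripted, which is handy on a machine where you cannot open an interactive console. The sketch below assumes nothing beyond the import named above; `can_import` is a hypothetical helper defined here, not part of healthcareai.

```python
import importlib


def can_import(module_name, attribute=None):
    """Return True if the module imports and (optionally) exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return attribute is None or hasattr(module, attribute)


if __name__ == "__main__":
    ok = can_import("healthcareai", "SupervisedModelTrainer")
    print("healthcareai is ready" if ok else "healthcareai is not importable")
```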

If you did get an error, or run into other installation issues, please let us know or, better yet, post on Stack Overflow (with the healthcare-ai tag) so that the answer can help others through this process.

Getting started

  1. Read through the Getting Started section of the healthcareai-py documentation.

  2. Read through the example files to learn how to use the healthcareai-py API.

    • For examples of how to train and evaluate a supervised model, inspect and run either example_regression_1.py or example_classification_1.py using our sample diabetes dataset.
    • For examples of how to use a model to make predictions, inspect and run either example_regression_2.py or example_classification_2.py after running one of the first examples.
    • For examples of more advanced use cases, inspect and run example_advanced.py.
  3. To train and evaluate your own model, modify the queries and parameters in either example_regression_1.py or example_classification_1.py to match your own data.

  4. Decide what type of prediction output you want. See Choosing a Prediction Output Type for details.

  5. Set up your database tables to match the schema of the output type you chose.

  6. Congratulations! After running one of the example files with your own data, you should have a trained model. To use your model to make predictions, modify either example_regression_2.py or example_classification_2.py to use your new model. You can then run it to see the results.

For Issues

  • Double check that the code follows the examples here
  • If you're still seeing an error, create a post in Stack Overflow (with the healthcare-ai tag) that contains
    • Details on your environment (OS, database type, R vs Py)
    • Goals (i.e., what you are trying to accomplish)
    • Crystal clear steps for reproducing the error
  • You can also log a new issue in the GitHub repo by clicking here
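
To collect the environment details for a post, something like the following sketch may help. It uses only the standard library; `package_version` and `environment_report` are hypothetical helpers written for this example, and the default package list is an assumption based on the dependencies mentioned in the installation steps. Note that `importlib.metadata` requires Python 3.8 or newer, which is later than the Python 3.5 recommended above.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("healthcareai", "scikit-learn", "scipy", "pyodbc")):
    """Build a short text report of the OS, Python, and package versions."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in packages]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Paste the printed report into your post along with the database type and the steps to reproduce the error.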

More Repositories

  1. healthcareai-r (R, 245 stars): R tools for healthcare machine learning
  2. Fabric.Cashmere (TypeScript, 66 stars): Health Catalyst's comprehensive design system.
  3. Fabric.Authorization (C#, 21 stars): Permissions service for applications
  4. documentation (HTML, 14 stars): Content for healthcare.ai, old posts, some hosted notebooks
  5. Fabric.Identity (C#, 12 stars): Identity service to provide authentication
  6. hcposh (PowerShell, 11 stars): HCPosh is a PowerShell module that provides some useful functions and tools when working with data in the Health Catalyst Analytics Platform. Key features include: 1) splitting SAM Designer hcx files into smaller files for source control using its built-in column-level SQL parser, developed using the Microsoft.SqlServer.TransactSql.ScriptDom library; 2) generating a React web application for documentation that contains ERD and data flow diagrams for a professional look and presentation of a subject area mart; 3) integration of Graphviz software for ERD and data flow diagram generation (pdf, png, and svg).
  7. PythonPowershellUtilities (PowerShell, 9 stars): The only PowerShell module you should ever need.
  8. Fabric.Realtime (C#, 7 stars): Provides a real-time messaging service where the client can subscribe to a queue to receive HL7 messages
  9. Fabric.Databus (C#, 5 stars): Pipeline to convert SQL into JSON and send to ElasticSearch or other REST API
  10. InstallScripts (Shell, 5 stars)
  11. Catalyst.SqlUtilities (C#, 4 stars): SQL parsers and such
  12. SSIS (3 stars): Supporting SSIS project containing extensible packages for R and Python.
  13. Fabric.Realtime.RabbitMq (Shell, 3 stars): RabbitMq with configurations needed for Fabric.Realtime
  14. Fabric.Terminology (C#, 3 stars): Service to provide shared healthcare terminology data
  15. Fabric.Docker.InterfaceEngine (Shell, 3 stars): Docker for interface engine to use for realtime
  16. dos.powershell (PowerShell, 2 stars): PowerShell functions to control DOS
  17. react-cashmere (TypeScript, 2 stars): React version of @HealthCatalyst/Fiber.Cashmere styles applied to @mui/material components.
  18. Fabric.Docker.NGINX-Kerberos (Shell, 2 stars): Docker container for running NGINX as a reverse proxy with Kerberos authentication
  19. DosInstallUtilities.Kube (PowerShell, 1 star)
  20. Fabric.Realtime.Tester (C#, 1 star)
  21. Fabric.EHR (JavaScript, 1 star): REST service for rendering the Fabric Pane inside an EHR
  22. Fabric.FHIR (C#, 1 star): FHIR REST service