  • Stars: 121
  • Rank: 292,273 (Top 6 %)
  • Language: Jupyter Notebook
  • License: BSD 3-Clause "New...
  • Created over 10 years ago
  • Updated over 1 year ago

Repository Details

Reproducible machine learning analysis of gene expression and alternative splicing data

flotilla

What is flotilla?

flotilla is a Python package for visualizing transcriptome (RNA expression) data from hundreds of samples. We include utilities to perform common tasks on these large data matrices, including:

  • Dimensionality reduction
  • Classification and Regression
  • Outlier detection
  • Network graphs from covariance
  • Hierarchical clustering

And common tasks for biological data including:

  • Renaming database features to gene symbols
  • Coloring/marking samples based on experimental phenotype
  • Removing poor-quality samples (technical outliers)
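
As a rough illustration of the first and last items in this list, the sketch below renames feature IDs to gene symbols and drops samples that look like technical outliers. It uses plain Python dictionaries and is not flotilla's API: the table layout, the `id_to_symbol` mapping, the feature and gene names, and the `min_detected` cutoff are all assumptions made up for the example.

```python
# Hypothetical expression table: sample ID -> {feature ID -> expression value}.
expression = {
    "S1": {"feature_0001": 5.2, "feature_0002": 3.1, "feature_0003": 0.0},
    "S2": {"feature_0001": 4.8, "feature_0002": 0.0, "feature_0003": 2.7},
    "S3": {"feature_0001": 0.0, "feature_0002": 0.0, "feature_0003": 0.1},
}

# Hypothetical mapping from database feature IDs to gene symbols.
id_to_symbol = {
    "feature_0001": "GENE_A",
    "feature_0002": "GENE_B",
    "feature_0003": "GENE_C",
}


def rename_features(table, mapping):
    """Return a copy of the table with feature IDs replaced by gene symbols.

    IDs that are missing from the mapping are kept unchanged.
    """
    return {
        sample: {mapping.get(feature, feature): value for feature, value in row.items()}
        for sample, row in table.items()
    }


def drop_poor_quality(table, min_detected):
    """Keep only samples with at least min_detected features above zero."""
    return {
        sample: row
        for sample, row in table.items()
        if sum(value > 0 for value in row.values()) >= min_detected
    }


renamed = rename_features(expression, id_to_symbol)
filtered = drop_poor_quality(renamed, min_detected=2)
print(sorted(filtered))
```

Counting detected features is only one possible quality criterion; a real analysis would pick a cutoff suited to the dataset.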

Finally, flotilla is a platform for active collaboration between bioinformatics scientists and traditional "wet lab" scientists. Leveraging interactive widgets in the IPython Notebook, we have created tools for simple and streamlined data exploration including:

  • Subsetting sample groups and feature (genes/splicing events) groups
  • Dynamically adjusting parameters for analysis
  • Integrating external lists of features from the web or local files

These tools empower "wet lab" scientists to ask questions on their own and give bioinformatics scientists a platform to share their analysis tools.
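
To make the subsetting idea concrete, here is a minimal sketch of what selecting a sample group and an external feature list amounts to, written without flotilla or any widgets. The sample names, phenotype labels, gene names, and the helper functions `read_feature_list` and `subset` are hypothetical.

```python
import io

# Hypothetical phenotype label for each sample.
metadata = {
    "S1": "stimulated",
    "S2": "unstimulated",
    "S3": "stimulated",
}

# Hypothetical expression table: sample ID -> {gene -> expression value}.
expression = {
    "S1": {"GENE_A": 5.2, "GENE_B": 3.1, "GENE_C": 0.4},
    "S2": {"GENE_A": 4.8, "GENE_B": 0.0, "GENE_C": 2.7},
    "S3": {"GENE_A": 6.0, "GENE_B": 2.2, "GENE_C": 0.9},
}


def read_feature_list(handle):
    """Read one feature name per line, skipping blank lines and '#' comments."""
    names = []
    for line in handle:
        name = line.strip()
        if name and not name.startswith("#"):
            names.append(name)
    return names


def subset(table, samples, features):
    """Restrict the table to the given samples and features."""
    return {
        sample: {f: table[sample][f] for f in features if f in table[sample]}
        for sample in samples
        if sample in table
    }


# io.StringIO stands in for a local file or a downloaded gene list.
gene_list = read_feature_list(io.StringIO("# genes of interest\nGENE_A\n\nGENE_C\n"))
stimulated = [s for s, phenotype in metadata.items() if phenotype == "stimulated"]

selected = subset(expression, stimulated, gene_list)
print(selected)
```

In the notebook, the equivalent choices are made through the interactive widgets rather than in code.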

What flotilla is not

flotilla is not a genomics pipeline. We expect that you have already generated data tables for gene expression, isoform expression and metadata. flotilla only makes it easy to integrate all those data parts together once you have the pieces.
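
Before combining such tables, it can help to confirm that they describe the same samples. The sketch below is one way to check this with the standard library only. The CSV layout, the `sample_id` column name, and the example values are assumptions for illustration, not a format that flotilla requires.

```python
import csv
import io

# Hypothetical tables; in practice these would be files produced by your pipeline.
expression_csv = "sample_id,GENE_A,GENE_B\nS1,5.2,3.1\nS2,4.8,0.0\nS3,0.3,1.9\n"
isoform_csv = "sample_id,EVENT_1\nS1,0.95\nS2,0.10\n"
metadata_csv = "sample_id,phenotype\nS1,stimulated\nS2,unstimulated\nS3,unstimulated\n"


def read_table(text, index_column="sample_id"):
    """Parse CSV text into {sample ID: {column name: value}}."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        sample = row.pop(index_column)
        table[sample] = row
    return table


def shared_samples(*tables):
    """Return the sorted sample IDs that appear in every table."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return sorted(common)


expression = read_table(expression_csv)
isoforms = read_table(isoform_csv)
metadata = read_table(metadata_csv)

common = shared_samples(expression, isoforms, metadata)
missing_isoforms = sorted(set(expression) - set(isoforms))
print("Samples in all tables:", common)
print("Samples without isoform data:", missing_isoforms)
```

Here sample S3 has expression values and metadata but no isoform measurements, which is the kind of mismatch worth catching before analysis.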

Learn how to use flotilla

Please refer to our talks to learn more about how you can apply our tools to your data.

Installation

Docker Installation Instructions

Docker is the preferred method to obtain the most up-to-date version of flotilla. Every change we make to the source code triggers a new build of a virtual machine that contains flotilla and all its dependencies.

Please follow instructions here to get, install, and run the flotilla image.

Local install (on your computer)

First install the Anaconda Python Distribution, which comes with many of the scientific packages we use pre-installed.

Create a Flotilla sandbox

We recommend creating a "sandbox" where you can install any and all packages without disturbing the rest of the Python distribution. You can do this with:

conda create --yes --name flotilla_env --file conda_requirements.txt

You have just created a "virtual environment" called flotilla_env (the value passed to --name). Now activate that environment with the command below; on recent conda releases, conda activate flotilla_env is the equivalent:

source activate flotilla_env

Now at the beginning of your terminal prompt, you should see:

(flotilla_env)

This indicates that you are now in the flotilla_env virtual environment. Now that you're in the environment, follow along with the non-sandbox installation instructions.
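
If the prompt prefix is hidden or customized, the active environment can also be checked from Python. This is a small sketch that relies on the `CONDA_DEFAULT_ENV` environment variable, which conda sets on activation. The helper name `active_conda_env` is made up for the example.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is set."""
    return environ.get("CONDA_DEFAULT_ENV")


env = active_conda_env()
if env == "flotilla_env":
    print("flotilla_env is active")
else:
    print(f"Expected flotilla_env, found: {env}")
```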

Install and update all packages in your environment

To make sure you have the latest packages, on the command line in your terminal, enter this command:

conda install --yes --file conda_requirements.txt

Not all packages are available using conda, so we'll install the rest using pip, which is a Python package installer and installs from PyPI, the Python Package Index.

pip install -r requirements.txt

Next, to install the latest release of flotilla, run:

pip install flotilla

If you want the bleeding-edge master version (which we work hard to keep working, but which could be buggy!), install from the git master with:

pip install git+https://github.com/yeolab/flotilla.git
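
After installing, a quick way to confirm that Python can find the package is to look it up without importing it. The sketch below uses only the standard library. The helper `missing_packages` is hypothetical, and you can extend the `required` list with any other packages you want to check.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["flotilla"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found")
```

Run this inside the activated environment, since packages installed there are not visible from other environments.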

Test dataset

We have prepared a slice of the full dataset for testing and demonstration purposes.

Run each of the following snippets in its own IPython notebook cell to bring up the corresponding interactive feature.

import flotilla
study = flotilla.embark(flotilla._shalek2013)

study.interactive_pca()

study.interactive_graph()

study.interactive_classifier()

study.interactive_lavalamp_pooled_inconsistent()

IMPORTANT NOTE: for this test, several failures are expected because the test set is small. Adjust the parameters to find settings that work. For example, you can manually select all_genes as the feature_subset from the drop-down menu that appears after running these interactive functions.

Problems? Questions?

We invite your input! Please leave any feedback on our issues page.

Proudly sponsored by a NumFOCUS John Hunter Technical Fellowship to Olga Botvinnik.

More Repositories

  1. single-cell-bioinformatics (Jupyter Notebook, 99 stars): Course material in notebook format for learning about single cell bioinformatics methods
  2. clipper (Python, 64 stars): A tool to identify CLIP-seq peaks
  3. outrigger (Python, 61 stars): Create a *de novo* alternative splicing database, validate splicing events, and quantify percent spliced-in (Psi) from RNA seq data
  4. sailor (Jupyter Notebook, 39 stars): CWL+Singularity implementation of an RNA editing workflow
  5. eCLIP (Common Workflow Language, 37 stars)
  6. anchor (Python, 25 stars): ⚓ Find bimodal, unimodal, and multimodal features in your data
  7. rbp-maps (Python, 22 stars): splicing and feature maps for RBPs
  8. qtools (Python, 21 stars): qtools has helper functions to submit jobs to compute clusters (PBS on TSCC, SGE on oolite) from within Python
  9. singlecell_pnm (Jupyter Notebook, 20 stars): ✨ Code and figures accompanying the paper, "Single-cell alternative splicing analysis with Expedition reveals splicing dynamics during neuron differentiation" by Song and Botvinnik, et al
  10. MINES (Python, 18 stars): (m)6A (I)dentification Using (N)anopor(E) (S)equencing
  11. gscripts (Python, 17 stars): General Use Scripts and Helper functions
  12. Expedition (11 stars): Expedition suite for computing, visualizing, and analyzing single-cell alternative splicing data
  13. merge_peaks (Shell, 9 stars): Pipeline for using IDR to produce a set of peaks given two replicate eCLIP peaks
  14. bonvoyage (Python, 8 stars): 📐 Transform percentage-based units into a 2d space to evaluate changes in distribution with both magnitude and direction.
  15. skipper (R, 7 stars): Skip the peaks and expose RNA-binding in CLIP data
  16. BMS_bioinformatics_bootcamp_2017 (Jupyter Notebook, 7 stars)
  17. repetitive-element-mapping (Perl, 6 stars): pipeline for mapping repetitive elements
  18. FLARE (Python, 6 stars): RNA edit detection (SAILOR) and peak calling (FLARE)
  19. cshl_2022 (Jupyter Notebook, 4 stars): Collection of material to use for CSHL's Single Cell Analysis course 2022
  20. onboarding (Shell, 4 stars): Getting started in the yeolab
  21. Yeo_STAMP_Nature_Methods (Jupyter Notebook, 4 stars): Code repository for "Robust single-cell discovery of RNA targets of RNA binding proteins and ribosome"
  22. MSTP_bioinformatics_bootcamp_2016 (Jupyter Notebook, 3 stars)
  23. single-cell-bioinformatics-scrm-2016 (Jupyter Notebook, 3 stars): Single cell bioinformatics class at the Sanford Consortium for Regenerative Medicine (SCRM) in 2016
  24. bodymap2 (2 stars): Flotilla package of Illumina bodymap data
  25. cshl_2019 (Jupyter Notebook, 2 stars): Course materials for CSHL 2019
  26. makebigwigfiles (Python, 2 stars): Converts a BAM file into strand-specific bigwig files.
  27. peak-simulator (Python, 2 stars): A tool to simulate CLIP-seq peaks
  28. eclipdemux (Python, 1 star): demultiplex utility for eclip raw fastq files (process eclip barcodes and randomers)
  29. yeolab.github.io (HTML, 1 star): Do not edit this repo! Everything here is machine-generated by Travis-CI every time we push to the yeolab/yeolab.github.io-source page
  30. singlesail (Python, 1 star): A toolkit to help sail you through single-cell analyses
  31. CRaftID (Jupyter Notebook, 1 star): software to accompany CRaftID paper
  32. hcp (Python, 1 star): Python port of Hierarchical Covariate with Prior from Matlab code
  33. shalek2013 (1 star): Dataset for Shalek et al 2013 single-cell analysis paper
  34. clip_analysis_legacy (Python, 1 star)
  35. PRINTER (Jupyter Notebook, 1 star): Codebase used to generate analysis for PRINTER manuscript.
  36. chim-eCLIP (Common Workflow Language, 1 star): chimeric eCLIP processing pipeline
  37. SWIMMER (Jupyter Notebook, 1 star): Statistical Workflow for Identification of Molecular Modulators of ribonucleoprotEins by Random variance modeling