# Deep Interpolation

deepinterpolation is a Python library that denoises data by removing independent noise. Importantly, training does NOT require ground truth. This repository currently supports the results of the bioRxiv publication: https://www.biorxiv.org/content/10.1101/2020.10.15.341602v1

# Principle of Deep Interpolation

Figure 1 - Schematic introducing the principles of deep interpolation. A. An interpolation model is trained to predict a noisy block from other blocks with independent noise. The loss is the difference between the predicted data and a new noisy block. B. The interpolation model is used to create a noiseless version of the input data.

For more information, consult the associated bioRxiv publication: https://www.biorxiv.org/content/10.1101/2020.10.15.341602v1
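
To make the training scheme in Figure 1 more concrete, here is a minimal NumPy sketch of how one input/target pair could be assembled from a noisy movie: the frames surrounding a center frame form the input, and the center frame, which is left out of the input, is the target. The function name `make_training_pair` and the `pre_frame`/`post_frame` parameters are illustrative assumptions and not the library's API.

```python
import numpy as np


def make_training_pair(movie, center, pre_frame=2, post_frame=2):
    """Return (input_frames, target_frame) for one center index.

    movie: array of shape (n_frames, height, width).
    The input stacks the pre_frame frames before and the post_frame
    frames after `center`; the center frame itself is left out.
    """
    if center - pre_frame < 0 or center + post_frame >= movie.shape[0]:
        raise ValueError("center is too close to the edge of the movie")
    before = movie[center - pre_frame:center]
    after = movie[center + 1:center + 1 + post_frame]
    inputs = np.concatenate([before, after], axis=0)
    target = movie[center]
    return inputs, target


# Synthetic data: a constant signal plus independent noise on every frame.
rng = np.random.default_rng(0)
movie = 1.0 + rng.normal(scale=0.5, size=(10, 4, 4))

inputs, target = make_training_pair(movie, center=5)
print(inputs.shape, target.shape)  # (4, 4, 4) (4, 4)
```

Because the noise on the target frame is independent of the noise on the input frames, a model trained on such pairs cannot predict the noise and is pushed toward the underlying signal.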

# Support

For bugs and issues, please submit issue tickets on this repository. For installation and usage support, we are moving to the public discussion forum on this repository (https://github.com/AllenInstitute/deepinterpolation/discussions). Alternatively, you can join the Slack channel where past support history is saved (if the invitation has expired, email Jerome): https://join.slack.com/t/deepinterpolation/shared_invite/zt-rkmcw7h1-v8y0Grwe3fZg4m~DiAQVMg

# Installation

In all cases, unless you only want to work on CPU, you will need to install the TensorFlow GPU dependencies (i.e., CUDA drivers). To that end, you may have to consult the TensorFlow documentation to enable your GPU.

To install the package, you have 2 options.

  1. Install from PyPI:

Create a new conda environment called 'local_env':

conda create -n local_env python=3.7

Our integration tests on the CI server currently run with Python 3.7. While the package likely works with other versions, we cannot guarantee it.

pip install deepinterpolation

This will install the latest deployed stable version and only the core components of the library. You will NOT have access to sample datasets present on this repository.

  2. Install from a clone of this repository.

This will give you access to the latest developments as well as the provided sample data. Our step-by-step examples assume this installation mode, as they depend on the sample datasets.

The small training examples below work on both CPU and GPU architectures (i.e., even on a small MacBook). If you are not familiar with deep learning, we recommend experimenting with smaller datasets first, such as the provided example Neuropixel data.

  • activate environment

    conda activate local_env

  • install necessary packages

    make init

  • install deepinterpolation package

    python setup.py install

# Description and use of the Command Line Interface (CLI)

DeepInterpolation 0.1.3 introduced a refactored interface to the package. The purpose of this mode is to facilitate deployment of deepinterpolation and provide a consistent API. Examples of CLI use are provided in the examples/ folder under cli_*.

There are two modes that you can use:

  • Scripting mode:

In this mode, you construct a set of parameter dictionaries and feed them to the training, inference or fine-tuning objects within a Python script. This mode is useful for iterating on and improving your jobs. Examples of this mode are provided in the examples/ folder as cli_*.py files.

  • Command-line mode:

In this mode, you save the dictionary to a JSON file and provide the path to this file as a parameter on the command line. This mode is useful for deploying your jobs at a larger scale. Typically, your JSON file stays mostly the same from job to job. Examples of this mode are provided in the examples/ folder as cli_*.sh and cli_*.json files.
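
As a rough sketch of how a parameter dictionary from scripting mode can be turned into a JSON file for command-line mode, the snippet below writes a dictionary to disk and reads it back. The dictionary keys and nesting shown here are hypothetical placeholders; the actual parameter names are the ones documented by the schema and used in the cli_*.json files in the examples/ folder.

```python
import json
import os
import tempfile

# Hypothetical parameter names, for illustration only; the real ones are
# listed by `python -m deepinterpolation.cli.training --help`.
job_params = {
    "generator_params": {"start_frame": 0, "end_frame": 1000},
    "training_params": {"output_dir": "/path/to/output"},
}

# Write the dictionary to a JSON file in a temporary folder.
json_path = os.path.join(tempfile.mkdtemp(), "job.json")
with open(json_path, "w") as f:
    json.dump(job_params, f, indent=2)

# Reload to confirm that the file round-trips to the same dictionary.
with open(json_path) as f:
    reloaded = json.load(f)
print(reloaded == job_params)  # True
```

Keeping one JSON file per job makes it straightforward to change only a few values (for instance the frame range) between otherwise identical jobs.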

All parameters of the CLI are documented within the schema. To access the documentation, run:

python -m deepinterpolation.cli.training --help

or

python -m deepinterpolation.cli.inference --help

or

python -m deepinterpolation.cli.fine_tuning --help

# General package description

The files in the deepinterpolation folder contain the core classes for training, inference, loss calculation and network generation. These are called 'Collections'. Each collection is essentially a local list of functions that are used to create different types of objects and that can be extended from one another. For instance, network_collection.py contains a list of networks that can be generated for training. This allows for quick iteration and modification of an architecture while keeping the code organized.
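
The idea of a collection can be illustrated with a small, generic registry pattern: a mapping from names to functions that build objects. This is only one possible way to organize such a list and is not taken from the library's source; `NETWORK_COLLECTION`, `register`, `build` and both network functions are hypothetical names.

```python
# Hypothetical registry: names mapped to functions that build objects.
NETWORK_COLLECTION = {}


def register(name):
    """Decorator that adds a builder function to the collection."""
    def decorator(func):
        NETWORK_COLLECTION[name] = func
        return func
    return decorator


@register("small_network")
def small_network(n_layers=2):
    # Stand-in for real model-building code.
    return {"type": "small", "n_layers": n_layers}


@register("deeper_network")
def deeper_network(n_layers=6):
    # A new variant is added next to the existing one, without editing it.
    return {"type": "deep", "n_layers": n_layers}


def build(name, **kwargs):
    """Look up a builder by name and call it."""
    return NETWORK_COLLECTION[name](**kwargs)


print(build("deeper_network"))  # {'type': 'deep', 'n_layers': 6}
```

With this layout, trying a new architecture amounts to adding one function and selecting it by name in the job parameters.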

# FAQ

See the FAQ folder: https://github.com/AllenInstitute/deepinterpolation/tree/master/faq

# Example training

To try out training your own DeepInterpolation network, we recommend starting with this file: https://github.com/AllenInstitute/deepinterpolation/blob/master/examples/cli_example_tiny_ephys_training.py

In this file, you will need to edit the paths to point to a local folder appropriate for saving your models.

Then, activate your conda env called 'local_env'

conda activate local_env

then run

python cli_example_tiny_ephys_training.py

If everything runs correctly, you should see the following within a few minutes:

2020-10-19 18:01:03.735098: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
sh: sysctl: command not found
2020-10-19 18:01:03.749184: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f9b1f115860 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-19 18:01:03.749202: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:period argument is deprecated. Please use save_freq to specify the frequency in number of batches seen.
Epoch 1/5
10/10 [==============================] - 19s 2s/step - loss: 0.4597 - val_loss: 0.3987
Epoch 2/5
10/10 [==============================] - 20s 2s/step - loss: 0.3796 - val_loss: 0.3785
Epoch 3/5
10/10 [==============================] - 22s 2s/step - loss: 0.3646 - val_loss: 0.3709
Epoch 4/5
10/10 [==============================] - 21s 2s/step - loss: 0.3797 - val_loss: 0.3698
Epoch 5/5
10/10 [==============================] - 21s 2s/step - loss: 0.3835 - val_loss: 0.3675
Saved model to disk

This is a toy example, but you can increase the number of training frames to improve the quality of the model. All parameters are commented in the file. To adapt to a larger dataset, change the path parameters and the start_frame and end_frame parameters. Please consult the CLI documentation mentioned above for more details on each parameter.

# Example inference

Raw pre-trained models are available as separate h5 files on Dropbox.

The following models are currently available :

Two-photon Ai93 excitatory line DeepInterpolation network:

Key recording parameters:

Two-photon Ai148 excitatory line DeepInterpolation network:

Key recording parameters:

Neuropixel DeepInterpolation network:

Key recording parameters:

  • Neuropixels Phase 3a probes
  • 374 simultaneous recording sites across 3.84 mm, 10 reference channels
  • Four-column checkerboard site layout with 20 µm spacing between rows
  • 30 kHz sampling rate
  • 500x hardware gain setting
  • 500 Hz high pass filter in hardware, 150 Hz high-pass filter applied offline.
  • Pre-processing: Median subtraction was applied to individual probes to remove signals that were common across all recording sites. Each probe recording was mean-centered and normalized with a single pair of values for all nodes on the probe.
  • Docker hub id : 245412653747/deep_interpolation:allen_neuropixel
  • Dropbox link : https://www.dropbox.com/sh/tm3epzil44ybalq/AACyKxfvvA2T_Lq_rnpHnhFma?dl=0

fMRI DeepInterpolation network:

Key recording parameters:

To start inference, we recommend starting with this file: https://github.com/AllenInstitute/deepinterpolation/blob/master/examples/cli_example_tiny_ephys_inference.py

In this file, you will need to edit the path strings to fit your local paths.

Then, activate your conda env called 'local_env'

conda activate local_env

then run:

python cli_example_tiny_ephys_inference.py

If everything runs correctly, you should see the following within a few minutes:

2020-10-20 14:10:37.549061: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
sh: sysctl: command not found
2020-10-20 14:10:37.564133: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f82ada8a520 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-20 14:10:37.564156: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version

This is a toy example, but you can change the start_frame and end_frame variables to process larger data.

It is important to keep in mind that this process is easily parallelizable. In practice, we wrapped this code with additional routines to leverage 20 to 100 cluster CPU nodes to accelerate this process. You could also use GPU nodes; we simply had quick access to a much larger number of CPU machines.
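
One simple way to parallelize inference is to split the frame range into contiguous chunks and run one job per chunk, each with its own start_frame and end_frame. The helper below is a sketch under assumptions: `split_frame_range` is a hypothetical function, not part of the package, and it treats end_frame as inclusive, which should be checked against the CLI documentation before use.

```python
def split_frame_range(start_frame, end_frame, n_jobs):
    """Split [start_frame, end_frame] (inclusive) into contiguous chunks."""
    total = end_frame - start_frame + 1
    if n_jobs < 1 or total < n_jobs:
        raise ValueError("need at least one frame per job")
    # Spread the remainder over the first `extra` jobs.
    base, extra = divmod(total, n_jobs)
    chunks = []
    first = start_frame
    for job in range(n_jobs):
        size = base + (1 if job < extra else 0)
        chunks.append((first, first + size - 1))
        first += size
    return chunks


for chunk_start, chunk_end in split_frame_range(0, 999, 4):
    print(chunk_start, chunk_end)
# 0 249
# 250 499
# 500 749
# 750 999
```

Each (start, end) pair can then be written into its own JSON parameter file and submitted as a separate cluster job.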

# Adapting the module to a newer data structure

To adapt DeepInterpolation to a new dataset, you will need to use or recreate a generator in 'generator_collection.py'. Those are all constructed from core classes called DeepGenerator and SequentialGenerator.
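
To give a feel for what a generator has to provide, the sketch below shows a stand-alone toy class that serves batches of (input, target) arrays from an in-memory movie. It does not subclass DeepGenerator or SequentialGenerator and does not reproduce their interfaces; `ToyFrameGenerator` and its parameters are hypothetical, so please refer to 'generator_collection.py' for the methods a real generator must implement.

```python
import numpy as np


class ToyFrameGenerator:
    """Hypothetical batch generator; not a deepinterpolation class."""

    def __init__(self, movie, pre_post_frame=2, batch_size=4):
        self.movie = movie
        self.pre_post_frame = pre_post_frame
        self.batch_size = batch_size
        # Only frames with enough neighbors on both sides can be targets.
        self.centers = np.arange(pre_post_frame, movie.shape[0] - pre_post_frame)

    def __len__(self):
        # Number of batches per epoch (the last batch may be smaller).
        return int(np.ceil(len(self.centers) / self.batch_size))

    def __getitem__(self, index):
        batch = self.centers[index * self.batch_size:(index + 1) * self.batch_size]
        n = self.pre_post_frame
        # Input: n frames before and n frames after each center frame.
        inputs = np.stack([
            np.concatenate([self.movie[c - n:c], self.movie[c + 1:c + 1 + n]])
            for c in batch
        ])
        # Target: the center frame, which is excluded from the input.
        targets = np.stack([self.movie[c] for c in batch])
        return inputs, targets


movie = np.random.default_rng(0).normal(size=(20, 8, 8))
generator = ToyFrameGenerator(movie)
inputs, targets = generator[0]
print(len(generator), inputs.shape, targets.shape)  # 4 (4, 4, 8, 8) (4, 8, 8)
```

Adapting to a new file format then mostly means replacing the in-memory `movie` array with code that reads the required frames from your own data source.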

The CollectorGenerator class allows you to group generators if your dataset is distributed across many files, folders or sources. This system was designed to train very large DeepInterpolation models from terabytes of data distributed on a network infrastructure. The CollectorGenerator is not currently supported through the CLI and will be replaced with a simpler API in a future release.

# License

Allen Institute Software License – This software license is the 2-clause BSD license plus a third clause that prohibits redistribution and use for commercial purposes without further permission.

Copyright © 2019. Allen Institute. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Redistributions and use for commercial purposes are not permitted without the Allen Institute’s written permission. For purposes of this license, commercial purposes are the incorporation of the Allen Institute's software into anything for which you will charge fees or other compensation or use of the software to perform a commercial service for a third party. Contact [email protected] for commercial licensing opportunities.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
