  • Stars: 401
  • Rank 107,653 (Top 3 %)
  • License: MIT License
  • Created over 5 years ago
  • Updated 8 months ago

Repository Details

Machine learning and deep learning resources

Machine- and Deep Learning resources

Machine learning, deep learning, and data analysis resources. Please contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.

Table of contents

Cheatsheets

Awesome Deep Learning

Keras, Tensorflow

PyTorch

JAX

JAX combines automatic differentiation with XLA (Accelerated Linear Algebra). XLA is a compiler developed by Google to work with TPUs. JAX exposes a NumPy-like API as its high-level abstraction, and the same code runs on CPU, GPU, and TPU (much faster on accelerators).

  • awesome-jax - JAX - A curated list of resources

  • JAX - Jupyter (Colab) notebooks introducing JAX basic (jit, vmap, pmap, grad, and other) and advanced concepts, by @yvrjsharma
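
As a minimal sketch of the transformations named above (grad, jit, vmap), the following assumes JAX is installed and uses a toy linear model. The function names `loss` and `predict_one` and the data are illustrative assumptions, not taken from the linked notebooks.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of the linear model x @ w against targets y."""
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)


def predict_one(w, x_row):
    """Prediction for a single example (one row of x)."""
    return jnp.dot(x_row, w)


# grad: automatic differentiation with respect to the first argument (w).
grad_loss = jax.grad(loss)

# jit: compile the gradient function with XLA for the available backend.
fast_grad_loss = jax.jit(grad_loss)

# vmap: vectorize the per-example function over the leading axis of x,
# while sharing the same weights w across all examples.
predict_batch = jax.vmap(predict_one, in_axes=(None, 0))

# Toy data (hypothetical values chosen only for illustration).
w = jnp.array([1.0, -2.0])
x = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = jnp.zeros(3)

g = fast_grad_loss(w, x, y)   # gradient of the loss, same shape as w
preds = predict_batch(w, x)   # one prediction per row of x

print("devices:", jax.devices())
print("gradient:", g)
print("predictions:", preds)
```

The NumPy-like `jnp` calls are written once, and `jax.devices()` reports which backend (CPU, GPU, or TPU) the compiled code runs on.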

Graph Neural Networks

Transformers

DL Books

DL Courses & Tutorials

DL Videos

DL Papers

DL Papers Genomics

DL Tools

  • Interactive_Tools - Interactive Tools for Machine Learning, Deep Learning and Math. Play with deep neural networks in the browser

  • ivy - The Unified Machine Learning Framework supporting JAX, TensorFlow, PyTorch, MXNet, and NumPy. Python module. Documentation

  • keras - Deep Learning for humans http://keras.io/

  • MXNet-Gluon-Style-Transfer - neural artistic style transfer using MXNet. PyTorch and Torch implementations available

  • openai.com - GPT-3 Access Without the Wait (API access to GPT-3)

  • OpenCV - Open Source Computer Vision library. GitHub, opencv-python - CPU-only OpenCV packages for Python. Documentation. Video - 3h OpenCV crash course

  • pathology_learning - Using traditional machine learning and deep learning methods to predict stuff from TCGA pathology slides

  • ruta - Unsupervised Deep Architectures in R, autoencoders. Requires Keras and TensorFlow. Book

  • tensor2tensor - Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research

  • Janggu - deep learning interface to genomic data (FASTA, BAM, BigWig, BED, GFF). NumPy-like Bioseq and Cover objects accessible from Keras. Includes model evaluation and interpretation features. Pypi, Docs, Janggu - Deep learning for genomics

  • maui - Multi-omics Autoencoder Integration. Latent factors from different data types (stacked variational autoencoders), and their clustering, testing for association with survival. Tested vs. latent factors extracted using Multifactor Analysis (MFA) and iCluster+, on TCGA colorectal cancer RNA-seq, SNPs, CNVs. Evaluation of Colorectal Cancer Subtypes and Cell Lines Using Deep Learning

  • Ludwig - a toolbox built on top of TensorFlow for training and testing deep learning models without writing code. GitHub

  • Mask_RCNN - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

  • PennAI - AI-Driven Data Science, entry-level machine learning interface for non-experts. A System for Accessible Artificial Intelligence

Auto ML

DL models

DL projects

ChatGPT, LLMs

DL Misc

  • app.wombo.art - deep generative model dreaming awesome images from text, Android and iOS apps available. Tweet describing the VQGAN+CLIP technology behind it

  • ColossalAI - A Unified Deep Learning System for Big Model Era. Scaling deep learning models using data, pipeline, tensor, and sequence parallelism. 1D, 2D, 2.5D, 3D distributed operators. Examples of each. Written in PyTorch, needs a configuration file defining parallelism. Benchmarked against DeepSpeed, Megatron-LM.

    Paper Li, Shenggui, Jiarui Fang, Zhengda Bian, Hongxin Liu, Yuliang Liu, Haichen Huang, Boxiang Wang, and Yang You. “Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training,” n.d.

Awesome Machine learning

ML Books

ML Courses & Tutorials

ML Videos

ML Papers

  • Domingos, Pedro. “A Few Useful Things to Know about Machine Learning.” Communications of the ACM 55, no. 10 (October 1, 2012): 78. https://doi.org/10.1145/2347736.2347755. Twelve lessons for machine learning. Overview of machine learning problems and algorithms, problem of overfitting, causes and solutions, curse of dimensionality, issues with high-dimensional data, feature engineering, bagging, boosting, stacking, model sparsity. Video lectures

ML Tools

  • mlr3 - Machine learning in R. An R package providing a unified interface to classification, regression, survival analysis, and other machine learning tasks. GitHub repo, mlr3gallery - Examples of problems and code solutions, mlr3 Manual - mlr3 bookdown. More on the mlr3 package site, including videos

ML Misc

Material in Russian

  • Scientific_graphics_in_python - matplotlib for scientific graphics. 3 parts, 13 chapters. By Pavel Shabanov

  • ml-course-hse - machine learning course at the Computer Sciences Department, Higher School of Economics. Multiple years, videos

  • mlcourse_open - OpenDataScience Machine Learning course (both in English and Russian). Python-based ML course, with video lectures. Video

  • DL_CSHSE_spring2018 - Deep learning, Anton Osokin, Higher School of Economics, Computer Sciences Department (Russian), course material, and video lectures

  • Ordinary Differential Equations - an interactive textbook, by Ilya Schurov (Higher School of Economics)

  • Calculus - lecture notes, by Ilya Schurov (Higher School of Economics). Tweet

  • mathprofi.ru - Higher mathematics, simple and accessible. Mirror

More Repositories

1. scRNA-seq_notes - A list of scRNA-seq analysis tools (R, 510 stars)
2. HiC_tools - A collection of tools for Hi-C data analysis (482 stars)
3. HiC_data - A (continuously updated) collection of references to Hi-C data. Predominantly human/mouse Hi-C data, with replicates (166 stars)
4. TCGAsurvival - Scripts to analyze TCGA data (R, 113 stars)
5. Cancer_notes - A continually expanding collection of cancer genomics notes and data (92 stars)
6. Statistics_notes - Statistics, data analysis tutorials and learning resources (72 stars)
7. scATAC-seq_notes - scATAC-seq data analysis tools and papers (67 stars)
8. Immuno_notes - Immunology-related bioinformatics data and tools (61 stars)
9. scHiC_notes - Notes on single-cell Hi-C technologies, tools, and data (54 stars)
10. MDnotes - Links to all data science, genomics, and other notes (37 stars)
11. RNA-seq_notes - A continually expanding collection of RNA-seq tools (33 stars)
12. Brain_genomic_data - Brain-related -omics data (22 stars)
13. SNP_notes - Notes on SNP-related tools and genome variation analysis (20 stars)
14. gwas2bed - Extracting disease-specific genomic coordinates from GWAS catalog (HTML, 18 stars)
15. ChIP-seq_notes - Notes on ChIP-seq and other-seq-related tools (17 stars)
16. blogs - Links to data science, bioinformatics, statistics, and machine learning resources (16 stars)
17. Aging - Epigenomic enrichment analysis of age-related genomic regions (R, 15 stars)
18. Microbiome_notes - A continually expanding collection of microbiome analysis tools (14 stars)
19. RNA-seq - RNA-seq analysis scripts (R, 14 stars)
20. Aging_clock - Data and papers related to epigenetic clocks predicting age (R, 12 stars)
21. HiCcompareWorkshop - Differential Hi-C Data Analysis Workshop https://currentprotocols.onlinelibrary.wiley.com/doi/abs/10.1002/cpbi.76 (Dockerfile, 12 stars)
22. genomerunner_web - Web version of GenomeRunner (JavaScript, 11 stars)
23. R_notes - Data science in R notes (9 stars)
24. Programming_notes - Programming-related notes (8 stars)
25. Methylation_notes - Notes on DNA methylation analysis (8 stars)
26. bioinformatics-impact - GitHub statistics as a measure of the impact of open-source bioinformatics software (TeX, 7 stars)
27. E-MTAB-3610 - Processed E-MTAB-3610 dataset - Transcriptional Profiling of 1,000 human cancer cell lines (R, 7 stars)
28. BIOS668.2018 - Web site for "Statistical Methods for High-throughput Genomic Data II" BIOS 668 course, Spring 2018 https://mdozmorov.github.io/BIOS668.2018 (SCSS, 7 stars)
29. presentations - Talks and related material (CSS, 6 stars)
30. Python_notes - Data science in Python notes (5 stars)
31. manuscript_template - Template of a manuscript in Rmd (TeX, 5 stars)
32. Jobs_notes - Notes for job seekers (5 stars)
33. promoter_extract - Extract genomic coordinates of the promoters from a list of genes (Python, 4 stars)
34. ChIP-seq - Scripts to analyze ChIP-seq data (Shell, 4 stars)
35. BIOS691_Cancer_Bioinformatics - Course material for the BIOS691 "Cancer Bioinformatics" course, January 25 - May 7, 2021 (HTML, 4 stars)
36. Talk_3Dgenome - Slides for "The genome in action: Detecting and interpreting changes in the 3D genome organization" talk (SCSS, 4 stars)
37. CTCF - Genomic coordinates of FIMO-predicted CTCF binding sites using JASPAR and other PWMs, human and mouse genome assemblies including mm39 and T2T. Also included experimentally derived ENCODE SCREEN CTCF-bound CREs (R, 4 stars)
38. MDgenomerunner - MD functions mostly for GenomeRunner project. See MDmisc R package for MD miscellaneous functions (R, 4 stars)
39. bios524-r-2021 - "Biostatistical Computing with R" course (HTML, 3 stars)
40. BIOS691_deep_learning_R - "Deep Learning with R" course material (HTML, 3 stars)
41. HMP2 - 16S rRNA sequencing data for the HMP2 project (Shell, 3 stars)
42. Talk_reproducible_research_overview_2021 - Brief overview of computational reproducible research, Unix, remote computing (SSH), Conda, pipelines, R/RMarkdown, Git/GitHub, Docker, Cloud, Kubernetes. The goal is to provide students with a modern data science ecosystem of tools for further studies (JavaScript, 3 stars)
43. MDmisc - MD helper functions. Previous version at https://github.com/mdozmorov/MDgenomerunner (R, 2 stars)
44. R.genomerunner - Scripts and examples of visualization and analysis of the enrichment and epigenomic similarity results (HTML, 2 stars)
45. dcaf - Misc. scripts and examples (Shell, 2 stars)
46. Grants_notes - Notes on potential funding opportunities (2 stars)
47. activeranges - Expanding collection of biologically active chromatin regions as GRanges (R, 2 stars)
48. GTEx - Playground with GTEx data (R, 2 stars)
49. 63_immune_cells - Gene expression profiles of 63 immune cell types (R, 2 stars)
50. R.Lorin.RNA-seq - Interpretation of RNA-seq data (R, 2 stars)
51. Talk_preciseTAD - Slides for "preciseTAD: A transfer learning framework for 3D domain boundary prediction at base-pair resolution" presentation (SCSS, 2 stars)
52. GenomeRunner - Automating genome exploration (Visual Basic, 1 star)
53. Talk_Genomics - Talk for the Science Club, Department of Pathology, VCU. May 15, 2019 (1 star)
54. PCAworkshop - An introduction to PCA in R (Dockerfile, 1 star)
55. deconvolution - Cell type-specific deconvolution of 'omics' data (R, 1 star)
56. Talk_JSM2019 - Slides for JSM2019, "SpectralTAD: Defining Hierarchy of Topologically Associated Domains Using Graph Theoretical Clustering" (1 star)
57. Methylation850K - Methylation analysis of Illumina 850K arrays (R, 1 star)
58. beamer_template - Beamer template for RMarkdown class presentation (1 star)
59. Talk_ISMB2020 - TADcompare abstract for the virtual ISMB 2020 conference (1 star)
60. grdocs - GenomeRunner documentation (TeX, 1 star)
61. R.-ChIP-seq.histone - Analysis of histone marks, and their differential presence in the genome (R, 1 star)
62. Talk_HiCcompare - Slides for HiCcompareWorkshop (HTML, 1 star)
63. R.Sjogren - Sjogren syndrome microarray data analysis (HTML, 1 star)
64. lecture1 - Test repo (1 star)
65. BIOS567 - Web site for "Statistical Methods for High-throughput Genomic Data I" BIOS 567 course (1 star)
66. Data_notes - Lists of publicly available datasets for machine learning (1 star)
67. PathwayRunner - PathwayRunner computes enrichment of gene set(s) in all pathways using the hypergeometric test (R, 1 star)
68. GDS-processor - Process GDS files from Gene Expression Omnibus (GEO) (Visual Basic, 1 star)
69. Talk_Hi-C - An overview presentation of chromatin conformation capture technologies and analysis methods (1 star)
70. Quantile-normalization - Quantile normalization of gene expression matrix with missing values (Visual Basic, 1 star)
71. RepeatSoaker - A simple method to eliminate low-complexity short reads (Makefile, 1 star)
72. BIOS567.2017 - Web site for "Statistical Methods for High-throughput Genomic Data I" BIOS 567 course, Fall 2017 (SCSS, 1 star)