• This repository has been archived on 17/Mar/2023


Repository Details

This repository contains the analysis of the lung adenocarcinoma single-cell dataset.

Getting started

Clone the repo.

Download the Data_input folder from the link below into the repo: https://drive.google.com/drive/folders/1sDzO0WOD4rnGC7QfTKwdcQTx3L36PFwX?usp=sharing
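
Before running the scripts, it can help to confirm that the downloaded folder landed in the right place. The following is a minimal Python sketch and is not part of the repository: the helper name `missing_inputs` is hypothetical, and the subfolder names are simply the ones referenced elsewhere in this README.

```python
from pathlib import Path


def missing_inputs(repo_root, expected=("Data_input",)):
    """Return the expected paths (relative to repo_root) that do not exist."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


# Run from the top level of the cloned repository.
# The subfolders listed here are assumptions based on paths named in this README.
missing = missing_inputs(
    ".", ("Data_input", "Data_input/TCGA", "Data_input/GSE130148_data")
)
print("Missing:", missing if missing else "nothing")
```

An empty result suggests the folder was placed at the top level of the repo, which is where the scripts appear to expect it.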

Scripts

Importing and Creating Seurat Object

01_Import_data_and_metadata.Rmd: Import raw data and metadata. Output of this script is saved as "S01_Data_and_metadata.RData".

02_Create_Seurat_object.Rmd: Imports .RData object from script 01. Creates initial Seurat object and performs initial quality control. Final output object is saved as "S02_Main_Seurat_object_filtered.RData".

02.1_Create_Seurat_object_neo_osi.Rmd: Creates a Seurat object of additional samples (n=5) and performs initial quality control. Final output object is saved as "S02.1_Main_Seurat_object_filtered_neo_osi.RData".

03_Merge_in_NeoOsi.Rmd: Imports the .RData objects generated by scripts 02 and 02.1 and merges them into a single object. Final output object is saved as "S03_Merged_main_filtered_with_neo_osi.RData".

03.1_Subset_and_general_annotations.Rmd: Imports .RData object from script 03. In this script we subset samples to those with greater than 10 cells and perform clustering. The cells are annotated and split into immune and non-immune datasets. Produced objects are "S03_Main_Seurat_object_filtered_and_subset.RData", "S03_Immune_Seurat_object_nodups.RData", and "S03_Nonimmune_Seurat_object.RData".
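
The sample filter described for script 03.1 (keep samples with greater than 10 cells) can be illustrated outside of Seurat. This is only a sketch in plain Python under stated assumptions: the real script operates on a Seurat object in R, and the function names and the toy cell-to-sample mapping below are hypothetical.

```python
from collections import Counter


def samples_to_keep(cell_to_sample, min_cells=10):
    """Return the set of sample IDs with strictly more than min_cells cells."""
    counts = Counter(cell_to_sample.values())
    return {sample for sample, n in counts.items() if n > min_cells}


def subset_cells(cell_to_sample, min_cells=10):
    """Keep only cells whose sample passes the cell-count threshold."""
    keep = samples_to_keep(cell_to_sample, min_cells)
    return {cell: s for cell, s in cell_to_sample.items() if s in keep}


# Toy metadata: sample "A" has 11 cells, sample "B" has exactly 10.
toy = {f"A_{i}": "A" for i in range(11)}
toy.update({f"B_{i}": "B" for i in range(10)})
kept = subset_cells(toy)
print(sorted(set(kept.values())))  # only "A" exceeds the threshold
```

Note the strict inequality: a sample with exactly 10 cells is dropped, matching the "greater than 10" wording above.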

Immune Compartment Analysis

IM01_Subset_cluster_annotate_immune_cells.Rmd: Imports the immune-cell-only .RData object from script 03 above. In the script cells are clustered and annotated. The object produced and saved at the end of the script is called "IM01_Immune_Seurat_object.RData".

IM02_immune_cell_changes_with_response_to_treatment.Rmd: Imports .RData object from script IM01. Within the script we investigate changes in the fraction of immune populations in regard to treatment status across all patients.

IM03_Subset_cluster_annotate_MFs-monocytes_LUNG.Rmd: Subsetting and clustering of all macrophages/monocytes from lung biopsies, followed by treatment-stage-specific analysis of the resulting populations. Output object is called "IM03_MFs_Seurat_object.RData".

IM04_Subset_cluster_annotate_T-cells_LUNG.Rmd: Subsetting and clustering of all T cells from lung biopsies, followed by treatment-stage-specific analysis of the resulting populations. Output object is called "IM04_Tcells_Seurat_object.RData".

IM05_Immune_cells_across_pats_with_multiple_biopsies.Rmd: Analysis of fractional population changes in patients with multiple biopsies.
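
Scripts IM02 and IM05 both examine how the fraction of each immune population changes between groups. The underlying calculation can be sketched as follows. This is plain Python rather than the R code used in the scripts, and the function name, the cell-type labels, and the toy data are all hypothetical; only the treatment labels TN and PD come from this README.

```python
from collections import Counter, defaultdict


def population_fractions(cells):
    """cells: iterable of (treatment_status, cell_type) pairs.

    Returns {treatment_status: {cell_type: fraction of that group's cells}}.
    """
    counts = defaultdict(Counter)
    for status, cell_type in cells:
        counts[status][cell_type] += 1
    fractions = {}
    for status, by_type in counts.items():
        total = sum(by_type.values())
        fractions[status] = {t: n / total for t, n in by_type.items()}
    return fractions


# Toy annotations; the cell-type labels are illustrative only.
toy_cells = [
    ("TN", "T-cell"), ("TN", "Macrophage"), ("TN", "Macrophage"), ("TN", "Macrophage"),
    ("PD", "T-cell"), ("PD", "Macrophage"),
]
for status, fracs in population_fractions(toy_cells).items():
    print(status, fracs)
```

Fractions are normalized within each treatment group, so groups with different total cell counts remain comparable.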

Non-Immune Compartment Analysis

NI01_General_annotation_of_nonimmune_cells.Rmd: Imports .RData object from script 03 that includes only non-immune cells. In the script cells are clustered and annotated. The object produced and saved at the end of the script is called "NI01_Nonimmune_Seurat_object_annotated.RData".

NI02_epi_subset_and_cluster.Rmd: Imports .RData object from NI01. In this script we subset the cells to only those that are epithelial and recluster them. The resulting subset object is saved at the end of the script as "NI02_Epi_Seurat_object_clustered.RData".

NI03_inferCNV.Rmd: Imports .RData object from NI02. Creates the input for InferCNV.

NI03.1_Running_inferCNV_R3_4_4.Rmd: Imports the input files generated from NI03 (stored in inferCNV_nodups in the Data_input folder; see above). In this script we use InferCNV in R 3.4.4 to identify cancer and non-cancer epithelial cells. The cells are annotated and the resulting object is saved at the end of the script as "NI03_epithelial_annotated_tumor.RData".

NI04_Cancer_cells_DEgenes.Rmd: Imports .RData object generated from NI03. In this script we subset the data to cancer cells only and then find the differentially expressed genes from three comparisons: 1. TN vs PER, 2. TN vs PD, and 3. PER vs PD. The cancer cell only object is saved as "NI04_tumor_seurat_object.RData".

NI05_Annotation_of_Nontumor_epi.Rmd: Imports .RData object generated from NI03. In this script we subset the data to non-cancer cells only. The non-cancer epithelial cells are then clustered and annotated. The non-cancer epithelial cell object is saved as "NI05_normalepi_seurat_object_annotated.RData".

NI06_mutation_analysis.Rmd: Imports .RData object generated from NI04. In this script we combine outputs from cerebra to create a mutational table.

NI07_TH226_cancercell_analysis.Rmd: Imports .RData object generated from NI04. In this script we subset the data to a single patient with multiple biopsies and find the differentially expressed genes from three comparisons: 1. TN vs PER, 2. TN vs PD, and 3. PER vs PD. We also investigate the expression of five gene expression signatures found within the grouped analysis in NI04.

NI08_Gene_expression_plotting.Rmd: Imports .RData object generated from NI04. In this script we investigate the expression of five gene expression signatures found within the grouped analysis in NI04.

NI09_AT2_sig_compare.Rmd: Imports .RData objects from NI04 and NI05 as well as data and metadata files in /Data_input/GSE130148_data. In this script we compare cancer cells from each treatment timepoint (TN, PER, PD), as well as non-cancer AT2 cells, to an outside dataset of healthy AT2 cells.

NI10_TCGA_clinical_outcomes.Rmd: Imports three input files from /Data_input/TCGA. We compare the five gene expression signatures found within the grouped analysis of NI04 to patient survival outcomes within the TCGA.

NI11_WES_analysis.ipynb: This notebook compares the mutations identified in whole-exome-seq to those identified with scRNA-seq, for the same patient samples.

NI12_msk_analysis.ipynb: This notebook creates figures 2D and 2E. Takes as input msk_impact_2017/MSK-IMPACT_cosmic_tier1.txt, msk_impact_2017data_clinical_patient_edit.txt and mutation_input/coverage_all_cells_cerebra.csv from Data_input/. Investigates survival outcomes of a large cohort of LUAD patients whose mutational profiles are similar to or different from the ones we identify in our patient samples.

NI13_get_ercc_substitution_rate.py: Python script for calculating per-base substitution rates in ERCC standards across a large group of scRNA-seq BAM files.

NI14_qpcr_analysis.Rmd: R code to plot qPCR analysis for cancer cells.

NI15_multiplex_IF_analysis.Rmd: R code to plot Immune IF analysis.

NI16_cancercell_EGFR_ALK.Rmd: Imports .RData object generated from NI04. In this script we investigate the expression of five gene expression signatures found within the subsets of EGFR and ALK samples.

NI17_cancercell_PDsigs.Rmd: Imports .RData object generated from NI04. In this script we investigate the expression of five gene expression signatures, and their constituent genes, within samples from the PD treatment timepoint.
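
As background for NI13, a per-base substitution rate against a known spike-in sequence is the number of mismatched bases divided by the number of bases compared. The sketch below shows that arithmetic only and is not the repository's script. It assumes reads are already aligned without gaps and supplied as (start, sequence) pairs, whereas the real script reads BAM files; the function name and toy sequences are hypothetical.

```python
from collections import Counter


def substitution_rates(reference, aligned_reads):
    """Per-base substitution rate from gap-free alignments.

    reference: the known spike-in sequence.
    aligned_reads: iterable of (start, read_sequence), where start is the
        0-based reference position of the first read base.
    Returns (overall_rate, Counter of (ref_base, read_base) substitutions).
    """
    compared = 0
    substitutions = Counter()
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference) or base == "N":
                continue  # skip overhangs and uncalled bases
            compared += 1
            ref_base = reference[pos]
            if base != ref_base:
                substitutions[(ref_base, base)] += 1
    rate = sum(substitutions.values()) / compared if compared else 0.0
    return rate, substitutions


ref = "ACGTACGTAC"
reads = [(0, "ACGTA"), (5, "CGTAC"), (2, "GTTC")]  # last read has one mismatch
rate, subs = substitution_rates(ref, reads)
print(rate, dict(subs))
```

Because the spike-in sequences are known in advance, any mismatch can be attributed to technical error rather than biological variation, which is what makes this rate a useful baseline.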

Cerebra

We used the Cerebra tool (https://github.com/czbiohub/cerebra) to report mutations and read coverage of regions of interest (ROIs) within genes. Cerebra uses GATK output files (.vcf) as input and comprises several modules which do the following: 1) generate a cell_x_gene mutation-counts matrix, 2) generate a cell_x_ROI summary table that reports amino acid level mutations for a user-defined list of genes, 3) report read coverage (variant vs total reads) for each ROI.
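
To make the shape of these outputs concrete, the sketch below builds a toy cell_x_gene mutation-count matrix and a variant-vs-total read fraction. It is an illustration in plain Python, not Cerebra's API or output format: the function names, the input layout (one `(cell_id, gene)` pair per called variant), and the toy data are assumptions.

```python
from collections import defaultdict


def cell_by_gene_counts(calls):
    """calls: iterable of (cell_id, gene) pairs, one per called variant.

    Returns {cell_id: {gene: number of variants called in that gene}}.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for cell_id, gene in calls:
        matrix[cell_id][gene] += 1
    return {cell: dict(genes) for cell, genes in matrix.items()}


def variant_fraction(variant_reads, total_reads):
    """Fraction of reads at a region of interest that support the variant.

    Returns None when the region has no coverage at all.
    """
    if total_reads == 0:
        return None
    return variant_reads / total_reads


# Toy variant calls; cell IDs are illustrative only.
toy_calls = [("cell_1", "EGFR"), ("cell_1", "EGFR"), ("cell_1", "ALK"), ("cell_2", "EGFR")]
counts = cell_by_gene_counts(toy_calls)
print(counts)
print(variant_fraction(3, 12))
```

Returning `None` for uncovered regions keeps "no variant reads" distinct from "no reads at all", a distinction that matters in sparse single-cell data.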
