This repository contains the analysis of the lung adenocarcinoma single-cell dataset.
Getting started
Clone the repo, then download the Data_input folder from the link below into the repo: https://drive.google.com/drive/folders/1sDzO0WOD4rnGC7QfTKwdcQTx3L36PFwX?usp=sharing
Scripts
Importing and Creating Seurat Object
01_Import_data_and_metadata.Rmd: Imports raw data and metadata. The output of this script is saved as "S01_Data_and_metadata.RData".
02_Create_Seurat_object.Rmd: Imports .RData object from script 01. Creates initial Seurat object and performs initial quality control. Final output object is saved as "S02_Main_Seurat_object_filtered.RData".
02.1_Create_Seurat_object_neo_osi.Rmd: Creates a Seurat object of additional samples (n=5) and performs initial quality control. Final output object is saved as "S02.1_Main_Seurat_object_filtered_neo_osi.RData".
03_Merge_in_NeoOsi.Rmd: Imports the .RData objects generated by scripts 02 and 02.1 and merges them into a single object. Final output object is saved as "S03_Merged_main_filtered_with_neo_osi.RData".
03.1_Subset_and_general_annotations.Rmd: Imports .RData object from script 03. In this script we subset samples to those with greater than 10 cells and perform clustering. The cells are annotated and split into immune and non-immune cell datasets. The objects produced are "S03_Main_Seurat_object_filtered_and_subset.RData", "S03_Immune_Seurat_object_nodups.RData", and "S03_Nonimmune_Seurat_object.RData".
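The sample filter described for script 03.1 can be illustrated with a small sketch. The repository performs this step in R on a Seurat object; the Python below is only a language-neutral illustration of the rule "keep samples with greater than 10 cells". The function name, the toy metadata, and the cell and sample identifiers are all hypothetical.

```python
from collections import Counter


def keep_samples_with_min_cells(cell_to_sample, min_cells=10):
    """Keep only cells whose sample has strictly more than `min_cells` cells.

    `cell_to_sample` maps a cell identifier to its sample identifier.
    """
    counts = Counter(cell_to_sample.values())
    kept_samples = {s for s, n in counts.items() if n > min_cells}
    return {c: s for c, s in cell_to_sample.items() if s in kept_samples}


# Hypothetical toy metadata: 12 cells from sample "A", 3 cells from sample "B".
cell_to_sample = {f"A_cell{i}": "A" for i in range(12)}
cell_to_sample.update({f"B_cell{i}": "B" for i in range(3)})

filtered = keep_samples_with_min_cells(cell_to_sample)
print(sorted(set(filtered.values())))  # only sample "A" passes the filter
```

The threshold is strict (more than 10, not 10 or more), matching the wording of the script description.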
Immune Compartment Analysis
IM01_Subset_cluster_annotate_immune_cells.Rmd: Imports the .RData object from script 03.1 above that includes only immune cells. In this script the cells are clustered and annotated. The object produced and saved at the end of the script is "IM01_Immune_Seurat_object.RData".
IM02_immune_cell_changes_with_response_to_treatment.Rmd: Imports .RData object from script IM01. Within the script we investigate changes in the fraction of immune populations in regard to treatment status across all patients.
IM03_Subset_cluster_annotate_MFs-monocytes_LUNG.Rmd: Subsets and clusters all macrophages/monocytes from lung biopsies, followed by treatment-stage-specific analysis of the resulting populations. The output object is "IM03_MFs_Seurat_object.RData".
IM04_Subset_cluster_annotate_T-cells_LUNG.Rmd: Subsets and clusters all T cells from lung biopsies, followed by treatment-stage-specific analysis of the resulting populations. The output object is "IM04_Tcells_Seurat_object.RData".
IM05_Immune_cells_across_pats_with_multiple_biopsies.Rmd: Analysis of fractional population changes in patients with multiple biopsies.
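The immune scripts above repeatedly compare the fraction of each immune population across treatment statuses. As a minimal sketch of that calculation (not the repository's R code), the Python below counts cells per population within each treatment status and normalizes by the status total. The function name and the toy cell list are hypothetical; the TN and PD labels follow the treatment timepoints named elsewhere in this README.

```python
from collections import Counter, defaultdict


def population_fractions(cells):
    """Return {treatment_status: {cell_type: fraction}}.

    `cells` is an iterable of (treatment_status, cell_type) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for status, cell_type in cells:
        counts[status][cell_type] += 1

    fractions = {}
    for status, type_counts in counts.items():
        total = sum(type_counts.values())
        fractions[status] = {t: n / total for t, n in type_counts.items()}
    return fractions


# Hypothetical toy annotation: four TN cells and four PD cells.
cells = [
    ("TN", "T-cell"), ("TN", "T-cell"), ("TN", "Macrophage"), ("TN", "Macrophage"),
    ("PD", "T-cell"), ("PD", "Macrophage"), ("PD", "Macrophage"), ("PD", "Macrophage"),
]
fractions = population_fractions(cells)
print(fractions["TN"])
print(fractions["PD"])
```

Normalizing within each status keeps the comparison independent of how many cells were captured per group.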
Non-Immune Compartment Analysis
NI01_General_annotation_of_nonimmune_cells.Rmd: Imports the .RData object from script 03.1 that includes only non-immune cells. In this script the cells are clustered and annotated. The object produced and saved at the end of the script is "NI01_Nonimmune_Seurat_object_annotated.RData".
NI02_epi_subset_and_cluster.Rmd: Imports .RData object from NI01. In this script we subset the cells to only those that are epithelial and re-cluster them. The resulting subset object is saved at the end of the script as "NI02_Epi_Seurat_object_clustered.RData".
NI03_inferCNV.Rmd: Imports .RData object from NI02. Creates the input for InferCNV.
NI03.1_Running_inferCNV_R3_4_4.Rmd: Imports the input files generated by NI03 (stored in inferCNV_nodups in the Data_input folder, see above). In this script we use InferCNV in R 3.4.4 to identify cancer and non-cancer epithelial cells. The cells are annotated and the resulting object is saved at the end of the script as "NI03_epithelial_annotated_tumor.RData".
NI04_Cancer_cells_DEgenes.Rmd: Imports .RData object generated from NI03. In this script we subset the data to cancer cells only and then find the differentially expressed genes from three comparisons: 1. TN vs PER, 2. TN vs PD, and 3. PER vs PD. The cancer-cell-only object is saved as "NI04_tumor_seurat_object.RData".
NI05_Annotation_of_Nontumor_epi.Rmd: Imports .RData object generated from NI03. In this script we subset the data to non-cancer cells only. The non-cancer epithelial cells are then clustered and annotated. The non-cancer epithelial cell object is saved as "NI05_normalepi_seurat_object_annotated.RData".
NI06_mutation_analysis.Rmd: Imports .RData object generated from NI04. In this script we combine outputs from cerebra to create a mutational table.
NI07_TH226_cancercell_analysis.Rmd: Imports .RData object generated from NI04. In this script we subset the data to a single patient with multiple biopsies and find the differentially expressed genes from three comparisons: 1. TN vs PER, 2. TN vs PD, and 3. PER vs PD. We also investigate the expression of five gene expression signatures found within the grouped analysis in NI04.
NI08_Gene_expression_plotting.Rmd: Imports .RData object generated from NI04. In this script we investigate the expression of five gene expression signatures found within the grouped analysis in NI04.
NI09_AT2_sig_compare.Rmd: Imports .RData objects from NI04 and NI05 as well as data and metadata files in /Data_input/GSE130148_data. In this script we compare cancer cells from each treatment timepoint (TN, PER, PD), as well as non-cancer AT2 cells, to an outside dataset of healthy AT2 cells.
NI10_TCGA_clinical_outcomes.Rmd: Imports three input files from /Data_input/TCGA. We compare the five gene expression signatures found within the grouped analysis of NI04 to patient survival outcomes within the TCGA.
NI11_WES_analysis.ipynb: This notebook compares the mutations identified in whole-exome-seq to those identified with scRNA-seq, for the same patient samples.
NI12_msk_analysis.ipynb: This notebook creates figures 2D & E. Takes as input msk_impact_2017/MSK-IMPACT_cosmic_tier1.txt, msk_impact_2017data_clinical_patient_edit.txt and mutation_input/coverage_all_cells_cerebra.csv from Data_input/. Investigates survival outcomes of a large cohort of LUAD patients whose mutational profiles are similar to or different from the ones we identify in our patient samples.
NI13_get_ercc_substitution_rate.py: Python script for calculating per-base substitution rates in ERCC standards across a large group of scRNA-seq BAM files.
NI14_qpcr_analysis.Rmd: R code to plot qPCR analysis for cancer cells.
NI15_multiplex_IF_analysis.Rmd: R code to plot Immune IF analysis.
NI16_cancercell_EGFR_ALK.Rmd: Imports .RData object generated from NI04. In this script we investigate the expression of five gene expression signatures found within the subsets of EGFR and ALK samples.
NI17_cancercell_PDsigs.Rmd: Imports .RData object generated from NI04. In this script we investigate the expression of five gene expression signatures, and their constituent genes, within samples from the PD treatment timepoint.
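For the per-base substitution rate computed by NI13_get_ercc_substitution_rate.py, the following is a minimal sketch of the idea, not the script itself. It assumes the rate is defined as mismatched bases divided by compared bases, that reads are already aligned to the ERCC reference without gaps, and that ambiguous `N` calls are skipped. The function name, the reference string, and the reads are hypothetical; the real script works on BAM files.

```python
def substitution_rate(reference, aligned_reads):
    """Return mismatched bases / compared bases over all reads.

    `aligned_reads` is an iterable of (start, sequence) pairs, where `start`
    is the 0-based offset of an ungapped alignment on `reference`.
    """
    mismatches = 0
    compared = 0
    for start, seq in aligned_reads:
        ref_segment = reference[start:start + len(seq)]
        for ref_base, read_base in zip(ref_segment, seq):
            if read_base == "N":  # assumption: ambiguous calls are not counted
                continue
            compared += 1
            if read_base != ref_base:
                mismatches += 1
    return mismatches / compared if compared else 0.0


# Hypothetical toy input: one substitution among eleven compared bases.
reference = "ACGTACGTAC"
reads = [(0, "ACGT"), (4, "ACCT"), (6, "GTAN")]
rate = substitution_rate(reference, reads)
print(rate)
```

Because the ERCC spike-in sequences are known, any mismatch against them can be attributed to technical error rather than biological variation, which is why such a rate is useful as a background estimate.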
Cerebra
We used the Cerebra tool (https://github.com/czbiohub/cerebra) to report mutations and read coverage of regions of interest (ROIs) within genes. Cerebra uses GATK output files (.vcf) as input and comprises several modules that do the following: 1) generate a cell_x_gene mutation-counts matrix, 2) generate a cell_x_ROI summary table that reports amino-acid-level mutations for a user-defined list of genes, 3) report read coverage (variant vs total reads) for each ROI.
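To make the shape of a cell_x_gene mutation-counts matrix concrete, here is a minimal sketch that builds one from a list of per-cell variant calls. It does not use or reproduce Cerebra's API or implementation; the function name and the toy calls are hypothetical, and real input would come from the per-cell VCF files.

```python
from collections import Counter


def mutation_count_matrix(calls):
    """Build a cell-by-gene matrix of mutation counts.

    `calls` is an iterable of (cell_id, gene) pairs, one per variant call.
    Returns (cells, genes, matrix), where matrix[i][j] is the number of
    calls for cells[i] in genes[j].
    """
    counts = Counter(calls)
    cells = sorted({cell for cell, _ in counts})
    genes = sorted({gene for _, gene in counts})
    matrix = [[counts[(cell, gene)] for gene in genes] for cell in cells]
    return cells, genes, matrix


# Hypothetical toy calls for two cells and two genes.
calls = [("cell1", "EGFR"), ("cell1", "EGFR"), ("cell1", "KRAS"), ("cell2", "KRAS")]
cells, genes, matrix = mutation_count_matrix(calls)
print(cells, genes, matrix)
```

Rows are cells and columns are genes, so a zero entry means no mutation was called for that gene in that cell.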