

Hi-C data

MIT License. PRs welcome.

A (continuously updated) collection of references to Hi-C data and papers. Predominantly human/mouse Hi-C data, with replicates. Please contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.

Large collections

  • 3DIV - database of 315 uniformly processed Hi-C datasets from 80 human cell/tissue types. Bait-centric (SNP rsID, gene name, hg19 coordinates) visualization of long-range interactions in the context of epigenomic (histone, enhancer) signals, with numerical results. Custom BWA-MEM pipeline; bias and distance effects removed. Coordinates of significant interactions, with annotations, are available for (FTP) download, http://kobic.kr/3div/download

    • Yang, Dongchan, Insu Jang, Jinhyuk Choi, Min-Seo Kim, Andrew J Lee, Hyunwoong Kim, Junghyun Eom, Dongsup Kim, Inkyung Jung, and Byungwook Lee. “3DIV: A 3D-Genome Interaction Viewer and Database.” Nucleic Acids Research 46, no. D1 (January 4, 2018)
  • Chorogenome resource: Processed data (Hi-C, ChIP-seq) for Drosophila, Mouse, Human, http://chorogenome.ie-freiburg.mpg.de:5003/

  • GITAR: An Open Source Tool for Analysis and Visualization of Hi-C Data - includes a large collection of standardized processed data from 4D Nucleome: 20 hg38 and 2 mm10 datasets, normalized with the Yaffe-Tanay method and downloadable, including directionality index, HMM states, and TAD analysis results. Text and HDF5 formats. https://www.genomegitar.org/processed-data.html

  • 4DGenome - significant 3D interactions collected from different literature sources

  • A catalog of TADs, TAD boundaries (18,972 total, 2,293 novel; Arrowhead and insulation score), and loops (21,838; HiCCUPS, cooltools call-dots) in human lymphoblastoid cell lines (LCLs). Hi-C data on 44 different individuals from the 1000 Genomes Project (five super-populations), including data from the Human Genome Structural Variation Consortium (HGSVC) and 4D Nucleome. Impact of SVs overlapping TAD boundaries on gene expression and splicing. Introduction to 3D genome rewiring events in disease. Juicer, FAN-C, cooltools. GitHub. No data/supplementary material until published.

    Paper Li, Chong, Marc Jan Bonder, Sabriya Syed, Human Genome Structural Variation Consortium (HGSVC), HGSVC Functional Analysis Working Group, Michael C. Zody, Mark J.P. Chaisson, et al. “A Comprehensive Catalog of 3D Genome Organization in Diverse Human Genomes Facilitates Understanding of the Impact of Structural Variation on Chromatin Structure.” Preprint. Genomics, May 15, 2023. https://doi.org/10.1101/2023.05.15.540856.
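Several entries above (and tools such as cooltools) call TAD boundaries with the insulation score. As a rough sketch only, not the implementation behind any resource listed here (the function name and toy matrix are hypothetical), the score of each bin can be taken as the mean contact frequency in a window straddling that bin, with local minima marking candidate boundaries:

```python
import numpy as np

def insulation_score(mat, window, log2_normalize=True):
    """Simplified insulation score for a symmetric, binned contact matrix.

    For each bin i, average the contacts between the `window` bins upstream
    of i and the `window` bins downstream of i. Local minima are candidate
    TAD boundaries. Bins too close to the matrix edge get NaN.
    """
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    scores = np.full(n, np.nan)
    for i in range(window, n - window):
        upstream = slice(i - window, i)
        downstream = slice(i + 1, i + window + 1)
        scores[i] = mat[upstream, downstream].mean()
    if log2_normalize:
        # express each score relative to the chromosome-wide mean
        scores = np.log2(scores / np.nanmean(scores))
    return scores

# Toy matrix: two 5-bin "TADs" with dense internal contacts
n = 10
toy = np.ones((n, n))
toy[:5, :5] = 10
toy[5:, 5:] = 10
raw = insulation_score(toy, window=2, log2_normalize=False)
print(raw)                     # lowest values at bins 4 and 5
print(int(np.nanargmin(raw)))  # 4
```

Published implementations add details omitted here, such as masking low-coverage bins and specifying the window in base pairs rather than bins.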

4D Nucleome

  • 4D Nucleome Data Portal - 3D genomics and microscopy data, uniformly processed, with integrative visualization in HiGlass and comparative functionality. Browse by type (sequencing, microscopy) or publication. Data are in three tiers: Tier 1 (H1-ESC, GM12878, IMR90, HFF-hTERT (clone 6), and WTC-11), Tier 2, and untiered. Overview of the first and second phases of the 4DN project. Other repositories that host Hi-C and similar datasets include the ENCODE portal, NCBI's GEO, and EMBL-EBI's ArrayExpress. 4D Nucleome Browser for integrative and multimodal data navigation.
    • Table 1 - Genomic assay types in the 4D Nucleome Data Portal. Chromatin conformation data (in situ, dilution Hi-C, Micro-C, DNase Hi-C, Hi-C 3.0, Capture Hi-C, TCC, single-cell variants, SPRITE, GAM) and related sequencing data (ChIA-PET, ChIA-Drop, PLAC-seq, ChIP-seq, CUT&RUN, Repli-seq, MARGI (RNA-chromatin interactions), others).
    • High-resolution Hi-C datasets, over 1 billion read pairs. cooltools processing, .cool and .mcool formats, A/B compartments and TAD boundaries (insulation score) detected using domain calling pipelines.
    • Microscopy datasets - standard FISH (DNA or RNA), multi-loci FISH, high-throughput FISH, dynamic single particle tracking, ChromEMT, OptoDroplet.
    • Table 2 - All 4D Nucleome analysis pipelines, in CWL and WDL, available on Docker Hub. Alignment with BWA-MEM using the -SP5M options. PairsQC - QC report for Hi-C pairs files. Hi-C processing pipeline.
    • 4DN Visualization Workspace Paper Reiff, S.B., Schroeder, A.J., Kırlı, K. et al. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nat Commun 13, 2365 (02 May 2022). https://doi.org/10.1038/s41467-022-29697-4
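Table 2 above lists PairsQC, which reports quality metrics on Hi-C pairs files. As a hedged illustration only (not PairsQC's code; the function name and the 20 kb cutoff are assumptions), one such metric, the ratio of long-range cis to trans pairs, can be computed from 4DN-style .pairs records, whose first five columns are readID, chrom1, pos1, chrom2, pos2:

```python
def cis_trans_summary(lines, min_cis_dist=20_000):
    """Count trans, short-range cis and long-range cis pairs.

    Assumes whitespace-separated .pairs-style records:
    readID chrom1 pos1 chrom2 pos2 [strand1 strand2 ...].
    `min_cis_dist` (hypothetical default) separates short from long cis pairs.
    """
    cis_long = cis_short = trans = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        chrom1, pos1, chrom2, pos2 = fields[1], int(fields[2]), fields[3], int(fields[4])
        if chrom1 != chrom2:
            trans += 1
        elif abs(pos2 - pos1) >= min_cis_dist:
            cis_long += 1
        else:
            cis_short += 1
    ratio = cis_long / trans if trans else float("inf")
    return {"cis_long": cis_long, "cis_short": cis_short,
            "trans": trans, "cis_long_to_trans": ratio}

example = """## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
r1 chr1 1000 chr1 1500 + -
r2 chr1 1000 chr1 500000 + -
r3 chr1 2000 chr2 3000 + +
r4 chr2 100 chr2 90000 - +
""".splitlines()
print(cis_trans_summary(example))
```

A high fraction of long-range cis pairs relative to trans pairs is commonly read as a sign of a good Hi-C library, but the exact thresholds used by PairsQC should be taken from its documentation.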

Lieberman-Aiden lab

All Hi-C data released by the Lieberman-Aiden group. Links to Amazon storage and GEO studies. http://aidenlab.org/data.html

  • Vian, Laura, Aleksandra Pękowska, Suhas S.P. Rao, Kyong-Rim Kieffer-Kwon, Seolkyoung Jung, Laura Baranello, Su-Chen Huang, et al. “The Energetics and Physiological Impact of Cohesin Extrusion.” Cell 173, no. 5 (May 2018) - Architectural stripes, created by extensive loading of cohesin near CTCF anchors with the help of Nipbl and Rad21. Little overlap between B cells and ESCs. Architectural stripes are sites of tumor-inducing TOP2beta DNA breaks. ATP is required for loop extrusion and cohesin translocation but not for maintenance; replication or transcription is not important for loop extrusion. Zebra algorithm for detecting architectural stripes, image analysis, math in Methods. Human lymphoblastoid cells, mouse ESCs, mouse B cells activated with LPS, CH12 B lymphoma cells; wild type and treated with hydroxyurea (blocks DNA replication), flavopiridol (blocks transcription, Pol II elongation), or oligomycin (depletes ATP). Many other data types (e.g., ChIP-seq, ATAC-seq). GSE82144, GSE98119

  • Lieberman-Aiden, Erez, Nynke L. van Berkum, Louise Williams, Maxim Imakaev, Tobias Ragoczy, Agnes Telling, Ido Amit, et al. “Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome.” Science (New York, N.Y.) 326, no. 5950 (October 9, 2009) - GM12878, K562 cells. HindIII, NcoI enzymes. Two to three replicates. GSE18199

  • Rao, Suhas S. P., Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, et al. “A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping.” Cell 159, no. 7 (December 18, 2014) - Human GM12878, K562, IMR90, NHEC, HeLa cells; mouse CH12 cells. Different digestion enzymes (HindIII, NcoI, MboI, DpnII), different dilutions. Up to 35 biological replicates for GM12878. GSE63525, Supplementary Table S1. Hi-C meta-data

  • Sanborn, Adrian L., Suhas S. P. Rao, Su-Chen Huang, Neva C. Durand, Miriam H. Huntley, Andrew I. Jewett, Ivan D. Bochkov, et al. “Chromatin Extrusion Explains Key Features of Loop and Domain Formation in Wild-Type and Engineered Genomes.” Proceedings of the National Academy of Sciences of the United States of America 112, no. 47 (November 24, 2015) - HAP1 cells, derived from a chronic myelogenous leukemia cell line. Replicates. GSE74072

  • Rao, Suhas S.P., Su-Chen Huang, Brian Glenn St Hilaire, Jesse M. Engreitz, Elizabeth M. Perez, Kyong-Rim Kieffer-Kwon, Adrian L. Sanborn, et al. “Cohesin Loss Eliminates All Loop Domains.” Cell 171, no. 2 (2017) - HCT-116 human colorectal carcinoma cells. Timecourse, replicates under different conditions. GSE104334

Leonid Mirny lab

http://mirnylab.mit.edu/

Bing Ren lab

http://chromosome.sdsc.edu/mouse/hi-c/download.html

Raw and normalized chromatin interaction matrices and TADs defined with DomainCaller. Mouse ES cells and cortex; human ES cells and IMR90 fibroblasts. Two replicates per condition. GEO accessions: GSE35156, GSE43070
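DomainCaller builds on the directionality index (DI) of Dixon et al. (2012), followed by an HMM. A minimal sketch of the DI step alone, assuming a symmetric binned matrix and a window given in bins (the original work used 40 kb bins and a 2 Mb window; the function and toy data are illustrative): with A the contacts to upstream bins, B to downstream bins, and E = (A + B) / 2, DI = sign(B - A) · ((A - E)² / E + (B - E)² / E).

```python
import numpy as np

def directionality_index(mat, window):
    """Directionality index per bin, following the formula of Dixon et al. (2012).

    A = contacts with the `window` upstream bins, B = with the `window`
    downstream bins, E = (A + B) / 2;
    DI = sign(B - A) * ((A - E)**2 / E + (B - E)**2 / E).
    """
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    di = np.zeros(n)
    for i in range(n):
        a = mat[i, max(0, i - window):i].sum()          # upstream contacts
        b = mat[i, i + 1:min(n, i + window + 1)].sum()  # downstream contacts
        e = (a + b) / 2.0
        if e == 0 or a == b:
            continue  # no directional bias: DI stays 0
        di[i] = np.sign(b - a) * ((a - e) ** 2 / e + (b - e) ** 2 / e)
    return di

# Toy matrix: two 5-bin domains with dense internal contacts
n = 10
toy = np.ones((n, n))
toy[:5, :5] = 10
toy[5:, 5:] = 10
di = directionality_index(toy, window=3)
print(di.round(2))  # DI turns negative at the end of domain 1 and positive at the start of domain 2
```

In DomainCaller the DI track is then segmented by an HMM into upstream-biased, downstream-biased, and unbiased states; that step is not reproduced here.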

Feng Yue lab

Cancer

  • 3D genomics of MYC overexpression. MYC overexpression leads to increased binding at active enhancers, amplified gene expression, increased chromatin interactions (including promoter-enhancer interactions), and weakened TAD boundaries. U2OS osteosarcoma human cell line with tetracycline-inducible MYC; ChIP-seq (H3K27ac, superenhancer detection), RNA-seq (more downregulated genes; activation of ribosome, translation, and mitochondrial biogenesis), 4D-seq, and SIQHiC (Spike-in Quantitative Hi-C, mixing in crosslinked mouse 3T3 cells at a 1:4 ratio). Replicate data at GSE164777.
    Paper See, Yi Xiang, Kaijing Chen, and Melissa J Fullwood. “MYC Overexpression Leads to Increased Chromatin Interactions at Superenhancers and MYC Binding Sites.” Genome Research, February 3, 2022, gr.276313.121. https://doi.org/10.1101/gr.276313.121.
  • Changes in the 3D genome are associated with CNVs in multiple myeloma cells (RPMI-8226, tri- and tetraploid; U266, nearly diploid). The number of TADs increases by ~25% and TADs become smaller; ~20% switch compartments. ICE normalization accounts for CNVs better than HiCNorm. CNV breakpoints overlap with TAD boundaries. 40kb resolution, replicates. Code, Hi-C, WGS, RNA-seq data GSE87585

  • Curaxins affect the 3D genome via DNA intercalation without inducing DNA damage. They compromise enhancer-promoter interactions, suppress oncogene expression (including MYC family genes), downregulate survival genes, partially disrupt TAD borders, decrease short-range interactions and the level of spatial segregation of A/B compartments, and deplete CTCF but not other factors. Hi-C in HT1080 fibrosarcoma cells. Data: Hi-C and CTCF ChIP-seq in duplicates GSE122463, gene expression in MM1.S and HeLa S3 cells GSE117611, H3K27ac GSE117409, nascent RNA transcription GSE107633

  • 3D genomics of glioblastoma. Replicate samples from three patients. Sub-5kb-resolution Hi-C data, integration with ChIP- and RNA-seq. Data: Six Hi-C replicates, EGAS00001003493, ChIP-seq GSE121601, RNA-seq data EGAS00001003700. Processed data

  • Ten non-replicated Hi-C datasets: two human lymphoblastoid cell lines with known chromosomal translocations (FY1199 and DD1618), a transformed mouse cell line (EKLF), six human brain tumours (five glioblastomas: GB176, GB180, GB182, GB183, and GB238; one anaplastic astrocytoma: AA86), and a normal human cell line control (GM07017). GSE81879

  • Harewood, Louise, Kamal Kishore, Matthew D. Eldridge, Steven Wingett, Danita Pearson, Stefan Schoenfelder, V. Peter Collins, and Peter Fraser. “Hi-C as a Tool for Precise Detection and Characterisation of Chromosomal Rearrangements and Copy Number Variation in Human Tumours.” Genome Biology 18, no. 1 (December 2017).

  • Prostate cancer, normal. RWPE1 prostate epithelial cells transfected with GFP or ERG oncogene. Two biological and up to four technical replicates. GSE37752

    • Rickman, David S., T. David Soong, Benjamin Moss, Juan Miguel Mosquera, Jan Dlabal, Stéphane Terry, Theresa Y. MacDonald, et al. “Oncogene-Mediated Alterations in Chromatin Conformation.” Proceedings of the National Academy of Sciences of the United States of America 109, no. 23 (June 5, 2012)
  • Taberlay, Phillippa C., Joanna Achinger-Kawecka, Aaron T. L. Lun, Fabian A. Buske, Kenneth Sabir, Cathryn M. Gould, Elena Zotenko, et al. “Three-Dimensional Disorganization of the Cancer Genome Occurs Coincident with Long-Range Genetic and Epigenetic Alterations.” Genome Research 26, no. 6 (June 2016)

  • Cancer, normal Hi-C. Prostate epithelial cells, PC3, LNCaP. Two to three replicates. GSE73785

  • Haplotype-resolved Hi-C of GM12878, integrated with RNA-seq and Bru-seq (nascent mRNA). Investigation of Monoallelic expression (MAE) and Allele-Biased expression (ABE). GEO GSE159813
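ICE (iterative correction and eigenvector decomposition, Imakaev et al. 2012) is mentioned above as accounting for CNVs better than HiCNorm. Its balancing step can be sketched as follows; this is a simplified illustration, not the cooler or hiclib implementation (which also filter low-coverage bins and near-diagonal contacts), and the function name and toy data are hypothetical:

```python
import numpy as np

def ice_balance(mat, max_iter=200, tol=1e-6):
    """Iteratively rescale a symmetric contact matrix until all non-empty
    rows have equal sums. Returns the balanced matrix and per-bin biases,
    so that balanced == mat / outer(bias, bias)."""
    m = np.asarray(mat, dtype=float).copy()
    bias = np.ones(m.shape[0])
    for _ in range(max_iter):
        coverage = m.sum(axis=1)
        nonzero = coverage > 0
        if not nonzero.any():
            break  # nothing to balance
        # relative coverage of each bin; empty bins are left untouched
        coverage[nonzero] /= coverage[nonzero].mean()
        coverage[~nonzero] = 1.0
        m /= np.outer(coverage, coverage)
        bias *= coverage
        if np.abs(coverage[nonzero] - 1.0).max() < tol:
            break  # row sums have converged
    return m, bias

# Toy example: a uniform matrix distorted by a per-bin factor (bin 1 doubled)
true_factor = np.array([1.0, 2.0, 1.0, 1.0])
observed = np.outer(true_factor, true_factor)
balanced, bias = ice_balance(observed)
print(balanced.round(4))  # every entry equal
print(bias.round(4))      # [0.8 1.6 0.8 0.8]
```

Because each bin's bias is estimated from its own total coverage, a multiplicative per-bin distortion like the one in the toy is removed exactly; real CNVs are only approximately multiplicative.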

BRCA

  • Nucleosome reorganization in breast cancer vs. normal tissues (MNase-seq, MNase-H3-seq), along with cfDNA from blood. Four patients. Data processing with cfDNAtools, NucTools. Nucleosomes gained in BRCA are strongly enriched (20X) in CpG islands, in promoters of genes encoding DNA-binding proteins, and in cancer pathways. The average distance between nucleosomes (nucleosome repeat length, NRL) decreases by 5-10 bp. These effects are associated with differential DNA methylation and binding of linker histone variants H1.4 and H1X.
    Paper Jacob, Divya R., Wilfried M. Guiblet, Hulkar Mamayusupova, Mariya Shtumpf, Luminita Ruje, Isabella Ciuta, Svetlana Gretton, et al. “Nucleosome Reorganisation in Breast Cancer Tissues.” Preprint. Genomics, April 18, 2023. https://doi.org/10.1101/2023.04.17.537031.
  • Comparative characterization of 3D genomics in TNBC. Cell lines: HMEC as normal and five BRCA subtypes, in order of aggressiveness: T47D, ZR7530, HCC1954, HCC70, BT549. TNBC shows the most dramatic changes, partially conserved across TNBC cell lines and TNBC tissues. TAD (CaTCH), loop (HiCCUPS), and compartment (PC1) analyses. Local interactions are lost; "normal" TAD interactions are weakened but TNBC TADs are strengthened; these changes are associated with CTCF loss/gain. 3D changes are associated with gene expression changes. Hi-C (replicates), ChIP-seq (CTCF, H3K27ac), RNA-seq, and ATAC-seq data are at GSE167154.
    Paper Kim, Taemook, Sungwook Han, Yujin Chun, Hyeokjun Yang, Hyesung Min, Sook Young Jeon, Jang-il Kim, Hyeong-Gon Moon, and Daeyoup Lee. “Comparative Characterization of 3D Chromatin Organization in Triple-Negative Breast Cancers.” Experimental & Molecular Medicine, May 5, 2022. https://doi.org/10.1038/s12276-022-00768-2.
  • 3D spheroids (organoids) of three breast cell lines: normal (MCF10A) and cancer (MCF7 and tamoxifen-resistant MCF7TR). Hi-C, RNA-seq, validation using 3D-qPCR and 3D-FISH. Normalization following the HiCcompare approach, TADs using TopDom, TAD comparison using eight types of changes, significant interactions using HiSIF. P1D1 loops defined as loops contacting the promoter and distal regions of the same gene; strength changes compared using Valid Pairs Per Million (VPPM); differentially expressed looping genes (DELGs) defined. Hi-C (replicates) and RNA-seq (triplicates) at GSE165572.
    Paper Li, Jingwei, Kun Fang, Lavanya Choppavarapu, Ke Yang, Yini Yang, Junbai Wang, Ruifeng Cao, Ismail Jatoi, and Victor X. Jin. “Hi-C Profiling of Cancer Spheroids Identifies 3D-Growth-Specific Chromatin Interactions in Breast Cancer Endocrine Resistance.” Clinical Epigenetics 13, no. 1 (December 2021): 175. https://doi.org/10.1186/s13148-021-01167-6.
  • BRCA gene targets regulated by SNPs - Capture-C of chromatin interactions centered on causal variants and promoters of causal genes (Variant- and Promoter Capture Hi-C) in six human mammary epithelial (B80T5, MCF10A) and breast cancer (MCF7, T47D, MDAMB231, Hs578T) cell lines. HindIII fragments, CHiCAGO and Peaky for significant interaction calling. PCA on interactions separates cell types; significant interactions are enriched in epigenomic elements. 651 target genes at 139 independent breast cancer risk signals. Table 1 - top priority target genes. HiCUP-processed capture Hi-C data (hg19), code, Supplementary tables, Table S11 - 651 target genes.
    Paper Beesley, Jonathan, Haran Sivakumaran, Mahdi Moradi Marjaneh, Luize G. Lima, Kristine M. Hillman, Susanne Kaufmann, Natasha Tuano, et al. “Chromatin Interactome Mapping at 139 Independent Breast Cancer Risk Signals.” Genome Biology 21, no. 1 (December 2020) https://doi.org/10.1186/s13059-019-1877-y
  • Hi-C and RNA-seq in two ERα+ cell lines, MCF7 and T47D, parental and tamoxifen-resistant (TR), before and after treatment with Sapitinib (AZD8931), a dual EGFR/HER2 TKI. Eight types of TAD changes (TopDom), significant loops using Homer, promoter-distal looping genes (P1D1, P1D2). Many TR-specific TADs and loops are reversed upon Sapitinib treatment. ERα-bound promoter-enhancer looping genes enclosed within altered domains are enriched in functions and pathways associated with cancer aggressiveness, glycolysis and metabolism, and focal adhesion. Comparison of cells and spheroids: spheroids recapitulate most changes and are a better preclinical model. hg19, 40kb. Replicated Hi-C and triplicated RNA-seq of MCF7/T47D parental/TamR at GSE144380 and GSE128676.
    Paper Yang, Yini, Lavanya Choppavarapu, Kun Fang, Alireza S. Naeini, Bakhtiyor Nosirov, Jingwei Li, Ke Yang, et al. “The 3D Genomic Landscape of Differential Response to EGFR/HER2 Inhibition in Endocrine-Resistant Breast Cancer Cells.” Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms 1863, no. 11 (November 2020): 194631. https://doi.org/10.1016/j.bbagrm.2020.194631.
  • 3D (tethered chromatin conformation, TCC) time course of estradiol (E2) stimulation in ER+ BRCA and endocrine resistance. Hormone-starved MCF7 (T0), E2-treated for 1h (T1), replicates combined. Similar numbers of compartments (~2050). Dynamic A/B compartments (HiCLib) are associated with active open chromatin. Dynamic changes are characterized by decreased CTCF binding. Associated genes are enriched in cancer invasion, aggressiveness, and metabolism. Three additional time points, 4h, 16h, 24h (T4, T16, T24). 24 patterns of changes, categorized into six (similar to TADcompare: highly common HCC, early/late transit ETC/LTC, lowly/moderately/highly dynamic LDC/MDC/HDC). Epigenetic states from histone ChIP-seq (ChromHMM). Public RNA-seq data. Tamoxifen-resistant MCF7-TamR and T47D-TamR cell lines; tamoxifen-resistant altered compartments (TRACs), six types classified into shrunk, expanded, and flipped compartments. HOMER-identified loops. Differential genes associated with ribosome, tight junction, endocytosis, lysosome, cell cycle, WNT signaling pathway, insulin signaling pathway, focal adhesion, and MAPK signaling pathways. Molecular mechanistic model in Discussion. Supplementary data with hg19 coordinates of compartments, genes, loops. GSE108787 - MCF7 and TamR TCC, ChIP-seq and RNA-seq time course data (plus public RNA-seq); GSE119890 - T47D and TamR TCC time course data.
    Paper Zhou, Yufan, Diana L. Gerrard, Junbai Wang, Tian Li, Yini Yang, Andrew J. Fritz, Mahitha Rajendran, et al. “Temporal Dynamic Reorganization of 3D Chromatin Architecture in Hormone-Induced Breast Cancer and Endocrine Resistance.” Nature Communications 10, no. 1 (December 2019): 1522. https://doi.org/10.1038/s41467-019-09320-9.
  • Capture Hi-C (CHi-C) to annotate 63 breast cancer risk loci. 110 target genes at 33 loci, supported by other evidence (eQTLs, disease-specific survival). Two ER+ breast cancer cell lines (T-47D, ZR-75-1), two ER− breast cancer cell lines (BT-20, MDA-MB-231), one “normal” breast epithelial cell line (Bre80-Q-TERT (Bre80)), and a non-breast lymphoblastoid cell line (GM06990). Approximately 40% of interaction peaks are present in multiple cell lines. More interactions within TADs. WashU session with all CHi-C interaction peaks. Table 2 Risk loci which formed interaction peaks directly (N = 33) or via an adjacent risk locus (N = 3) with 110 target genes (locus, SNP, gene targets, nearest gene). Table 3 Nine CHi-C putative target genes that were statistically significant eQTLs (FDR adjusted P < 0.1) (locus, SNP, gene, p-values in all, ER+/- cancers). Table 4 Six CHi-C putative target genes for which there was orthogonal support from at least two additional data sources. PRJEB23968 - FASTQ files.
    Supplementary material https://www.nature.com/articles/s41467-018-03411-9#Sec23
      • Supplementary Data 1: Captured genomic regions (locus, SNP, hg19 coordinates, size, reference)
      • Supplementary Data 2: Numbers of statistically significant interaction peaks in six cell lines at 51 informative loci and 12 uninformative loci
      • Supplementary Data 3: Coordinates of interacting pairs detected in at least two cell lines (bedpe, -log10 FDR of interaction significance, cell line, number of cells)
      • Supplementary Data 4: Risk loci which formed interaction peaks with target genes in T-47D (T), ZR-75-1 (Z), Bre80 (Br), BT-20 (BT), MDA-MB-231 (M) and GM06990 (G) cell lines (cytoband, SNP, gene targets)
      • Supplementary Data 5: Distances between published risk SNPs and putative CHi-C target genes (kb) at 36 informative risk loci (cytoband, SNP, hg19 coordinates, gene targets)
      • Supplementary Data 6: eQTL analysis of 69 protein-coding target genes at 26 risk loci in TCGA breast cancer data
      • Supplementary Data 7: Disease-specific survival analysis of 97 target genes in Metabric data
    Paper Baxter, Joseph S., Olivia C. Leavy, Nicola H. Dryden, Sarah Maguire, Nichola Johnson, Vita Fedele, Nikiana Simigdala, et al. “Capture Hi-C Identifies Putative Target Genes at 33 Breast Cancer Risk Loci.” Nature Communications 9, no. 1 (December 2018): 1028. https://doi.org/10.1038/s41467-018-03411-9
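Several entries above call A/B compartments from the first principal component (PC1). A hedged sketch of the classic recipe (observed/expected by genomic distance, Pearson correlation, leading eigenvector), assuming a dense symmetric matrix for one chromosome; it is not the exact procedure of any listed paper. The sign of PC1 is arbitrary until oriented with a track such as gene density or GC content, handled here by an optional, hypothetical `orient_by` argument:

```python
import numpy as np

def compartment_pc1(mat, orient_by=None):
    """Leading eigenvector of the correlation matrix of observed/expected contacts."""
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    oe = np.ones((n, n))
    for d in range(n):
        idx = np.arange(n - d)
        diag = mat[idx, idx + d]
        expected = diag.mean()  # expected contacts at genomic distance d
        if expected > 0:
            oe[idx, idx + d] = diag / expected
            oe[idx + d, idx] = diag / expected
    corr = np.corrcoef(oe)                  # bin-by-bin Pearson correlation
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]
    if orient_by is not None and np.corrcoef(pc1, orient_by)[0, 1] < 0:
        pc1 = -pc1  # positive PC1 now tracks the orienting signal (e.g., A compartment)
    return pc1

# Toy chromosome: 40 bins in alternating 4-bin compartments, distance decay 1/(1+d),
# and 5x enrichment for contacts between bins of the same compartment type
labels = (np.arange(40) // 4) % 2
dist = np.abs(np.subtract.outer(np.arange(40), np.arange(40)))
same = labels[:, None] == labels[None, :]
toy = np.where(same, 5.0, 1.0) / (1.0 + dist)
pc1 = compartment_pc1(toy, orient_by=1 - labels)  # label 0 treated as "A"
print(np.sign(pc1).astype(int))
```

Dividing by the per-distance mean removes the distance decay, so the correlation step sees only the compartment pattern; real pipelines also mask empty bins and often work at 100 kb or coarser resolution.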

Tissue-specific

ENCODE

Search query for any type of Hi-C data, e.g., human brain Hi-C

Brain

Cell lines

  • Haarhuis, Judith H.I., Robin H. van der Weide, Vincent A. Blomen, J. Omar Yáñez-Cuna, Mario Amendola, Marjon S. van Ruiten, Peter H.L. Krijger, et al. “The Cohesin Release Factor WAPL Restricts Chromatin Loop Extension.” Cell, (May 2017) - WAPL, a cohesin antagonist and DNA release factor, restricts loop length and prevents looping between incorrectly oriented CTCF sites. Together with the SCC2/SCC4 complex, WAPL promotes correct assembly of chromosomal structures. WAPL WT and KO Hi-C, RNA-seq, ChIP-seq for CTCF and SMC1. Also, SCC4 KO and combined SCC4-WAPL KO Hi-C. Potential role of WAPL in mitotic chromosome condensation. Tools: HiC-Pro processing, HiCCUPS, HiCseq, DI, SomaticSniper for variant calling. Data (Hi-C in custom paired BED format): GEO GSE95015

  • Grubert, Fabian, Judith B. Zaugg, Maya Kasowski, Oana Ursu, Damek V. Spacek, Alicia R. Martin, Peyton Greenside, et al. “Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions.” Cell, (August 2015) - seven Hi-C replicates of the GM12878 cell line, GEO GSE62742

  • Naumova, Natalia, Maxim Imakaev, Geoffrey Fudenberg, Ye Zhan, Bryan R. Lajoie, Leonid A. Mirny, and Job Dekker. “Organization of the Mitotic Chromosome.” Science (New York, N.Y.), (November 22, 2013) - 5C and Hi-C chromosome conformation capture study on metaphase chromosomes from human HeLa, HFF1 and K562 cell lines across the cell cycle. Two biological and two technical replicates. ArrayExpress E-MTAB-1948

  • Jessica Zuin et al., “Cohesin and CTCF Differentially Affect Chromatin Architecture and Gene Expression in Human Cells,” Proceedings of the National Academy of Sciences of the United States of America, (January 21, 2014) - CTCF and cohesin (RAD21 protein) are enriched at TAD boundaries. Depletion experiments show different effects on inter- and intradomain interactions. Loss of cohesin leads to loss of local interactions, but TADs remain. Loss of CTCF leads to both loss of local interactions and an increase in inter-domain interactions. Different gene expression changes. TAD structures remain largely intact. Data: Hi-C, RNA-seq, RAD21 ChIP-seq for control and RAD21- or CTCF-depleted HEK293 cells. Two replicates in each condition. GEO GSE44267
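The Zuin et al. entry contrasts intra- and inter-domain interactions after cohesin or CTCF depletion. A simple, hypothetical way to summarize this from a binned matrix and a list of domain intervals (an illustration only, not the analysis used in the paper) is to total the contacts between bins of the same domain versus bins of different domains, then compare the ratio across conditions:

```python
import numpy as np

def intra_inter_domain_contacts(mat, domains):
    """Total intra-domain and inter-domain contacts.

    `domains` is a list of half-open (start_bin, end_bin) intervals; bins
    outside every domain are ignored, and the diagonal is excluded.
    """
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    domain_id = np.full(n, -1)          # -1 = bin outside any domain
    for k, (start, end) in enumerate(domains):
        domain_id[start:end] = k
    i, j = np.triu_indices(n, k=1)      # each bin pair once
    assigned = (domain_id[i] >= 0) & (domain_id[j] >= 0)
    same = assigned & (domain_id[i] == domain_id[j])
    cross = assigned & (domain_id[i] != domain_id[j])
    intra = float(mat[i[same], j[same]].sum())
    inter = float(mat[i[cross], j[cross]].sum())
    return intra, inter

toy = np.ones((4, 4))
toy[0, 1] = toy[1, 0] = 5.0
intra, inter = intra_inter_domain_contacts(toy, [(0, 2), (2, 4)])
print(intra, inter, intra / inter)  # 6.0 4.0 1.5
```

On real data the matrices should be balanced and depth-matched between conditions before such totals are compared.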

Non-human data

  • The effect of somatic chromosome pairing on 3D genome organization. Drosophila in situ Hi-C data, HiC-Pro and Juicer processing. Investigation of the effect of pairing on RNAPII-mediated gene loops and Polycomb-mediated loops. Maintenance of A/B compartments is independent of looping. The anti-pairing CAP-H2 condensin II complex interacts with the zinc-finger protein Z4 under hyperosmotic cellular stress. Informative introduction to the 3D Drosophila genome. ChIP-seq, Hi-C, and ATAC-seq data (dm6 assembly) on GEO GSE213553. Other data in the "Data availability" section.
    Paper Puerto, Marta, Mamta Shukla, Paula Bujosa, Juan Perez-Roldan, Srividya Tamirisa, Carme Solé, Eulàlia de Nadal, Francesc Posas, Fernando Azorin, and M. Jordan Rowley. “Somatic Chromosome Pairing Has a Determinant Impact on 3D Chromatin Organization.” Preprint. Genomics, March 30, 2023. https://doi.org/10.1101/2023.03.29.534693.
  • Evolutionary 3D genomics: principles of chromosome folding in mammals (eutherians: aardvark, elephant, mouse, human; marsupials: wallaby, Tasmanian devil; also platypus and chicken). Reshuffling can influence higher-order chromatin organization. Eutherian genome organization is associated with a higher number of short loops (Hi-C), high CTCF density (ChIP-seq), and chromosomal territories; the reverse holds for marsupials, which also show chromosomes in the Rabl configuration. A/B compartments and TADs have similar properties. Analysis of syntenic region rearrangements, reconstructing evolutionary history. Juicer, TADbit, FAN-C. Newly generated data for African elephant, aardvark, Tasmanian devil, and tammar wallaby (Hi-C, CTCF and H3K4me3 ChIP-seq, RNA-seq) at GSE206075.
    Paper Álvarez-González, Lucía, Cristina Arias-Sardá, Laia Montes-Espuña, Laia Marín-Gual, Covadonga Vara, Nicholas C. Lister, Yasmina Cuartero, et al. “Principles of 3D Chromosome Folding and Evolutionary Genome Reshuffling in Mammals.” Cell Reports 41, no. 12 (December 2022): 111839. https://doi.org/10.1016/j.celrep.2022.111839.
  • 3D genome organization of erythrocytes in ten species at the last nucleated stages of maturation (newly generated mouse erythroblast data and previously published blood Hi-C data from other organisms). Erythrocyte genomes lack loops and TADs and show a strong second-diagonal pattern. Raw data at SRA.
    Paper Ryzhkova, Anastasia, Alena Taskina, Anna Khabarova, Veniamin Fishman, and Nariman Battulin. “Erythrocytes 3D Genome Organization in Vertebrates.” Scientific Reports 11, no. 1 (December 2021): 4414. https://doi.org/10.1038/s41598-021-83903-9.
  • Investigation of the mechanisms of TAD boundaries in Drosophila. The Notch gene locus has two TADs; role of genetic sequences bound by architectural proteins (APs: CP190, BEAF-32, M1BP, Su(Hw), CTCF). CRISPR-Cas9 deletion of domain boundaries leads to fusion of TADs, loss of APs, and disruption of transcription. In-nucleus Hi-C (4-cutter MboI) in the embryonic cell line S2R+, in triplicate, GSE136137. References to many public Drosophila datasets in the Methods section.
    Paper Arzate-Mejía, Rodrigo G., Angel Josué Cerecedo-Castillo, Georgina Guerrero, Mayra Furlan-Magaril, and Félix Recillas-Targa. “In Situ Dissection of Domain Boundaries Affect Genome Topology and Gene Transcription in Drosophila.” Nature Communications 11, no. 1 (December 2020): 894. https://doi.org/10.1038/s41467-020-14651-z.
  • RNA-seq, ATAC-seq, ChIP-seq, whole-genome methylation (30X), and Hi-C in 11 adult and two embryonic zebrafish tissues. Comparison with human and mouse regulatory elements. Enrichment of evolutionary breakpoints at TAD boundaries, H3K4me3 and CTCF signal. De novo chr4 assembly (sex chromosome). scATAC-seq on zebrafish brain - 25 cell types. GEO GSE134055, Tweet

  • tagHi-C protocol for low-input tagmentation-based Hi-C. Applied to mouse hematopoiesis 10 major blood cell types. Changes in compartments and the Rabl configuration defining chromatin condensation. Gene-body-associating domains are a general property of highly-expressed genes. Spatial chromatin loops link GWAS SNPs to candidate blood-phenotype genes. HiC-Pro to Juicer. GEO GSE142216 - RNA-seq, replicates, GEO GSE152918 - tagHi-C data, replicates, combined .hic files

  • Single-nucleus Hi-C data (scHi-C) of 88 Drosophila BG3 cells. 2-5M paired-end reads per cell, 10kb resolution. ORBITA pipeline to eliminate the effect of Phi29 DNA polymerase template switching. Chromatin compartments approx. 1Mb in size, non-hierarchical conserved TADs can be detected. Lots of biology, integration with other omics data. Raw and processed data in .cool format at GEO GSE131811

  • 3D chromatin organization during mouse spermatogenesis. Meiotic chromosomes in prophase have weak compartmentalization, TADs, and loops. Enrichment of near inter-chromosomal interactions (close to the diagonal). The X chromosome lacks domain organization during meiotic sex-chromosome inactivation. Concept and formula for evaluating genomic compartment strength (Methods). GEO - Hi-C of meiotic pachytene spermatocytes (PS; 2 biological replicates). Other public Hi-C, RNA-seq, ChIP-seq data.

  • 3D genome rearrangement is uncoupled from gene expression changes. Introduction, references for and against 3D genomics-gene expression links. Drosophila, a "balancer" line with highly rearranged chromosomes. Negligible association can be detected, but changes in genome topology are not predictive of changes in gene expression, loss of long-range interactions has little impact. Processed data, GitHub. Raw data: Whole genome, Hi-C, Capture-C, RNA-seq

    Paper Ghavi-Helm, Yad, Aleksander Jankowski, Sascha Meiers, Rebecca R. Viales, Jan O. Korbel, and Eileen E. M. Furlong. “Highly Rearranged Chromosomes Reveal Uncoupling between Genome Topology and Gene Expression.” Nature Genetics, July 15, 2019.

  • Global organization of the B cell genome throughout differentiation is maintained by the transcription factor Pax5. Mouse splenic CD4+ cells, B cells at various differentiation stages, granulocytes. diffHiC, TADbit, directionality index. Hi-C and RNA-seq data on GEO GSE99163.
    Paper Johanson, Timothy M. “Transcription-Factor-Mediated Supervision of Global Genome Architecture Maintains B Cell Identity.” Nature Immunology 19 (2018): 14. https://doi.org/10.1038/s41590-018-0234-8
  • TADs in Drosophila, Hi-C and RNA-seq in four cell lines of various origin. dCTCF, SMC3, and Su(Hw) are weakly enriched at TAD boundaries. Transcription and active chromatin (H3K27ac, H3K4me1, H3K4me3, H3K36me3, H4K16ac) are associated with TAD boundaries. Also, BEAF-32 and CP190. Hierarchical TADs. Housekeeping genes tend to be near TAD boundaries and in inter-TAD regions. TAD boundary prediction using regression, modeling to associate TADs with bands, investigation of the hierarchy. Heavy use of the Armatus TAD caller. RNA-seq and replicate Hi-C data, high correlation, merged into 20kb resolution.  GEO GSE69013

  • Hi-C of polytene chromosomes in Drosophila. Polytene bands colocalize with TADs. TADs are conserved between polytene and diploid cells. Loops are transient. Two states of folding: Fully extended and up to 10-fold compacted fibers constitute euchromatin. Up to 30-fold compacted fibers represent heterochromatin of the nuclear periphery. Many experimental observations, validations. GEO - Tethered and in-solution Hi-C, triplicates, polytene, diploid.

    Paper Eagen, Kyle P., Tom A. Hartl, and Roger D. Kornberg. “Stable Chromosome Condensation Revealed by Chromosome Conformation Capture.” Cell 163, no. 4 (November 2015): 934–46. https://doi.org/10.1016/j.cell.2015.10.026.
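The spermatogenesis entry above mentions a formula for compartment strength; its exact definition is in that paper's Methods. As an illustration only, one commonly used saddle-style variant compares the mean observed/expected contacts within compartments (AA, BB) to those between them (AB, BA), strength = (AA + BB) / (AB + BA), with A/B assigned by the sign of PC1. The function name and toy values are hypothetical:

```python
import numpy as np

def compartment_strength(oe, pc1):
    """(AA + BB) / (AB + BA) from mean observed/expected contacts.

    Bins with PC1 > 0 are treated as A and PC1 < 0 as B; bins with
    PC1 == 0 are ignored. Values above 1 indicate A/B segregation.
    """
    oe = np.asarray(oe, dtype=float)
    pc1 = np.asarray(pc1, dtype=float)
    a, b = pc1 > 0, pc1 < 0
    aa = oe[np.ix_(a, a)].mean()
    bb = oe[np.ix_(b, b)].mean()
    ab = oe[np.ix_(a, b)].mean()
    ba = oe[np.ix_(b, a)].mean()
    return (aa + bb) / (ab + ba)

pc1 = np.array([0.5, 0.4, -0.3, -0.6])
same = np.sign(pc1)[:, None] == np.sign(pc1)[None, :]
oe = np.where(same, 2.0, 0.5)
print(compartment_strength(oe, pc1))  # 4.0
```

Saddle-based variants usually restrict AA and BB to the strongest PC1 quantiles (e.g., top and bottom 20% of bins) rather than all bins of each sign; other papers use AA × BB / AB² instead of a sum ratio.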

Differential Hi-C

  • Liquid-liquid phase separation (LLPS) in haematological cancers is associated with intrinsically disordered regions (IDRs) of NUP98-HOXA TF chimera and induces CTCF-independent chromatin loops enriched in proto-oncogenes. Many biochemical assays, imaging, mass-spec, ChIP-seq, RNA-seq. All data at GEO GSE144643. In situ Hi-C (HEK293FT kidney cells, IDR wild type and mutated, biological and technical replicates) at GEO GSE143465.
    Paper Ahn, Jeong Hyun, Eric S. Davis, Timothy A. Daugird, Shuai Zhao, Ivana Yoseli Quiroga, Hidetaka Uryu, Jie Li, et al. “Phase Separation Drives Aberrant Chromatin Looping and Cancer Development.” Nature, June 23, 2021. https://doi.org/10.1038/s41586-021-03662-5.
  • WIZ (widely interspaced zinc finger-containing protein) - new loop-organizing protein, colocalizes with CTCF and cohesin across the genome. Loss of WIZ increases cohesin occupancy and DNA loops. WIZ maintains proper gene expression and stem cell identity. Arima, Juicer. GEO GSE137285 - RNA-seq, ChIP-seq, Hi-C replicates in WT and WIZdel mouse ESCs.

  • 3D chromatin reorganization during different types of cellular senescence: replicative (RS) and oncogene-induced (OIS, over a time course). Senescence-associated heterochromatin foci (SAHFs) are formed with the help of DNMT1 via regulation of HMGA2 expression. WI38 primary fibroblasts. OIS shows a gain in long-range contacts. diffHiC analysis; differential regions are enriched in H3K9me3. TADkit for 3D modeling and visualization. Data (Hi-C replicates, different conditions, time course, H3K4me3/H3K9me3/H3K27ac ChIP-seq, RNA-seq) GEO GSE130306

  • X chromosome sex differences in Drosophila. The male X chromosome has two-fold upregulation of gene expression, more mid/long-range interactions, and weaker boundaries marked by BEAF-32, CP190, Chromator, and CLAMP, a dosage compensation complex cofactor. Less negative slope in distance-dependent decay of interactions, less clustered top scoring interactions (more randomness), more open structure overall. Local score differentiator (LSD-score) calls differential TAD boundaries in a CNV-independent manner: the X chromosome has more non-matching boundaries than autosomes, with ~20% appearing and ~35% disappearing boundaries. Enrichment in epigenomic marks identified stronger boundary association with MSL (male-specific lethal complex) and CLAMP binding. Many other experimental observations. hiclib, hicpipe processing. R implementation of LSD differential TAD analysis, Hi-C data in bedGraph format GEO GSE94115, Tweet

  • Hi-C TAD comparison between normal prostate cells (RWPE1) and two prostate cancer cell lines (C42B, 22Rv1). TADs (called with TopDom) become smaller in cancer and switch epigenetic states. The FOXA1 promoter has more loop anchors in cancer. Chromatin structure around the androgen receptor (AR) locus changes (Figure 6). Loops were called with Fit-HiC; motifs enriched in loop-associated enhancers (NOMe-seq) differ between normal and cancer cells. HiTC visualization. Figure 1a, Supplementary Figures 3 and 5 - examples/coordinates of TAD boundary/length changes.

  • Data for RWPE1, C42B, 22Rv1 cell lines: GEO GSE118629. In situ Hi-C, 4-cutter MboI, replicated, text-based sparse matrices at 10kb and 40kb resolution, raw and ICE-normalized, hg19. H3K9me3, H3K27me3, H3K36me3, RNA-seq.

  • Supplementary data: Data 2 - TAD coordinates and annotations; Data 3 - differentially expressed genes in smaller TADs; Data 4 - gene expression changes in TADs switching epigenomic state; Data 5 - enhancer-promoter loops; Data 6 - coordinates of nucleosome-depleted regions; Data 7 - all differentially expressed genes; Data 8 - target genes of FOXA1-bound enhancers; Data 9 - overexpressed genes with more enhancer-promoter loops

  • DNA methylation linked with 3D genomics. Methylation directs PRC-dependent 3D organization of mouse ESCs. Hypomethylation in mouse ESCs driven to naive pluripotency in two inhibitors (2i) is accompanied by redistribution of the Polycomb H3K27me3 mark and decompaction of chromatin. Focus on HoxC, HoxD loci. Hi-C data processed with distiller and other cool-related tools. RNA-seq, H3K27me3 ChIP-seq of mouse ESCs grown in serum and 2i conditions. Hi-C data in replicates GEO GSE124342

  • Transcription inhibition minimally affects TADs but weakens TAD boundaries. K562 cells, RNase treatment before/after crosslinking (bXL/aXL), actinomycin D (complete transcriptional arrest) treatment. Processing using cworld, 40kb resolution. Data with replicates of each condition, GEO GSE114337

  • Comparison of the 3D structure of human and chimpanzee induced pluripotent stem cells. Lower-order pairwise interactions are relatively conserved, but higher-order structures, such as TADs, differ. HiCUP and HOMER for Hi-C data processing to 10kb resolution. Cyclic loess normalization, limma for significant interaction definition, Arrowhead on combined replicates to detect TADs. Association of differential chromatin interactions with gene expression. pyGenomeTracks for visualization. Workflowr code, processed Hi-C data (4 human and 4 chimp iPSCs) GEO GSE122520

  • In situ Hi-C libraries in biological replicates (n=2) for several hematopoietic cell types (200 million reads per replicate), with a focus on the B cell lineage in mice. The authors investigate the supervisory role of the transcription factor Pax5 in organizing the 3D genome architecture throughout B cell differentiation. The raw data are available via GEO GSE99151

  • DNA loop changes during macrophage development (THP-1 monocyte to macrophage development under 72h PMA treatment). In situ Hi-C (pbn reads, 10kb resolution), RNA-seq, ATAC-seq, CTCF and H3K27ac ChIP-seq. Formation of multi-hubs at key macrophage genes. Differential (dynamic, DESeq2-detected) loops are enriched for AP-1 and for H3K27ac, in contrast to static loops. Local H3K27ac and transcription levels are associated with distal DNA elements with elevated H3K27ac. Lost loops contain very few genes and lower H3K27ac signal; gained loops contain more genes and higher H3K27ac signal. Fold changes in H3K27ac signal positively correlate with DNA looping. Macrophage development-specific gene ontology enrichments. Network analysis to identify multi-loop, multi-enhancer activation hubs. GEO GSE96800 ChIP-seq, ATAC-seq, RNA-seq; two Hi-C samples, THP-1 PMA-treated and untreated, SRA PRJNA385337.

    • Supplemental material:
      • Table S1. DNA Loops in Untreated THP-1 Cells, 16067. Text, hg19 genomic coordinates, columns: anchor1_chrom anchor1_start anchor1_end anchor2_chrom anchor2_start anchor2_end sample -log10(P) anchor1_strand anchor2_strand
      • Table S2. DNA Loops in PMA-Treated THP-1 Cells, 16335.
      • Table S3. Differential Loops
    • Phanstiel, Douglas H., Kevin Van Bortle, Damek Spacek, Gaelen T. Hess, Muhammad Saad Shamim, Ido Machol, Michael I. Love, Erez Lieberman Aiden, Michael C. Bassik, and Michael P. Snyder. “Static and Dynamic DNA Loops Form AP-1-Bound Activation Hubs during Macrophage Development.” Molecular Cell, (September 2017)
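Several datasets above (e.g., GEO GSE118629) are distributed as raw and ICE-normalized text-based sparse matrices. A minimal Python sketch of iterative correction (ICE) is shown below, assuming a hypothetical three-column layout of 0-based bin indices plus a count, with each bin pair listed once; check the actual file headers before use. Production tools such as cooler additionally filter low-coverage bins and can ignore pixels near the diagonal, which this sketch omits.

```python
import numpy as np


def triplets_to_dense(triplets, n_bins):
    """Build a symmetric contact matrix from (bin_i, bin_j, count) triplets.

    Assumes 0-based bin indices with each pair listed once (a hypothetical
    layout; real GEO files may use genomic coordinates instead of indices).
    """
    mat = np.zeros((n_bins, n_bins), dtype=float)
    for i, j, count in triplets:
        mat[i, j] += count
        if i != j:
            mat[j, i] += count  # mirror off-diagonal pixels
    return mat


def ice_normalize(mat, max_iter=200, tol=1e-6):
    """Rescale rows/columns until all non-empty bins have equal coverage.

    Returns the balanced matrix and the per-bin bias vector, such that
    balanced * outer(bias, bias) reproduces the raw matrix.
    """
    balanced = np.array(mat, dtype=float)  # work on a copy
    n = balanced.shape[0]
    bias = np.ones(n)
    for _ in range(max_iter):
        coverage = balanced.sum(axis=1)
        covered = coverage > 0
        if not covered.any():
            break  # nothing to balance
        scale = np.ones(n)  # empty bins are left untouched
        scale[covered] = coverage[covered] / coverage[covered].mean()
        balanced /= np.outer(scale, scale)
        bias *= scale
        if np.abs(scale[covered] - 1.0).max() < tol:
            break  # coverage is flat within tolerance
    return balanced, bias


# Toy 3-bin example
raw = triplets_to_dense(
    [(0, 0, 4), (0, 1, 6), (0, 2, 2), (1, 1, 3), (1, 2, 5), (2, 2, 1)], 3
)
balanced, bias = ice_normalize(raw)
print(balanced.sum(axis=1))  # row sums are approximately equal after balancing
```

Keeping the bias vector rather than only the balanced matrix mirrors how balanced Hi-C files are commonly stored: raw counts plus per-bin weights.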

Timecourse Hi-C

  • 3D genomics of human embryogenesis. Human and mouse sperm differ: human sperm lack TADs and A/B compartments, which are established later in embryogenesis and require zygotic genome activation and CTCF. Six stages of spatiotemporal Hi-C during human embryogenesis: sperm, 2-cell, 8-cell, morula, blastocysts, and six-week-old embryos. GitHub. Data: CRA000852, CRA000108, CRA000852.
    Paper Chen, Xuepeng, Yuwen Ke, Keliang Wu, Han Zhao, Yaoyu Sun, Lei Gao, Zhenbo Liu, et al. “Key Role for CTCF in Establishing Chromatin Structure in Human Embryos.” Nature, December 4, 2019. https://doi.org/10.1038/s41586-019-1812-0.
  • Vara, Covadonga, Andreu Paytuví-Gallart, Yasmina Cuartero, François Le Dily, Francisca Garcia, Judit Salvà-Castro, Laura Gómez-H, et al. “Three-Dimensional Genomic Structure and Cohesin Occupancy Correlate with Transcriptional Activity during Spermatogenesis.” Cell Reports, (July 2019) - 3D structure changes during spermatogenesis in mouse. Hi-C, RNA-seq, CTCF/REC8/RAD21L ChIP-seq. Description of biology of each stage (Fibroblasts, spermatogonia, leptonema/zygonema, pachynema/diplonema, round spermatids, sperm), and A/B compartment and TAD analysis (TADbit, insulation score), data normalized with ICE. Integration with differential expression. Changes in distribution of CTCF and cohesins (REC8 and RAD21L). Key tools: BBDuk (BBMap), TADbit, HiCExplorer, HiCRep, DeepTools. Data (no replicates) GEO GSE132054

  • Paulsen, Jonas, Tharvesh M. Liyakat Ali, Maxim Nekrasov, Erwan Delbarre, Marie-Odile Baudement, Sebastian Kurscheid, David Tremethick, and Philippe Collas. “Long-Range Interactions between Topologically Associating Domains Shape the Four-Dimensional Genome during Differentiation.” Nature Genetics, April 22, 2019 - Long-range TAD-TAD interactions form cliques (>3 interacting TADs) that are enriched in B compartments and LADs and are associated with downregulated gene expression. Graph representation of TAD interactions. Quantifying statistical significance of between-TAD interactions. TAD boundaries are conserved, whereas TAD cliques are dynamic. Permutation test preserving distances. Armatus for TAD detection. hiclib for data processing, Juicebox for visualization. Data: time course differentiation of human adipose stem cells (day 0, 1, and 3). Hi-C (two replicates), Lamin B1 ChIP-seq, H3K9me3. GEO GSE109924. Also used mouse ES differentiation (Bonev 2017), mouse B cell reprogramming (Stadhouders 2018), scHi-C (Nagano 2017)

  • Du, Zhenhai, Hui Zheng, Bo Huang, Rui Ma, Jingyi Wu, Xianglin Zhang, Jing He, et al. “Allelic Reprogramming of 3D Chromatin Architecture during Early Mammalian Development.” Nature, (12 2017) - Developmental time course Hi-C. Data in preimplantation embryos at the following stages: gametes (sperm and MII oocyte), pronuclear stage 5 (PN5) zygotes, early 2-cell, late 2-cell, 8-cell, inner cell masses (ICM), and mouse embryonic stem cells (mES). Low-input Hi-C technology (sisHi-C). TADs are initially absent, then gradually appear. HiC-Pro mapping, Pearson correlation on low-resolution matrices, allele resolving. Data: GEO GSE82185

  • Hug, Clemens B., Alexis G. Grimaldi, Kai Kruse, and Juan M. Vaquerizas. “Chromatin Architecture Emerges during Zygotic Genome Activation Independent of Transcription.” Cell, (06 2017) - TADs appearing during zygotic genome activation, independent of transcription. TAD boundaries are enriched in housekeeping genes, colocalize in 3D. Drosophila. Insulation score for boundary detection. Overlap analysis of TAD boundaries. Processed Hi-C matrices at 5kb resolution (replicates merged, .cool format) and TAD boundaries at nuclear cycle 12, 13, 14, and 3-4 hours post fertilization

  • Ke, Yuwen, Yanan Xu, Xuepeng Chen, Songjie Feng, Zhenbo Liu, Yaoyu Sun, Xuelong Yao, et al. “3D Chromatin Structures of Mature Gametes and Structural Reprogramming during Mammalian Embryogenesis.” Cell, (July 13, 2017) - 3D time course changes in mouse gametes (sperm and MII oocyte) and during early embryo development, from zygotes (no TADs, many long-range interactions) to 2-, 4-, 8-cell, blastocyst and E7.5 mature embryos (TADs established after several rounds of DNA replication). A/B compartments are associated with unmethylated/methylated CpGs, respectively. PC1, directionality index, and insulation score define compartments and TADs; these metrics increase in magnitude/strength during maturation. Enrichment in CTCF, SMC1, H3K4me3, H3K27ac, H3K9ac, H3K4me1, depletion in H3K9me3, H3K36me3, H3K27me3. Compartment strength is weaker in maternal than in paternal genomes. Covariance of each gene with boundary score across the time course. Relative TAD intensity changes. Hi-C and RNA-seq data at different stages, some replicates
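Several of the time course studies above use the insulation score to define TAD boundaries. One common variant is sketched below in Python: for each bin edge, average the contacts in a square window spanning that edge, log2-normalize to the mean over all edges, and call strict local minima as boundaries. The square placement, the window size, and the absence of any boundary-strength filter are simplifying assumptions; dedicated tools (for example cooltools) implement tuned versions.

```python
import numpy as np


def insulation_score(mat, window):
    """Log2-normalized insulation score at each bin edge.

    scores[i] averages contacts between the `window` bins upstream of bin i
    (i - window .. i - 1) and the `window` bins starting at bin i, i.e. across
    the edge between bins i - 1 and i. Edges too close to the borders are NaN.
    """
    n = mat.shape[0]
    raw = np.full(n, np.nan)
    for i in range(window, n - window + 1):
        raw[i] = mat[i - window:i, i:i + window].mean()
    valid = np.isfinite(raw)
    valid[valid] = raw[valid] > 0  # log2 is undefined for empty windows
    scores = np.full(n, np.nan)
    scores[valid] = np.log2(raw[valid] / raw[valid].mean())
    return scores


def call_boundaries(scores):
    """Return bin indices whose insulation score is a strict local minimum."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if np.isnan([left, mid, right]).any():
            continue  # skip edges next to undefined scores
        if mid < left and mid < right:
            boundaries.append(i)
    return boundaries


# Toy matrix with two 5-bin TADs: strong contacts within, weak between
toy = np.ones((10, 10))
toy[:5, :5] = 10
toy[5:, 5:] = 10
ins = insulation_score(toy, window=2)
print(call_boundaries(ins))  # [5]: the edge between bins 4 and 5
```

Low (negative) scores mark edges across which few contacts pass, which is why boundaries are taken at the minima.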

Promoter-capture Hi-C

  • SIPs, super-interactive promoters in five hematopoietic cell types (Erythrocyte, Macrophage/monophage, megakaryocyte, naive CD4 T-cells, Neutrophils). Reanalysis of promoter-capture Hi-C data from Javierre et al., “Lineage-Specific Genome Architecture Links Enhancers and Non-Coding Disease Variants to Target Gene Promoters.” study. CHiCAGO pipeline. Promoter-interacting regions (PIRs) interacting with SIPs are more enriched in cell type-specific ATAC-seq peaks and in GWAS variants for relevant cell types. SIP-associated genes are more highly expressed in relevant cells. Some SIPs are shared across cell types. Super-SIPs.

  • Genome-wide maps linking disease variants to genes. Activity-By-Contact (ABC) Model. 72 diseases and complex traits (non-specific, no psychiatric), linking 5046 fine-mapped GWAS signals to 2249 genes. 577 genes influence multiple phenotypes. Nearly half of enhancers regulate multiple genes. Table S7 - Summary of diseases and traits. Table S9 - ABC-Max predictions for 72 diseases and complex traits.

  • Promoter-enhancer contacts occur in cohesin-dependent and cohesin-independent manner. Promoter Capture Hi-C on degradation of cohesin (SCC1 subunit) and CTCF (both targeted by auxin-inducible degron and mEGFP reporter) in G1-synchronized HeLa cells. The majority of promoter contacts are lost (associated with transcriptional changes, SLAM-seq) but some are retained and gained. Cohesin-independent promoter contacts interact with active enhancers. Cohesin-dependent interactions are typically longer and associated with CTCF, while cohesin-independent interactions are shorter and associated with active promoters and enhancers. HiCUP, CHiCAGO, Chicdiff. Processed data, replicates of promoter-capture Hi-C data GEO GSE145735, replicates of SLAM-seq data GEO GSE145734

  • Promoter-enhancer predictions in 131 cell types and tissues using the Activity-By-Contact (ABC) Model, based on chromatin state (ATAC-seq) and 3D folding (consensus Hi-C). ABC model assumes an element’s quantitative effect on a gene should depend on its strength as an enhancer (Activity) weighted by how often it comes into 3D contact with the promoter of the gene (Contact), and that the relative contribution of an element on a gene’s expression (as assayed by the proportional decrease in expression following CRISPR-inhibition) should depend on that element’s effect divided by the total effect of all elements. Outperforms distance-based, 3D contact-only, and machine learning approaches. Enhancer-promoter predictions for GM12878, K562, liver, LNCAP, mESCs, NCCIT cells, more at Engreitz Lab page. GitHub repository broadinstitute/ABC-Enhancer-Gene-Prediction.

  • Promoter-enhancer interactions. Promoter-capture Hi-C, 27 human cell lines. Well-formatted data with hg19 genomic coordinates in the Supplementary material and at http://www.3div.kr/capture_hic

  • Promoter capture Hi-C in 17 blood cell types. Chromatin interactions are cell type-specific. >50% of interactions are one-to-one. Enriched in H3K27ac and H3K4me1 (active enhancers). GWAS loci enriched in PIRs. Table S3 lists prioritized genes/SNPs for autoimmune diseases. Used CHiCAGO to identify strongly interacting regions. Data has active promoter-enhancer links. More than 2,500 potential disease-associated genes are linked to GWAS SNPs. https://osf.io/u8tzp/
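The ABC model described above scores each element-gene pair as Activity × Contact divided by the sum of Activity × Contact over all candidate elements for that gene. The Python sketch below illustrates only that arithmetic. Following the published description, Activity is assumed here to be the geometric mean of an accessibility signal (ATAC-seq or DNase-seq) and H3K27ac; the `Element` fields and example values are made up, and candidate-element selection, signal normalization, and score thresholds are left to the broadinstitute/ABC-Enhancer-Gene-Prediction code.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    """A candidate regulatory element near one gene (hypothetical fields)."""
    name: str
    accessibility: float  # e.g. ATAC-seq reads in the element
    h3k27ac: float        # H3K27ac ChIP-seq reads in the element
    contact: float        # Hi-C contact frequency with the gene's promoter


def activity(element):
    # Assumed definition: geometric mean of accessibility and H3K27ac signal
    return math.sqrt(element.accessibility * element.h3k27ac)


def abc_scores(elements):
    """Fraction of a gene's total Activity x Contact contributed by each element."""
    weighted = {e.name: activity(e) * e.contact for e in elements}
    total = sum(weighted.values())
    if total == 0:
        return {name: 0.0 for name in weighted}
    return {name: value / total for name, value in weighted.items()}


candidates = [
    Element("E1", accessibility=100, h3k27ac=400, contact=0.05),  # strong, rare contact
    Element("E2", accessibility=25, h3k27ac=100, contact=0.20),   # weaker, frequent contact
    Element("E3", accessibility=4, h3k27ac=25, contact=0.40),     # weak, very frequent contact
]
for name, score in abc_scores(candidates).items():
    print(name, round(score, 3))
```

In this toy example E1 and E2 receive equal scores: a strong element that rarely contacts the promoter and a weaker element in frequent contact contribute the same Activity × Contact product.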

Single-cell Hi-C

See Notes on single-cell Hi-C technologies, tools, and data repository

Micro-C

See the Micro-C section in the HiC_tools repository

GAM

Genome Architecture Mapping data

Imaging

  • MERFISH - Super-resolution imaging technology reconstructing 3D structure in single cells at 30kb resolution, 1.2Mb region of Chr21 in IMR90 cells. Distance maps obtained by microscopy show small distances between loci within TADs and larger distances between TADs. TAD-like structures exist in single cells. In a 2.5Mb region of Chr21 in HCT116 cells, cohesin depletion does not abolish TADs but only alters their preferential positioning. Multi-point (triplet) interactions are prevalent. TAD boundaries are highly heterogeneous in single cells. Diffraction-limited and STORM (stochastic optical reconstruction microscopy) imaging. GitHub

  • Single-cell level massively multiplexed FISH (MERFISH, sequential genome imaging) to measure 3D genome structure in context of gene expression and nuclear structures. Approx. 650 loci, 50kb resolution, on chr21 10.4-46.7Mb from the hg38 genome assembly, IMR90 cells, population average from approx. 12K chr21 copies, multiple rounds of hybridization. Investigation of TADs, A/B compartments, 87% agreement with bulk Hi-C. Association with cell type markers, transcription. Genome-scale imaging using barcodes, 1041 30kb loci covering autosomes and chrX of IMR90, over 5K cells, 5 replicates. Processed multiplexed FISH data and more, TXT format, GitHub

  • Parser of multiplexed single-cell imaging data from Bintu et al. 2018 and Su et al. 2020. Takes 3D coordinates of the regions as input and writes the distance and contact matrices for these datasets.
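The parser described above turns per-cell 3D coordinates into distance and contact matrices. A minimal NumPy sketch of that conversion follows. It assumes traces are stored as an array of shape (cells, regions, 3) with NaN for undetected regions, summarizes distances by their median across cells, and counts a contact when two regions are closer than a user-chosen cut-off. The array layout and the cut-off are assumptions, not the parser's actual interface.

```python
import numpy as np


def distance_and_contact(traces, cutoff):
    """Median distance matrix and contact-frequency matrix from chromatin traces.

    traces: array of shape (n_cells, n_regions, 3) holding x, y, z coordinates;
            NaN marks a region that was not detected in a given cell.
    cutoff: distance below which two regions count as "in contact"
            (same units as the coordinates, e.g. nm).
    """
    traces = np.asarray(traces, dtype=float)
    # Pairwise Euclidean distances per cell: (n_cells, n_regions, n_regions)
    diff = traces[:, :, None, :] - traces[:, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # NaN if either region is missing
    observed = ~np.isnan(dist)

    median_dist = np.nanmedian(dist, axis=0)

    # Contact frequency = contacts / cells in which both regions were detected
    in_contact = np.where(observed, dist, np.inf) < cutoff
    n_contacts = in_contact.sum(axis=0)
    n_observed = observed.sum(axis=0)
    contact_freq = np.full(n_contacts.shape, np.nan)
    np.divide(n_contacts, n_observed, out=contact_freq, where=n_observed > 0)
    return median_dist, contact_freq


# Two toy cells, three regions (coordinates in nm); region 2 is missing in cell 2
cells = [
    [[0, 0, 0], [100, 0, 0], [400, 0, 0]],
    [[0, 0, 0], [200, 0, 0], [np.nan, np.nan, np.nan]],
]
median_dist, contact_freq = distance_and_contact(cells, cutoff=150)
print(median_dist)
print(contact_freq)
```

Dividing by the number of cells in which both regions were detected, rather than by all cells, keeps detection failures from being counted as non-contacts.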

CTCF

Notes on CTCF motifs and data

Integrative Hi-C

  • 3D structure mediates the effect of genetic variants on gene expression. 317 lymphoblastoid (LCL) and 78 fibroblast (FIB) cell lines, Hi-C data from Rao et al. 2014 paper. Regulatory elements identified from H3K4me1, H3K4me3, H3K27ac ChIP-seq. The regulatory activity is structured in 12,583 well-delimited cis-regulatory domains (CRDs) that respect the local chromatin organization into topologically associating domains (TADs) but constitute a finer organization. 30 trans-regulatory hubs (TRHs) formed by CRDs on distinct chromosomes, associated with A/B compartments and allelic regulation. Processed data - cQTLs - variants associated with chromatin peak activity; (cis/trans) eQTLs - variants associated with gene expression; aCRD-QTLs - variants associated with CRD activity; sCRD-QTLs - variants associated with CRD structure; chromatin peaks, and CRDs. For LCL and FIB cell lines, coordinates in hg19.
    Paper Delaneau, O., M. Zazhytska, C. Borel, G. Giannuzzi, G. Rey, C. Howald, S. Kumar, et al. “Chromatin Three-Dimensional Interactions Mediate Genetic Effects on Gene Expression.” Science (New York, N.Y.) 364, no. 6439 (03 2019). https://doi.org/10.1126/science.aat8266.
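The QTL sets above (cQTLs, eQTLs, aCRD-QTLs, sCRD-QTLs) all rest on testing whether a variant's genotype is associated with a molecular phenotype. As a bare-bones illustration only, the Python sketch below regresses a phenotype (e.g., chromatin peak activity) on genotype dosage (0, 1, or 2 copies of the alternative allele) and reports the slope, Pearson correlation, and t-statistic. The function name and toy data are hypothetical; dedicated QTL software additionally handles covariates, phenotype normalization, and multiple testing, all of which this sketch omits.

```python
import numpy as np


def qtl_association(dosage, phenotype):
    """Linear-regression association between genotype dosage and a phenotype.

    Returns (slope, r, t): the per-allele effect estimate, the Pearson
    correlation, and the t-statistic with n - 2 degrees of freedom.
    """
    x = np.asarray(dosage, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    if len(x) < 3:
        raise ValueError("need at least three samples")
    xc, yc = x - x.mean(), y - y.mean()
    sxx, syy, sxy = xc @ xc, yc @ yc, xc @ yc
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or phenotype has no variance")
    slope = sxy / sxx
    r = sxy / np.sqrt(sxx * syy)
    n = len(x)
    # t-statistic for testing r = 0; infinite for a perfect fit
    t = r * np.sqrt((n - 2) / (1 - r ** 2)) if abs(r) < 1 else np.inf * np.sign(r)
    return slope, r, t


# Toy data: six individuals, phenotype rises by ~1 unit per alternative allele
slope, r, t = qtl_association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"slope={slope:.2f}, r={r:.3f}, t={t:.1f}")
```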

Misc

  • RNA-Chrom - database of RNA-chromatin interactions. Human & mouse. Manually curated. Data from "all-to-all" methods (MARGI, GRID-seq, ChAR-seq, iMARGI, RADICL-seq, Red-C) and "one-to-all" methods (RAP, CHART-seq, CgURO-seq, dChIRP-seq, ChOP-seq, CHIRT-seq), databases. Uniform processing. RNA- and DNA-centric searches. Video tutorial 1, tutorial 2. Download.
    Paper Ryabykh, G. K., S. V. Kuznetsov, Y. D Korostelev, A. I. Sigorskikh, A. A. Zharikova, and A. A. Mironov. “RNA-Chrom: A Manually-Curated Analytical Database of RNA–Chromatin Interactome.” Preprint. Bioinformatics, December 12, 2022. https://doi.org/10.1101/2022.12.10.519346.
  • Prioritization of COVID-19 candidate genes using 3D chromosomal topology. Applying COGS (Capture Hi-C Omnibus Gene Score), a statistical pipeline for linking GWAS variants with their target genes based on 3D chromatin interaction data. COVID-19 GWAS data. Promoter-capture Hi-C data from Javierre et al., “Lineage-Specific Genome Architecture Links Enhancers and Non-Coding Disease Variants to Target Gene Promoters” and Ho et al. "TOP1 inhibition therapy protects against SARS-CoV-2-induced lethal inflammation" studies (17 human primary cell types data and SARS-CoV-2-infected lung carcinoma cells data). Four prioritization approaches, summary in Supplementary Table S4. Biological analysis.
    Paper Thiecke, Michiel J., Emma J. Yang, Oliver S. Burren, Helen Ray-Jones, and Mikhail Spivakov. “Prioritisation of Candidate Genes Underpinning COVID-19 Host Genetic Traits Based on High-Resolution 3D Chromosomal Topology.” Frontiers in Genetics 12 (October 25, 2021). https://doi.org/10.3389/fgene.2021.745672.
