• Stars
    star
    442
  • Rank 95,353 (Top 2 %)
  • Language
    C++
  • License
    MIT License
  • Created about 9 years ago
  • Updated 9 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A Fast and Extensible DRAM Simulator, with built-in support for modeling many different DRAM technologies including DDRx, LPDDRx, GDDRx, WIOx, HBMx, and various academic proposals. Described in the IEEE CAL 2015 paper by Kim et al. at http://users.ece.cmu.edu/~omutlu/pub/ramulator_dram_simulator-ieee-cal15.pdf

Ramulator: A DRAM Simulator

Ramulator is a fast and cycle-accurate DRAM simulator [1, 2] that supports a wide array of commercial, as well as academic, DRAM standards:

  • DDR3 (2007), DDR4 (2012)
  • LPDDR3 (2012), LPDDR4 (2014)
  • GDDR5 (2009)
  • WIO (2011), WIO2 (2014)
  • HBM (2013)
  • SALP [3]
  • TL-DRAM [4]
  • RowClone [5]
  • DSARP [6]

The initial release of Ramulator is described in the following paper:

Y. Kim, W. Yang, O. Mutlu. "Ramulator: A Fast and Extensible DRAM Simulator". In IEEE Computer Architecture Letters, March 2015.

For information on new features, along with an extensive memory characterization using Ramulator, please read:

S. Ghose, T. Li, N. Hajinazar, D. Senol Cali, O. Mutlu. "Demystifying Complex Workload–DRAM Interactions: An Experimental Study". In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), June 2019 (slides). In Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2019.

[1] Kim et al. Ramulator: A Fast and Extensible DRAM Simulator. IEEE CAL 2015.
[2] Ghose et al. Demystifying Complex Workload–DRAM Interactions: An Experimental Study. SIGMETRICS 2019.
[3] Kim et al. A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM. ISCA 2012.
[4] Lee et al. Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture. HPCA 2013.
[5] Seshadri et al. RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization. MICRO 2013.
[6] Chang et al. Improving DRAM Performance by Parallelizing Refreshes with Accesses. HPCA 2014.

Usage

Ramulator supports three different usage modes.

  1. Memory Trace Driven: Ramulator directly reads memory traces from a file, and simulates only the DRAM subsystem. Each line in the trace file represents a memory request, with the hexadecimal address followed by 'R' or 'W' for read or write.
  • 0x12345680 R
  • 0x4cbd56c0 W
  • ...
  1. CPU Trace Driven: Ramulator directly reads instruction traces from a file, and simulates a simplified model of a "core" that generates memory requests to the DRAM subsystem. Each line in the trace file represents a memory request, and can have one of the following two formats.
  • <num-cpuinst> <addr-read>: For a line with two tokens, the first token represents the number of CPU (i.e., non-memory) instructions before the memory request, and the second token is the decimal address of a read.

  • <num-cpuinst> <addr-read> <addr-writeback>: For a line with three tokens, the third token is the decimal address of the writeback request, which is the dirty cache-line eviction caused by the read request before it.

  1. gem5 Driven: Ramulator runs as part of a full-system simulator (gem5 [7]), from which it receives memory request as they are generated.

For some of the DRAM standards, Ramulator is also capable of reporting power consumption by relying on either VAMPIRE [8] or DRAMPower [9] as the backend.

[7] The gem5 Simulator System.
[8] Ghose et al. What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study. SIGMETRICS 2018.
[9] Chandrasekar et al. DRAMPower: Open-Source DRAM Power & Energy Estimation Tool. IEEE CAL 2015.

Getting Started

Ramulator requires a C++11 compiler (e.g., clang++, g++-5).

  1. Memory Trace Driven

     $ cd ramulator
     $ make -j
     $ ./ramulator configs/DDR3-config.cfg --mode=dram dram.trace
     Simulation done. Statistics written to DDR3.stats
     # NOTE: dram.trace is a very short trace file provided only as an example.
     $ ./ramulator configs/DDR3-config.cfg --mode=dram --stats my_output.txt dram.trace
     Simulation done. Statistics written to my_output.txt
     # NOTE: optional --stats flag changes the statistics output filename
    
  2. CPU Trace Driven

     $ cd ramulator
     $ make -j
     $ ./ramulator configs/DDR3-config.cfg --mode=cpu cpu.trace
     Simulation done. Statistics written to DDR3.stats
     # NOTE: cpu.trace is a very short trace file provided only as an example.
     $ ./ramulator configs/DDR3-config.cfg --mode=cpu --stats my_output.txt cpu.trace
     Simulation done. Statistics written to my_output.txt
     # NOTE: optional --stats flag changes the statistics output filename
    
  3. gem5 Driven

    Requires SWIG 2.0.12+, gperftools (libgoogle-perftools-dev package on Ubuntu)

     $ hg clone http://repo.gem5.org/gem5-stable
     $ cd gem5-stable
     $ hg update -c 10231  # Revert to stable version from 5/31/2014 (10231:0e86fac7254c)
     $ patch -Np1 --ignore-whitespace < /path/to/ramulator/gem5-0e86fac7254c-ramulator.patch
     $ cd ext/ramulator
     $ mkdir Ramulator
     $ cp -r /path/to/ramulator/src Ramulator
     # Compile gem5
     # Run gem5 with `--mem-type=ramulator` and `--ramulator-config=configs/DDR3-config.cfg`
    

By default, gem5 uses the atomic CPU and uses atomic memory accesses, i.e. a detailed memory model like ramulator is not really used. To actually run gem5 in timing mode, a CPU type need to be specified by command line parameter --cpu-type. e.g. --cpu-type=timing

Simulation Output

Ramulator will report a series of statistics for every run, which are written to a file. We have provided a series of gem5-compatible statistics classes in Statistics.h.

Memory Trace/CPU Trace Driven: When run in memory trace driven or CPU trace driven mode, Ramulator will write these statistics to a file. By default, the filename will be <standard_name>.stats (e.g., DDR3.stats). You can write the statistics file to a different filename by adding --stats <filename> to the command line after the --mode switch (see examples above).

gem5 Driven: Ramulator automatically integrates its statistics into gem5. Ramulator's statistics are written directly into the gem5 statistic file, with the prefix ramulator. added to each stat's name.

NOTE: When creating your own stats objects, don't place them inside STL containers that are automatically resized (e.g, vector). Since these containers copy on resize, you will end up with duplicate statistics printed in the output file.

Reproducing Results from Paper (Kim et al. [1])

Debugging & Verification (Section 4.1)

For debugging and verification purposes, Ramulator can print the trace of every DRAM command it issues along with their address and timing information. To do so, please turn on the print_cmd_trace variable in the configuration file.

Comparison Against Other Simulators (Section 4.2)

For comparing Ramulator against other DRAM simulators, we provide a script that automates the process: test_ddr3.py. Before you run this script, however, you must specify the location of their executables and configuration files at designated lines in the script's source code:

Please refer to their respective websites to download, build, and set-up the other simulators. The simulators must to be executed in saturation mode (always filling up the request queues when possible).

All five simulators were configured using the same parameters:

  • DDR3-1600K (11-11-11), 1 Channel, 1 Rank, 2Gb x8 chips
  • FR-FCFS Scheduling
  • Open-Row Policy
  • 32/32 Entry Read/Write Queues
  • High/Low Watermarks for Write Queue: 28/16

Finally, execute test_ddr3.py <num-requests> to start off the simulation. Please make sure that there are no other active processes during simulation to yield accurate measurements of memory usage and CPU time.

Cross-Sectional Study of DRAM Standards (Section 4.3)

Please use the CPU traces (SPEC 2006) provided in the cputraces folder to run CPU trace driven simulations.

Other Tips

Power Estimation

For estimating power consumption, Ramulator can record the trace of every DRAM command it issues to a file in DRAMPower [8] format. To do so, please turn on the record_cmd_trace variable in the configuration file. The resulting DRAM command trace (e.g., cmd-trace-chan-N-rank-M.cmdtrace) should be fed into a compatible DRAM energy simulator such as VAMPIRE [8] or DRAMPower [9] with the correct configuration (standard/speed/organization) to estimate energy/power usage for a single rank (a current limitation of both VAMPIRE and DRAMPower).

Contributors

  • Yoongu Kim (Carnegie Mellon University)
  • Weikun Yang (Peking University)
  • Kevin Chang (Carnegie Mellon University)
  • Donghyuk Lee (Carnegie Mellon University)
  • Vivek Seshadri (Carnegie Mellon University)
  • Saugata Ghose (Carnegie Mellon University)
  • Tianshi Li (Carnegie Mellon University)
  • @henryzh

Acknowledgments

We thank the SAFARI group members who have contributed to the initial development of Ramulator, including Kevin Chang, Saugata Ghose, Donghyuk Lee, Tianshi Li, and Vivek Seshadri. We also thank the anonymous reviewers for feedback. This work was supported by NSF, SRC, and gifts from our industrial partners, including Google, Intel, Microsoft, Nvidia, Samsung, Seagate and VMware.

More Repositories

1

rowhammer

Source code for testing the Row Hammer error mechanism in DRAM devices. Described in the ISCA 2014 paper by Kim et al. at http://users.ece.cmu.edu/~omutlu/pub/dram-row-hammer_isca14.pdf.
C
208
star
2

MQSim

MQSim is a fast and accurate simulator modeling the performance of modern multi-queue (MQ) SSDs as well as traditional SATA based SSDs. MQSim faithfully models new high-bandwidth protocol implementations, steady-state SSD conditions, and the full end-to-end latency of requests in modern SSDs. It is described in detail in the FAST 2018 paper by Arash Tavakkol et al., "MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices" (https://people.inf.ethz.ch/omutlu/pub/MQSim-SSD-simulation-framework_fast18.pdf)
C++
177
star
3

ramulator-pim

A fast and flexible simulation infrastructure for exploring general-purpose processing-in-memory (PIM) architectures. Ramulator-PIM combines a widely-used simulator for out-of-order and in-order processors (ZSim) with Ramulator, a DRAM simulator with memory models for DDRx, LPDDRx, GDDRx, WIOx, HBMx, and HMCx. Ramulator is described in the IEEE CAL 2015 paper by Kim et al. at https://people.inf.ethz.ch/omutlu/pub/ramulator_dram_simulator-ieee-cal15.pdf Ramulator-PIM is used in the DAC 2019 paper by Singh et al. at https://people.inf.ethz.ch/omutlu/pub/NAPEL-near-memory-computing-performance-prediction-via-ML_dac19.pdf
C++
125
star
4

prim-benchmarks

PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is developed to evaluate, analyze, and characterize the first publicly-available real-world PIM architecture, the UPMEM PIM architecture. Described by Gómez-Luna et al. (https://arxiv.org/abs/2105.03814).
C
100
star
5

SoftMC

SoftMC is an experimental FPGA-based memory controller design that can be used to develop tests for DDR3 SODIMMs using a C++ based API. The design, the interface, and its capabilities and limitations are discussed in our HPCA 2017 paper: "SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies" <https://people.inf.ethz.ch/omutlu/pub/softMC_hpca17.pdf>
Verilog
94
star
6

Pythia

A customizable hardware prefetching framework using online reinforcement learning as described in the MICRO 2021 paper by Bera et al. (https://arxiv.org/pdf/2109.12021.pdf).
C++
76
star
7

DAMOV

DAMOV is a benchmark suite and a methodical framework targeting the study of data movement bottlenecks in modern applications. It is intended to study new architectures, such as near-data processing. Described by Oliveira et al. (preliminary version at https://arxiv.org/pdf/2105.03725.pdf)
C++
69
star
8

SparseP

SparseP is the first open-source Sparse Matrix Vector Multiplication (SpMV) software package for real-world Processing-In-Memory (PIM) architectures. SparseP is developed to evaluate and characterize the first publicly-available real-world PIM architecture, the UPMEM PIM architecture. Described by C. Giannoula et al. [https://arxiv.org/abs/2201.05072]
C
65
star
9

Hermes

A speculative mechanism to accelerate long-latency off-chip load requests by removing on-chip cache access latency from their critical path, as described by MICRO 2022 paper by Bera et al. (https://arxiv.org/pdf/2209.00188.pdf)
C++
48
star
10

PiDRAM

PiDRAM is the first flexible end-to-end framework that enables system integration studies and evaluation of real Processing-using-Memory techniques. Prototype on a RISC-V rocket chip system implemented on an FPGA. Described in our paper: https://arxiv.org/abs/2111.00082
47
star
11

SneakySnake

SneakySnake🐍 is the first and the only pre-alignment filtering algorithm that works efficiently and fast on modern CPU, FPGA, and GPU architectures. It greatly (by more than two orders of magnitude) expedites sequence alignment calculation for both short and long reads. Described in the Bioinformatics (2020) by Alser et al. https://arxiv.org/abs/1910.09020.
VHDL
46
star
12

IMPICA

This is a processing-in-memory simulator which models 3D-stacked memory within gem5. Also includes the workloads used for IMPICA (In-Memory PoInter Chasing Accelerator), an ICCD 2016 paper by Hsieh et al. at https://users.ece.cmu.edu/~omutlu/pub/in-memory-pointer-chasing-accelerator_iccd16.pdf
C
42
star
13

Mosaic

Source code of the simulator used in the Mosaic paper from MICRO 2017: "Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes" https://people.inf.ethz.ch/omutlu/pub/mosaic-application-transparent-multiple-page-sizes-for-GPUs_micro17.pdf
C++
38
star
14

GPGPUSim-Ramulator

The source code for GPGPUSim+Ramulator simulator. In this version, GPGPUSim uses Ramulator to simulate the DRAM. This simulator is used to produce some of the results in our SIGMETRICS 2019 paper: Ghose et al., "Demystifying Complex Workload-DRAM Interactions: An Experimental Study" at https://arxiv.org/pdf/1902.07609.pdf.
C++
35
star
15

BLEND

BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy while reducing the memory space usage of two important applications: 1) finding overlapping reads and 2) read mapping. Described by Firtina et al. (published in NARGAB https://doi.org/10.1093/nargab/lqad004)
C
34
star
16

DRAM-Bender

DRAM Bender is the first open source DRAM testing infrastructure that can be used to easily and comprehensively test state-of-the-art DDR4 modules of different form factors. Five prototypes are available on different FPGA boards. Described in our preprint: https://arxiv.org/pdf/2211.05838.pdf
VHDL
32
star
17

Scrooge

Scrooge is a high-performance pairwise sequence aligner based on the GenASM algorithm. Scrooge includes three novel algorithmic improvements on top of GenASM, and high-performance CPU and GPU implementations. Described by Lindegger et al. at https://doi.org/10.48550/arXiv.2208.09985
C
31
star
18

RawHash

RawHash is the first mechanism that can accurately and efficiently map raw nanopore signals to large reference genomes (e.g., a human reference genome) in real-time without using powerful computational resources (e.g., GPUs). Described by Firtina et al. (published at https://academic.oup.com/bioinformatics/article/39/Supplement_1/i297/7210440)
C
31
star
19

Shifted-Hamming-Distance

Source code for the Shifted Hamming Distance (SHD) filtering mechanism for sequence alignment. Described in the Bioinformatics journal paper (2015) by Xin et al. at http://users.ece.cmu.edu/~omutlu/pub/shifted-hamming-distance_bioinformatics15_proofs.pdf
C
27
star
20

FastRemap

FastRemap, a C++ tool for quickly remapping reads between genome assemblies based on the commonly used CrossMap tool. Link to paper: https://arxiv.org/pdf/2201.06255.pdf
C++
26
star
21

VAMPIRE

An open-source DRAM power model based on extensive experimental characterization of real DRAM modules. Described in the SIGMETRICS 2018 paper by Ghose et al. (https://people.inf.ethz.ch/omutlu/pub/VAMPIRE-DRAM-power-characterization-and-modeling_sigmetrics18_pomacs18.pdf)
C++
26
star
22

Apollo

Apollo is an assembly polishing algorithm that attempts to correct the errors in an assembly. It can take multiple set of reads in a single run and polish the assemblies of genomes of any size. Described in the Bioinformatics journal paper (2020) by Firtina et al. at https://people.inf.ethz.ch/omutlu/pub/apollo-technology-independent-genome-assembly-polishing_bioinformatics20.pdf
C++
26
star
23

GenASM

Source code for the software implementations of the GenASM algorithms proposed in our MICRO 2020 paper: Senol Cali et. al., "GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis" at https://people.inf.ethz.ch/omutlu/pub/GenASM-approximate-string-matching-framework-for-genome-analysis_micro20.pdf
C
26
star
24

AirLift

AirLift is a tool that updates mapped reads from one reference genome to another. Unlike existing tools, It accounts for regions not shared between the two reference genomes and enables remapping across all parts of the references. Described by Kim et al. (preliminary version at http://arxiv.org/abs/1912.08735)
C
26
star
25

BDICompression

Source code for the Base-Delta-Immediate Compression Algorithm (described in the PACT 2012 paper by Pekhimenko et al. at http://users.ece.cmu.edu/~omutlu/pub/bdi-compression_pact12.pdf)
C
22
star
26

Sibyl

Source code for the software implementation of Sibyl proposed in our ISCA 2022 paper: Gagandeep Singh et. al., "Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems using Online Reinforcement Learning" at https://people.inf.ethz.ch/omutlu/pub/Sibyl_RL-based-data-placement-in-hybrid-storage-systems_isca22.pdf
Python
21
star
27

Victima

Victima is a new software-transparent technique that greatly extends the address translation reach of modern processors by leveraging the underutilized resources of the cache hierarchy, as desribed in the MICRO 2023 paper by Kanellopoulos et al. (https://arxiv.org/pdf/2310.04158/)
C
19
star
28

Cache-Memory-Hog

Cache and main memory hog programs. These are programs with specific access patterns to evict the already existing cache blocks of various applications. These programs were designed to demonstrate that application performance is nearly linearly correlated with cache access rate (as shown in Section 3.1 of Subramanian et al. "The Application Slowdown Model" @ https://users.ece.cmu.edu/~omutlu/pub/application-slowdown-model_micro15.pdf)
C
18
star
29

NOCulator

NOCulator is a network-on-chip simulator providing cycle-accurate performance models for a wide variety of networks (mesh, torus, ring, hierarchical ring, flattened butterfly) and routers (buffered, bufferless, Adaptive Flow Control, minBD, HiRD).
C#
17
star
30

BEER

BEER determines an ECC code's parity-check matrix based on the uncorrectable errors it can cause. BEER targets Hamming codes that are used for DRAM on-die ECC but can be extended to apply to other linear block codes (e.g., BCH, Reed-Solomon). BEER is described in the 2020 MICRO paper by Patel et al.: https://arxiv.org/abs/2009.07985.
C++
16
star
31

BlockHammer

Source code for the cycle-level simulator and RTL implementation of BlockHammer proposed in our HPCA 2021 paper: Yaglikci et. al., "BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows" at https://people.inf.ethz.ch/omutlu/pub/BlockHammer_preventing-DRAM-rowhammer-at-low-cost_hpca21.pdf
C++
16
star
32

RowPress

Source code & scripts for experimental characterization and real-system demonstration of RowPress, a widespread read disturbance phenomenon in DRAM that is different from RowHammer. Described in our ISCA'23 paper by Luo et al. at https://people.inf.ethz.ch/omutlu/pub/RowPress_isca23.pdf
VHDL
15
star
33

HWASim

HWASim is a simulator for heterogeneous systems with CPUs and Hardware Accelerators (HWAs). It is released with the DASH memory scheduler paper that appeared at ACM TACO 2016: https://users.ece.cmu.edu/~omutlu/pub/dash_deadline-aware-heterogeneous-memory-scheduler_taco16.pdf
C#
15
star
34

pLUTo

pLUTo is a DRAM-based Processing-using-Memory architecture that leverages the high density of DRAM to enable the massively parallel storing and querying of lookup tables (LUTs)
Jupyter Notebook
15
star
35

pim-ml

PIM-ML is a benchmark for training machine learning algorithms on the UPMEM architecture, which is the first publicly-available real-world processing-in-memory (PIM) architecture. Described in the ISPASS 2023 paper by Gomez-Luna et al. (https://arxiv.org/pdf/2207.07886.pdf).
C
15
star
36

Shouji

Shouji is fast and accurate pre-alignment filter for banded sequence alignment calculation. Described in the Bioinformatics journal paper (2019) by Alser et al. at https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btz234/28533771/btz234.pdf
VHDL
15
star
37

SimplePIM

SimplePIM is the first high-level programming framework for real-world processing-in-memory (PIM) architectures. Described in the PACT 2023 paper by Chen et al. (https://arxiv.org/pdf/2310.01893.pdf).
C
14
star
38

CROW

Source code for the architectural and circuit-level simulators used for modeling the CROW (Copy-ROW DRAM) mechanism proposed in our ISCA 2019 paper "CROW: A Low-Cost Substrate for Improving DRAM Performance, Energy Efficiency, and Reliability". Paper is at: https://people.inf.ethz.ch/omutlu/pub/CROW-DRAM-substrate-for-performance-energy-reliability_isca19.pdf.
C++
14
star
39

SMASH

SMASH is a hardware-software cooperative mechanism that enables highly-efficient indexing and storage of sparse matrices. The key idea of SMASH is to compress sparse matrices with a hierarchical bitmap compression format that can be accelerated from hardware. Described by Kanellopoulos et al. (MICRO '19) https://people.inf.ethz.ch/omutlu/pub/SMASH-sparse-matrix-software-hardware-acceleration_micro19.pdf
C
14
star
40

MemBen

Benchmark suite containing cache filtered traces for use with Ramulator. These include some of the workloads used in our SIGMETRICS 2019 paper: Ghose et al., "Demystifying Complex Workload-DRAM Interactions: An Experimental Study" at https://arxiv.org/pdf/1902.07609.pdf.
14
star
41

NATSA

NATSA is the first near-data-processing accelerator for time series analysis based on the Matrix Profile (SCRIMP) algorithm. NATSA exploits modern 3D-stacked High Bandwidth Memory (HBM) to enable efficient and fast matrix profile computation near memory. Described in ICCD 2020 by Fernandez et al. https://people.inf.ethz.ch/omutlu/pub/NATSA_time-series-analysis-near-data_iccd20.pdf
C++
13
star
42

GenStore

GenStore is the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. Described in the ASPLOS 2022 paper by Mansouri Ghiasi et al. at https://people.inf.ethz.ch/omutlu/pub/GenStore_asplos22-arxiv.pdf
C
12
star
43

ASMSim

This simulator models multi core systems with primary focus on the memory hierarchy. It models a trace-based out-of-order core frontend and memory scheduling policies like FRFCFS, ATLAS, TCM and slowdown estimation models, ASM and MISE. Based on the MICRO 2015 paper at https://users.ece.cmu.edu/~omutlu/pub/application-slowdown-model_micro15.pdf
C#
12
star
44

Genome-on-Diet

Genome-on-Diet is a fast and memory-frugal framework for exemplifying sparsified genomics for read mapping, containment search, and metagenomic profiling. It is much faster & more memory-efficient than minimap2 for Illumina, HiFi, and ONT reads. Described by Alser et al. (preliminary version: https://arxiv.org/abs/2211.08157).
Roff
11
star
45

SeGraM

Source code for the software implementation of SeGraM proposed in our ISCA 2022 paper: Senol Cali et. al., "SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping" at https://people.inf.ethz.ch/omutlu/pub/SeGraM_genomic-sequence-mapping-universal-accelerator_isca22.pdf
C
10
star
46

Virtuoso

Virtuoso is a new simulator that focuses on modelling various memory management and virtual memory aspects.
C++
9
star
47

GRIM

Source code of the processing-in-memory simulator used in the GRIM-Filter paper published at BMC Genomics in 2018: "GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping using Processing-in-Memory Technologies" (preliminary version at https://arxiv.org/pdf/1711.01177.pdf)
C
9
star
48

MemSchedSim

This simulator models multi core systems, intended primarily for studies on main memory management techniques. It models a trace-based out-of-order core frontend and models memory scheduling policies such as FRFCFS, ATLAS, TCM, BLISS. Based on the ICCD 2014 paper by Subramanian et al. at http://users.ece.cmu.edu/~omutlu/pub/bliss-memory-scheduler_iccd14.pdf
C#
9
star
49

RamulatorSharp

RamulatorSharp is a fast and flexible memory subsystem simulator implemented in C# and it can easily run on Linux, OS X, and Windows. The simulator contains the implementation of the Low-Cost Inter-Linked Subarrays (HPCA 2016) and ChargeCache (HPCA 2016) in addition to other features present in the C++ version of Ramulator: https://users.ece.cmu.edu/~omutlu/pub/lisa-dram_hpca16.pdf https://users.ece.cmu.edu/~omutlu/pub/chargecache_low-latency-dram_hpca16.pdf
C#
9
star
50

SPARTA

A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https://arxiv.org/pdf/2303.03509.pdf)
MLIR
8
star
51

CLRDRAM

Circuit-level model for the Capacity-Latency Reconfigurable DRAM (CLR-DRAM) architecture. This repository contains the SPICE models of the CLR-DRAM architecture and the baseline architecture used in our ISCA 2020 paper "Luo et al., CLR-DRAM: A Low-Cost DRAM Architecture Enabling Dynamic Capacity-Latency Trade-Off": https://people.inf.ethz.ch/omutlu/pub/CLR-DRAM_capacity-latency-reconfigurable-DRAM_isca20.pdf
AGS Script
8
star
52

COVIDHunter

COVIDHunter 🦠🚧: An accurate and flexible COVID-19 outbreak simulation model that forecasts the strength of future mitigation measures and the numbers of cases, hospitalizations, and deaths for a given day, while considering the potential effect of environmental conditions. Described by Alser et al. (preliminary version at https://arxiv.org/abs/2102.03667 and https://doi.org/10.1101/2021.02.06.21251265).
Swift
8
star
53

ApHMM-GPU

ApHMM-GPU is the first GPU implementation of the Baum-Welch algorithm for profile Hidden Markov Models (pHMMs). It includes many of the software optimizations as proposed in the ApHMM paper, which is described by Firtina et al. (preliminary version at https://arxiv.org/abs/2207.09765).
C++
8
star
54

EINSim

DRAM error-correction code (ECC) simulator incorporating statistical error properties and DRAM design characteristics for inferring pre-correction error characteristics using only the post-correction errors. Described in the 2019 DSN paper by Patel et al.: https://people.inf.ethz.ch/omutlu/pub/understanding-and-modeling-in-DRAM-ECC_dsn19.pdf.
C++
8
star
55

QUAC-TRNG

All sources to reproduce the results presented in our paper, QUAC-TRNG, the highest-throughput DRAM-based true random number generator, described in https://people.inf.ethz.ch/omutlu/pub/QUAC-TRNG-DRAM_isca21.pdf
C++
7
star
56

U-TRR

Source code of the U-TRR methodology presented in "Uncovering In-DRAM RowHammer Protection Mechanisms: A New Methodology, Custom RowHammer Patterns, and Implications", https://people.inf.ethz.ch/omutlu/pub/U-TRR-uncovering-RowHammer-protection-mechanisms_micro21.pdf
C++
7
star
57

memsim

Mem-Sim is a fast and flexible memory subsystem simulator. Based on the PACT 2012 paper by Seshadri et al. at http://users.ece.cmu.edu/~omutlu/pub/eaf-cache_pact12.pdf. The simulator contains the implementations of the Evicted-Address Filter (PACT 2012), Informed Caching Policies for Prefetched Blocks (TACO 2014), the Dirty-Block Index (ISCA 2014).
C++
7
star
58

Molecules2Variations

The first work to provide a comprehensive survey of a prominent set of algorithmic improvement and hardware acceleration efforts for the entire genome analysis pipeline used for the three most prominent sequencing data, short reads (Illumina), ultra-long reads (ONT), and accurate long reads (HiFi). Described in arXiv (2022) by Alser et al. https://arxiv.org/abs/2205.07957
7
star
59

transpimlib

TransPimLib is a library for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, TransPimLib provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. Described in ISPASS'23 paper by Item et al. (https://arxiv.org/pdf/2304.01951.pdf)
C
7
star
60

Pythia-HDL

Implementation of Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning in Chisel HDL. To know more, please read the paper that appeared in MICRO 2021 by Bera et al. (https://arxiv.org/pdf/2109.12021.pdf).
Scala
7
star
61

CoMeT

CoMeT is a new low-cost RowHammer mitigation that uses Count-Min Sketch-based aggressor row tracking
C++
6
star
62

HARP

HARP is a memory error profiling algorithm (i.e., for identifying error-prone cells) designed for use with memory chips that use on-die error-correcting codes (ECC). This tool uses Monte-Carlo simulation to evaluate HARP and other error profilers. HARP and this tool are described in the 2021 MICRO paper by Patel et al.: https://arxiv.org/abs/2109.12697.
C++
6
star
63

TargetCall

TargetCall is the first pre-basecalling filter that is applicable to a wide range of use cases to eliminate wasted computation in basecalling. Described in our preprint: https://arxiv.org/abs/2212.04953
Python
6
star
64

MetaSys

Metasys is the first open-source FPGA-based infrastructure with a prototype in a RISC-V core, to enable the rapid implementation and evaluation of a wide range of cross-layer software/hardware cooperative techniques techniques in real hardware. Described in our ACM TACO paper: https://dl.acm.org/doi/full/10.1145/3505250
6
star
65

optimal-seed-solver

Optimal Seed Solver (OSS) is a dynamic-programming algorithm that finds the optimal seeds of a read, which renders the minimum total seed frequency. It is described by Xin et al. at http://arxiv.org/pdf/1506.08235v1.pdf.
C++
6
star
66

DRAM-Datasheet-Survey

A survey of manufacturer-provided DRAM operating parameters and timings as specified by DRAM chip datasheets from between 1970 and 2021. This data and its analysis are described in the 2022 paper by Patel et al.: https://arxiv.org/abs/2204.10378
6
star
67

DRAM-Voltage-Study

Experimental study and analysis on the effect of using a wide range of different supply voltage values on the reliability, latency, and retention characteristics of DDR3L DRAM SO-DIMMs
AGS Script
5
star
68

UHMEM

A cycle-accurate simulator that models a hybrid memory subsystem consisting of multiple memory technologies. Described in the CLUSTER 2017 paper by Li et al. (https://people.inf.ethz.ch/omutlu/pub/utility-based-hybrid-memory-management_cluster17.pdf)
C#
5
star
69

DIVA-DRAM

This repository provides characterization data collected over 96 DDR3 SO-DIMMs, related to the following paper: Lee et al., "Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms", SIGMETRICS 2017. https://people.inf.ethz.ch/omutlu/pub/DIVA-low-latency-DRAM_sigmetrics17-paper.pdf
AGS Script
5
star
70

GateSeeder

GateSeeder is the first near-memory CPU-FPGA co-design for alleviating both the compute-bound and memory-bound bottlenecks in short and long-read mapping. GateSeeder outperforms Minimap2 by up to 40.3×, 4.8×, and 2.3× when mapping real ONT, HiFi, and Illumina reads, respectively.
C
5
star
71

BioDynaMo

BioDynamo is a flexible and high-performance agent based simulation engine. This repository contains artifacts and materials to support the reproducibility of the paper: Breitwieser et al., "High-Performance and Scalable Agent-Based Simulation with BioDynaMo," accepted to PPoPP '23: https://arxiv.org/pdf/2301.06984.pdf
4
star
72

LEAP

C++
4
star
73

SNP-Selective-Hiding

An optimization-based mechanism 🧬 🔐 to selectively hide the minimum number of overlapping SNPs among the family members 👨‍👩‍👧‍👦 who participated in the genomic studies (i.e. GWAS). Our goal is to distort the dependencies among the family members in the original database for achieving better privacy without significantly degrading the data utility.
MATLAB
4
star
74

SelfManagingDRAM

Source code for evaluating the performance and DRAM energy benefits of Self-Managing DRAM (SMD), proposed in https://arxiv.org/abs/2207.13358
C++
3
star
75

MIG-7-PHY-DDR3-Controller

A DDR3 Controller that uses the Xilinx MIG-7 PHY to interface with DDR3 devices.
Verilog
3
star
76

SMLA

This simulator models Simultaneous Multi Layer Access (SMLA) and 3D-stacked DRAM memory, based on the TACO 2016 paper https://users.ece.cmu.edu/~omutlu/pub/smla_high-bandwidth-3d-stacked-memory_taco16.pdf
C++
3
star
77

Utopia

Utopia is a new hybrid address mapping scheme that accelerates address translation while supporting all conventional VM features as described by Kanellopoulos et al. (https://arxiv.org/abs/2211.12205)
3
star
78

PDNspot

PDNspot is a versatile framework that enables the modeling and architectural exploration of power delivery networks (PDNs) of modern processors. PDNspot evaluates the effect of multiple PDN parameters, TDP, and workloads on the metrics of interest. Described in the MICRO 2020 paper by Jawad Haj-Yahya et al. at https://people.inf.ethz.ch/omutlu/pub/FlexWatts-HybridPowerDeliveryNetwork_micro20.pdf
Python
3
star
79

FastHASH

Source code for the mrFAST DNA read mapper with FastHASH filtering mechanism for sequence alignment. Described in the BMC Genomics journal paper (2013) by Xin et al. http://www.biomedcentral.com/1471-2164/14/S1/S13/
C
3
star
80

alignment-in-memory

AIM (Alignment-in-Memory), A Framework for High-throughput Sequence Alignment using Real Processing-in-Memory Systems, Bioinformatics, btad155, https://doi.org/10.1093/bioinformatics/btad155
C
2
star
81

Register-Interval

LTRF's register-interval creation algorithm divides the control flow graph (CFG) of a GPU application into some register-intervals which have two main characteristics: 1) register-intervals have only one entry-point in CFG, and 2) they have a limited number of registers. This algorithm is part of ASPLOS2018 paper by Sadrosadati et al. at https://people.inf.ethz.ch/omutlu/pub/LTRF-latency-tolerant-GPU-register-file_asplos18.pdf
C++
2
star
82

DRAM-Latency-Variation-Study

Latency characterization data collected from 30 real DRAM SO-DIMMs. You can find the background and analysis on the data in our SIGMETRICS'16 paper "Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization".
2
star
83

sirFAST

sirFAST is designed to map short reads generated with the Compelete Genomics (CG) platform to reference genome assemblies in a fast and memory-efficient manner. Described in the Methods 2014 paper by Lee et al., http://users.ece.cmu.edu/~omutlu/pub/complete-genomics-mapper_methods14_proofs.pdf.
C
1
star
84

MeDiC

This is a patch on GPGPU-sim for MeDiC. MeDiC is a mechanism that reduces the negative performance impact of memory divergence and cache queuing in GPUs. It is introduced in the PACT 2015 paper by Ausavarungnirun et al. at http://users.ece.cmu.edu/~omutlu/pub/MeDiC-for-GPGPUs_pact15.pdf
1
star