Lightweight locality-aware user-level threading runtime.

WELCOME TO THE NEW HOME OF QTHREADS:

https://github.com/sandialabs/qthreads

QTHREADS!

The qthreads API is designed to make using large numbers of threads convenient and easy. The API maps well to both MTA-style threading and PIM-style threading, and is still quite useful in a standard SMP context. The qthreads API also provides access to full/empty-bit (FEB) semantics, where every word of memory can be marked either full or empty, and a thread can wait for any word to attain either state.
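
To make the full/empty-bit idea concrete, here is a minimal sketch of the semantics for a single word, written with standard C++ primitives. It is an illustration only, not the qthreads implementation or API: the class name `FebWord` and the functions `write_when_empty`, `read_and_empty`, and `feb_demo_sum` are hypothetical, and this version blocks whole OS threads, whereas qthreads works with lightweight user-level threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical illustration of full/empty-bit semantics for ONE word.
// This is not the qthreads implementation or API.
class FebWord {
public:
    // Wait until the word is empty, store the value, then mark it full.
    void write_when_empty(long v) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !full_; });
        value_ = v;
        full_ = true;
        cv_.notify_all();
    }

    // Wait until the word is full, mark it empty, then return the value.
    long read_and_empty() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return full_; });
        full_ = false;
        cv_.notify_all();
        return value_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    long value_ = 0;
    bool full_ = false;  // the word starts out empty
};

// Passes the values 1..n through a single FEB word and returns their sum.
long feb_demo_sum(int n) {
    assert(n >= 0);
    FebWord word;
    long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) word.write_when_empty(i);
    });
    for (int i = 0; i < n; ++i) sum += word.read_and_empty();
    producer.join();
    return sum;
}
```

Because each write waits for "empty" and each read waits for "full", the producer and consumer strictly alternate, so no value is lost or read twice even though no explicit lock is visible at the call sites.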

The qthreads library on an SMP is essentially a library for spawning and controlling coroutines: threads with small (4-8k) stacks. The threads are entirely in user-space and use their locked/unlocked status as part of their scheduling.

The library's metaphor is that there are many qthreads and several "shepherds". Shepherds generally map to specific processors or memory regions, but this is not an explicit part of the API. Qthreads are assigned to specific shepherds and do not generally migrate.

The API includes utility functions for making threaded loops, sorting, and similar operations convenient.

Collaboration

Need help or interested in finding out more? Join us on our Slack channel: https://join.slack.com/t/qthreads/signup

Performance

On a machine with approximately 2GB of RAM, this library was able to spawn and handle 350,000 qthreads. With some modifications (mostly in stack-size), it was able to handle 1,000,000 qthreads. It may be able to do more, but swapping will become an issue, and you may start to run out of address space.
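
As a rough sanity check on these figures, the sketch below divides the memory evenly among the threads. It assumes that "2GB" means 2^31 bytes and that nothing else uses memory, so the results are upper bounds rather than measurements; the function name `bytes_per_thread` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the bytes available to each thread if `total_bytes` were
// split evenly among `threads` threads and nothing else used any memory.
inline std::uint64_t bytes_per_thread(std::uint64_t total_bytes,
                                      std::uint64_t threads) {
    assert(threads > 0);
    return total_bytes / threads;  // integer division rounds down
}
```

Under those assumptions, `bytes_per_thread(1ULL << 31, 350000)` is 6135 bytes, and `bytes_per_thread(1ULL << 31, 1000000)` is 2147 bytes, which is consistent with needing a smaller stack size to reach 1,000,000 qthreads.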

This library has been tested, and runs well, on a 64-bit machine. It is occasionally tested on 32-bit machines, and has even been tested under Cygwin.

Currently, the only real limiting factor on the number of threads is the amount of memory and address space you have available. For more than 2^32 threads, the thread_id value will need to be made larger (or eliminated, as it is not required for correct operation by the library itself).

For information on how to use qthread or qalloc, there is A LOT of information in the header files (qthread.h and qalloc.h), but the primary documentation is the man pages.

FUTURELIB DOCUMENTATION (the 10-minute version)

The most important functions in futurelib that a person is going to use are mt_loop and mt_loop_returns. The mt_loop function is for parallel iterations that do not return values, and the mt_loop_returns function is for parallel iterations that DO return values. The distinction is not always so obvious.

mt_loop is used in a format like so:

  mt_loop<...argtypelist..., looptype>
         (function, ...arglist..., startval, stopval, stepval);

The "stepval" is optional, and defaults to 1.

Essentially, the template arguments (in the <>) specify how to handle the arguments to the parallel function and what kind of parallelism you want. Options for 'looptype' (i.e. the kind of parallelism) are:

mt_loop_traits::Par - fork all iterations, wait for them to finish
mt_loop_traits::ParNoJoin - same as Par, but without the waiting
mt_loop_traits::Future - a resource-constrained version of par, will limit
	the number of threads running at a given time
mt_loop_traits::FutureNoJoin - same as Future, but without waiting for
	threads to finish

The argtypelist is a list of conceptual types defining how the arguments to the parallel function will be handled. Use one conceptual type per argument, in the order the arguments will be passed. Valid conceptual types are:

Iterator - The parallel function will be called with the current loop
	iteration number passed into this argument.
ArrayPtr - The corresponding argument is a pointer to an array, and each
	iteration will be passed the value of array[iteration]
Ref - The corresponding argument will be passed as a reference.
Val - The corresponding argument will be passed as a constant value
	(i.e. the same value will be passed to all iterations)

For example, doing this:

  for (int i = 0; i < 10; i++) {
    array[i] = i;
  }

Would be achieved like so:

  void assign(int &array_value, const int i) {
    array_value = i;
  }

  mt_loop<ArrayPtr, Iterator, mt_loop_traits::Par>
    (assign, array, 0, 0, 10);
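
To show what the Par loop type and the ArrayPtr and Iterator conceptual types mean in practice, here is a hedged stand-in built on std::thread. It is not futurelib code and makes no claim about how futurelib is implemented; `par_loop_sketch` and `fill_with_indices` are hypothetical names, and this version spawns one OS thread per iteration rather than a qthread.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// The loop body from the example above.
void assign(int &array_value, const int i) {
    array_value = i;
}

// Hypothetical stand-in for a Par loop over (ArrayPtr, Iterator) arguments.
// It is not futurelib code.
void par_loop_sketch(void (*fn)(int &, const int), int *array,
                     int start, int stop, int step = 1) {
    assert(step > 0);
    std::vector<std::thread> workers;
    for (int i = start; i < stop; i += step) {
        // ArrayPtr: pass array[i]. Iterator: pass the iteration number i.
        workers.emplace_back(fn, std::ref(array[i]), i);
    }
    // Par: wait for every iteration to finish before returning.
    for (std::thread &w : workers) w.join();
}

// Fills an n-element array so that array[i] == i, using the parallel sketch.
std::vector<int> fill_with_indices(int n) {
    std::vector<int> array(n, -1);
    par_loop_sketch(assign, array.data(), 0, n);
    return array;
}
```

Dropping the final join loop would correspond to the ParNoJoin behaviour described above, at which point the caller could no longer assume the array is filled when the call returns.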

The mt_loop_returns variant adds the specification of what to do with the return values. The pattern is like this:

  mt_loop_returns<returnvaltype, ...argtypelist..., looptype>
    (retval, function, ...args..., start, stop, step);

The only difference is in the returnvaltype and the retval. The returnvaltype can be either an ArrayPtr or a Collect. If it is an ArrayPtr, the loop will behave similarly to the following loop:

  for (int i = start; i < stop; i += step) {
    retval[i] = function(args);
  }

Each return value will be stored in a separate entry in the retval array. The Collect type is more interesting, and can be either:

Collect<mt_loop_traits::Add> - this sums all of the return values in parallel
Collect<mt_loop_traits::Sub> - this subtracts all of the return values in
	parallel. Note that the answer may be nondeterministic.
Collect<mt_loop_traits::Mult> - this multiplies all of the return values in
	parallel
Collect<mt_loop_traits::Div> - this divides all of the return values in
	parallel. Note that the answer is nondeterministic.

For example, Collect<mt_loop_traits::Add> is roughly equivalent to the following loop:

  for (int i = start; i < stop; i += step) {
    retval += function(args);
  }
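
One plausible reason the Sub and Div collectors can give nondeterministic answers (an assumption; the text above does not explain it) is that subtraction and division are not associative, so the result depends on the order in which partial results happen to be combined. The sketch below uses plain sequential C++ to show the effect of grouping; `reduce_left` and `reduce_right` are hypothetical helpers, not futurelib functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Combines left to right: ((v0 op v1) op v2) op v3 ...
template <typename Op>
int reduce_left(const std::vector<int> &values, Op op) {
    assert(!values.empty());
    int acc = values.front();
    for (std::size_t i = 1; i < values.size(); ++i) acc = op(acc, values[i]);
    return acc;
}

// Combines right to left: v0 op (v1 op (v2 op v3)) ...
template <typename Op>
int reduce_right(const std::vector<int> &values, Op op) {
    assert(!values.empty());
    int acc = values.back();
    for (std::size_t i = values.size() - 1; i-- > 0;) acc = op(values[i], acc);
    return acc;
}
```

With the values {10, 3, 2}, addition gives 15 under either grouping, while subtraction gives (10 - 3) - 2 = 5 from `reduce_left` and 10 - (3 - 2) = 9 from `reduce_right`. A parallel runtime that combines values in whatever order threads finish can therefore produce different answers from run to run for Sub and Div, but not for Add or Mult on integers (barring overflow).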

NOTE FOR PGI USERS

pgcc needs the -c9x flag in order to correctly process variadic macros (which are used in qthread.c) and the PRIuMAX format definitions (used in qalloc.c). Use the CFLAGS variable to add this flag. Note that pgcc's support for the full C90/C99 standards is lousy, so most C90/C99 features that COULD be used are avoided.


NOTE FOR IBM XL USERS

make check will probably fail with the error:

  xlc++: 1501-210 command option t contains an incorrect subargument
  .../.libs/libqthread.so: could not read symbols: Invalid operation

This does not mean that the library did not compile correctly, but instead means that your libtool is probably broken (most are). The problem seems to be that the wrapper script (testloop) is created with incorrect arguments to xlc++. The other wrapper scripts (e.g. test1/test2/test3/testq) all have the correct arguments, and if you modify testloop so that $relink_command uses the -Wl,--rpath -Wl,directory syntax rather than the -rpath,directory syntax, it will work just fine.


NOTE FOR IBM BLUEGENE/P GCC USERS

Old versions of GCC do not handle builtin atomics correctly on this platform. The non-existence of __sync_fetch_and_add() cannot be reliably detected, so to use those compilers, you probably need to configure with --disable-internal-spinlock.
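
For readers unfamiliar with the builtin named above, the following sketch shows what `__sync_fetch_and_add()` does on a compiler that supports it correctly (GCC and Clang on common platforms). It is an illustration of the builtin only and does not reproduce how qthreads uses it; the function name `count_with_sync_builtin` is hypothetical.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Increments a shared counter from several threads using the GCC/Clang
// builtin __sync_fetch_and_add, then returns the final counter value.
long count_with_sync_builtin(int num_threads, int increments_per_thread) {
    assert(num_threads >= 0 && increments_per_thread >= 0);
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomically adds 1 and returns the previous value
                // (the return value is ignored here).
                __sync_fetch_and_add(&counter, 1L);
            }
        });
    }
    for (std::thread &w : workers) w.join();
    return counter;
}
```

When the builtin works as documented, no increments are lost, so four threads doing 10,000 increments each leave the counter at exactly 40,000. A compiler that mishandles the builtin gives no such guarantee, which is the situation the configure option above is meant to work around.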


NOTE FOR TILERA USERS

The Tilera cache coherency protocols, as of the TileGX boards, appear to be somewhat buggy for large multithreaded programs. And by buggy I mean they cause kernel panics (at least, I haven't been able to demonstrate data corruption yet). Thankfully, you can pick from several cache coherency protocols, and one of them is more stable than the default. What I have found that seems to be more stable, if not perfectly stable, is to force the cache coherency protocol to hashed. The way you do this is with a boot argument to the Tilera kernel. The tile-monitor command I use is this:

`tile-monitor --net <tilera> --hvx ucache_hash=all --`

Good luck!
