  • Stars: 138
  • Rank: 258,885 (Top 6 %)
  • Language: Jupyter Notebook
  • License: Apache License 2.0
  • Created over 2 years ago
  • Updated 3 months ago

Repository Details

Causal discovery algorithms and tools for implementing new ones

Causality Lab

This repository contains research code of novel causal discovery algorithms developed at Intel Labs, as well as other common algorithms, and classes for developing and examining new algorithms for causal structure learning.

Update (December 2023): CLEANN is a novel algorithm presented at NeurIPS 2023. It generates causal explanations for the outcomes of existing pre-trained Transformer neural networks. At its core, it builds on the causal interpretation of self-attention presented in the paper and executes attention-based causal discovery (ABCD). This notebook demonstrates, using a simple example, how to use CLEANN.

Algorithms and Baselines

Included algorithms learn causal structures from observational data, and reason using these learned causal graphs. There are three families of algorithms:

  1. Causal discovery under causal sufficiency and Bayesian network structure learning

    1. PC algorithm (Spirtes et al., 2000)
    2. RAI algorithm, Recursive Autonomy Identification (Yehezkel and Lerner, 2009). This algorithm is used for learning the structure in the B2N algorithm (Rohekar et al., NeurIPS 2018b)
    3. B-RAI algorithm, Bootstrap/Bayesian-RAI for uncertainty estimation (Rohekar et al., NeurIPS 2018a). This algorithm is used for learning the structure of BRAINet (Rohekar et al., NeurIPS 2019)
  2. Causal discovery in the presence of latent confounders and selection bias

    1. FCI algorithm, Fast Causal Inference (Spirtes et al., 2000)
    2. ICD algorithm, Iterative Causal Discovery (Rohekar et al., NeurIPS 2021)
    3. TS-ICD algorithm, ICD for time-series data (Rohekar et al., ICML 2023)
  3. Causal reasoning

    1. CLEANN algorithm, Causal Explanation from Attention in Neural Networks (Rohekar et al., NeurIPS 2023; Nisimov et al., 2022).
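To make the constraint-based idea behind the first family concrete, here is a minimal sketch of the skeleton phase of a PC-style search, driven by a d-separation oracle on a known DAG. It is an illustration only and not the repository's implementation; the names `d_separated`, `pc_skeleton`, and the `parents` dictionary are hypothetical.

```python
from itertools import combinations


def d_separated(parents, x, y, given):
    """d-separation test via the moralized ancestral graph."""
    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    relevant, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents[node])
    # 2. Moralize: link each node to its parents and connect co-parents.
    neighbors = {node: set() for node in relevant}
    for child in relevant:
        pa = list(parents[child])
        for p in pa:
            neighbors[child].add(p)
            neighbors[p].add(child)
        for p, q in combinations(pa, 2):
            neighbors[p].add(q)
            neighbors[q].add(p)
    # 3. x and y are d-separated iff every path passes through `given`.
    blocked, seen, stack = set(given), {x}, [x]
    while stack:
        node = stack.pop()
        for nxt in neighbors[node] - blocked - seen:
            if nxt == y:
                return False
            seen.add(nxt)
            stack.append(nxt)
    return True


def pc_skeleton(nodes, indep):
    """Skeleton phase of a PC-style search; indep(x, y, cond) is the CI test."""
    adj = {v: set(nodes) - {v} for v in nodes}  # start fully connected
    sepsets = {}
    size = 0
    # Grow the conditioning-set size while some node has enough neighbors.
    while any(len(adj[v]) - 1 >= size for v in nodes):
        for x in nodes:
            for y in sorted(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if indep(x, y, cond):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepsets[frozenset((x, y))] = set(cond)
                        break
        size += 1
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, sepsets


# Hypothetical ground-truth DAG: A -> C <- B, C -> D (node -> list of parents).
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
edges, sepsets = pc_skeleton(
    sorted(parents), lambda x, y, cond: d_separated(parents, x, y, cond)
)
```

With a perfect oracle the search recovers the true adjacencies; the recorded separating sets are what a later orientation phase would use to identify colliders.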

Example ICD

Developing and Examining Algorithms

This repository includes several classes and methods for implementing new algorithms and testing them. These can be grouped into three categories:

  1. Simulation:
    1. Random DAG sampling
    2. Observational data sampling
  2. Causal structure learning:
    1. Classes for handling graphical models (e.g., methods for graph traversal and calculating graph properties). Supported graph types:
      1. Directed acyclic graph (DAG): commonly used for representing causal DAGs
      2. Partially directed graph (PDAG/CPDAG): a Markov equivalence class of DAGs under causal sufficiency
      3. Undirected graph (UG): usually used for representing adjacency in the graph (skeleton)
      4. Ancestral graph (PAG/MAG): a MAG is an equivalence class of DAGs, and a PAG is an equivalence class of MAGs (Richardson and Spirtes, 2002).
    2. Statistical tests (CI tests) operating on data and a perfect CI oracle (see causal discovery with a perfect oracle)
  3. Performance evaluations:
    1. Graph structural accuracy
      1. Skeleton accuracy: FNR, FPR, structural Hamming distance
      2. Orientation accuracy
      3. Overall graph accuracy: BDeu score
    2. Computational cost: counters for CI tests (internal caching ensures that each unique test is counted once)
    3. Plots for DAGs and ancestral graphs.
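As an illustration of the skeleton accuracy measures listed above, the following sketch computes FNR, FPR, and a skeleton-level structural Hamming distance from two edge lists. The definitions used here (missing edges over true edges, extra edges over true non-edges, and missing plus extra edges) are common conventions and an assumption on our part; the repository's own evaluation code may define them differently. The function names are hypothetical.

```python
def skeleton(edges):
    """Drop orientations: represent each edge as an unordered node pair."""
    return {frozenset(edge) for edge in edges}


def skeleton_accuracy(true_edges, learned_edges, num_nodes):
    """Return FNR, FPR, and the skeleton structural Hamming distance."""
    true_skel, learned_skel = skeleton(true_edges), skeleton(learned_edges)
    missing = true_skel - learned_skel  # true edges the learner dropped
    extra = learned_skel - true_skel    # learned edges absent from the truth
    possible_pairs = num_nodes * (num_nodes - 1) // 2
    non_edges = possible_pairs - len(true_skel)
    fnr = len(missing) / len(true_skel) if true_skel else 0.0
    fpr = len(extra) / non_edges if non_edges else 0.0
    return {"fnr": fnr, "fpr": fpr, "shd": len(missing) + len(extra)}


# Toy graphs over four nodes; edge direction is ignored for the skeleton.
true_dag = [("A", "C"), ("B", "C"), ("C", "D")]
learned = [("C", "A"), ("C", "D"), ("A", "B")]
result = skeleton_accuracy(true_dag, learned, num_nodes=4)
```

Here the learned graph misses B-C and adds A-B, so one of three true edges is missing and one of three true non-edges is wrongly included.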

A new algorithm can be developed by inheriting classes of existing algorithms (e.g., B-RAI inherits RAI) or by creating a new class. The only method required to be implemented is learn_structure(). For conditional independence testing, we implemented conditional mutual information, partial correlation statistical test, and d-separation (perfect oracle). Additionally, a Bayesian score (BDeu) can be used for evaluating the posterior probability of DAGs given data.
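The inheritance pattern described above can be sketched in a self-contained way. Everything below is hypothetical and stands in for the repository's classes: `LearnStructBase`, `LearnMarginal`, `LearnFirstOrder`, and `CountingOracle` (with its `cond_indep` method) are illustrative names, not the library's API. The sketch also shows how caching lets a CI test count each unique query once.

```python
from itertools import combinations


class CountingOracle:
    """Toy CI test: answers from a fixed list of independencies, with caching."""

    def __init__(self, independencies):
        # Each entry is (x, y, conditioning_set); symmetric in x and y.
        self._true = {(frozenset((x, y)), frozenset(z)) for x, y, z in independencies}
        self._cache = {}

    def cond_indep(self, x, y, cond):
        key = (frozenset((x, y)), frozenset(cond))
        if key not in self._cache:  # only unique queries are evaluated
            self._cache[key] = key in self._true
        return self._cache[key]

    @property
    def num_unique_tests(self):
        return len(self._cache)


class LearnStructBase:
    """Hypothetical base class: subclasses implement learn_structure()."""

    def __init__(self, nodes, ci_test):
        self.nodes = sorted(nodes)
        self.ci_test = ci_test
        # Start from a complete undirected graph stored as unordered pairs.
        self.graph = {frozenset(p) for p in combinations(self.nodes, 2)}

    def learn_structure(self):
        raise NotImplementedError


class LearnMarginal(LearnStructBase):
    """Removes edges between marginally independent nodes."""

    def learn_structure(self):
        for x, y in combinations(self.nodes, 2):
            if self.ci_test.cond_indep(x, y, ()):
                self.graph.discard(frozenset((x, y)))


class LearnFirstOrder(LearnMarginal):
    """Inherits the marginal step, then conditions on single nodes."""

    def learn_structure(self):
        super().learn_structure()
        for x, y in combinations(self.nodes, 2):
            if frozenset((x, y)) not in self.graph:
                continue
            others = [v for v in self.nodes if v not in (x, y)]
            if any(self.ci_test.cond_indep(x, y, (z,)) for z in others):
                self.graph.discard(frozenset((x, y)))


# Chain A -> B -> C: the only independence is A and C given B.
oracle = CountingOracle([("A", "C", {"B"})])
learner = LearnFirstOrder({"A", "B", "C"}, oracle)
learner.learn_structure()
```

Because the learner only talks to the test object, swapping the toy oracle for a statistical test requires no change to the learner, and re-running the learner does not increase the count of unique tests.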

Block Diagram

Installation

This code has been tested on Ubuntu 18.04 LTS and macOS Catalina, with Python 3.5. We recommend installing and running it in a virtualenv.

sudo -E pip3 install virtualenv
virtualenv -p python3 causal_env
. causal_env/bin/activate

git clone https://github.com/IntelLabs/causality-lab.git
cd causality-lab
pip install -r requirements.txt

Usage Example

Learning a Causal Structure from Observed Data

All causal structure learning algorithms are classes with a learn_structure() method that learns the causal graph. The learned causal graph is a public class member, simply called graph, which is an instance of a graph class. The structure learning algorithms do not have direct access to the data; instead, they call a statistical test that accesses the data.

Let's look at the following example: causal structure learning with ICD using a given dataset.

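# Module paths are assumed from the repository layout; adjust them if they differ.
from causal_discovery_algs import LearnStructICD
from causal_discovery_utils.cond_indep_tests import CondIndepParCorr

par_corr_test = CondIndepParCorr(dataset, threshold=0.01)  # CI test with the given significance level
icd = LearnStructICD(nodes_set, par_corr_test)  # instantiate an ICD learner
icd.learn_structure()  # learn the causal graph; the result is stored in icd.graph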

For complete examples, see the causal discovery with latent confounders and causal discovery under causal sufficiency notebooks. The learned structures can then be plotted; the partial ancestral graphs notebook gives a complete example of creating a PAG, calculating its properties, and plotting it.
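To show what a statistical CI test of this kind does with the data, here is a standard-library sketch of a partial-correlation test using the recursive partial-correlation formula and a Fisher z-test. It is not the repository's CondIndepParCorr; the function names are hypothetical, and treating the threshold as a significance level `alpha` compared against a p-value is an assumption.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(data, x, y, cond):
    """Partial correlation of x and y given cond, by the recursive formula."""
    if not cond:
        return pearson(data[x], data[y])
    z, rest = cond[0], cond[1:]
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def is_independent(data, x, y, cond=(), alpha=0.01):
    """Fisher z-test: True when independence is not rejected at level alpha."""
    n = len(data[x])
    # Clamp to keep atanh finite for (near-)perfect correlations.
    r = max(-0.999999, min(0.999999, partial_corr(data, x, y, list(cond))))
    stat = abs(math.atanh(r)) * math.sqrt(n - len(cond) - 3)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal tail
    return p_value > alpha


# Deterministic toy data: X = Z + e1 and Y = Z + e2, where Z, e1, and e2 are
# mutually orthogonal patterns, so X and Y are related only through Z.
z_part = [1, 1, 1, 1, -1, -1, -1, -1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
reps = 50
data = {
    "Z": z_part * reps,
    "X": [z + e for z, e in zip(z_part, e1)] * reps,
    "Y": [z + e for z, e in zip(z_part, e2)] * reps,
}
marginal = is_independent(data, "X", "Y")             # dependence expected
conditional = is_independent(data, "X", "Y", ("Z",))  # independence expected
```

A learner that receives such a test object never needs to see `data` itself, which mirrors the separation between algorithms and tests described in this section.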

PAG plot example

References

  • Rohekar, Raanan, Yaniv Gurwicz, and Shami Nisimov. "Causal Interpretation of Self-Attention in Pre-Trained Transformers". Advances in Neural Information Processing Systems (NeurIPS) 36, 2023.
  • Rohekar, Raanan Y., Shami Nisimov, Yaniv Gurwicz, and Gal Novik. "From Temporal to Contemporaneous Iterative Causal Discovery in the Presence of Latent Confounders". International Conference on Machine Learning (ICML), 2023.
  • Nisimov, Shami, Raanan Y. Rohekar, Yaniv Gurwicz, Guy Koren, and Gal Novik. "CLEAR: Causal Explanations from Attention in Neural Recommenders". Causality, Counterfactuals and Sequential Decision-Making for Recommender Systems (CONSEQUENCES) workshop at RecSys, 2022.
  • Rohekar, Raanan Y., Shami Nisimov, Yaniv Gurwicz, and Gal Novik. "Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias". Advances in Neural Information Processing Systems (NeurIPS) 34, 2021.
  • Rohekar, Raanan Y., Yaniv Gurwicz, Shami Nisimov, and Gal Novik. "Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections". Advances in Neural Information Processing Systems (NeurIPS) 32: 4244-4254, 2019.
  • Rohekar, Raanan Y., Yaniv Gurwicz, Shami Nisimov, Guy Koren, and Gal Novik. "Bayesian Structure Learning by Recursive Bootstrap". Advances in Neural Information Processing Systems (NeurIPS) 31: 10525-10535, 2018a.
  • Rohekar, Raanan Y., Shami Nisimov, Yaniv Gurwicz, Guy Koren, and Gal Novik. "Constructing Deep Neural Networks by Bayesian Network Structure Learning". Advances in Neural Information Processing Systems (NeurIPS) 31: 3047-3058, 2018b.
  • Yehezkel, Raanan, and Boaz Lerner. "Bayesian Network Structure Learning by Recursive Autonomy Identification". Journal of Machine Learning Research (JMLR) 10, no. 7, 2009.
  • Richardson, Thomas, and Peter Spirtes. "Ancestral graph Markov models". The Annals of Statistics, 30 (4): 962–1030, 2002.
  • Spirtes, Peter, Clark N. Glymour, Richard Scheines, and David Heckerman. "Causation, prediction, and search". MIT Press, 2000.
