• Stars: 100
• Rank: 340,703 (Top 7%)
• Language: Python
• License: Apache License 2.0
• Created over 4 years ago
• Updated about 2 years ago


Repository Details

Box Embeddings as Modules

A fully open source Python library for geometric representation learning, compatible with both PyTorch and TensorFlow, which allows existing neural network layers to be replaced with or transformed into boxes easily.
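To make "transformed into boxes" concrete, here is a minimal plain-Python sketch of one common way to read a layer's unconstrained output as a box. It assumes a min-corner/width parameterization: the first half of a length-2d vector is the minimum corner z, and the second half is passed through softplus and added to z to give the maximum corner Z, so the box can never be empty. `vector_to_box` is a hypothetical helper for illustration, not part of the library's API; the library's own choices live in `box_embeddings.parameterizations`.

```python
import math
from typing import List, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); always strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def vector_to_box(v: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: read a length-2d vector as (min corner, raw widths).

    The first d entries are the minimum corner z. The last d entries go through
    softplus, so the maximum corner Z = z + softplus(delta) is always > z.
    """
    if len(v) % 2 != 0:
        raise ValueError("expected an even-length vector (2 * dims)")
    d = len(v) // 2
    z = list(v[:d])
    Z = [lo + softplus(delta) for lo, delta in zip(z, v[d:])]
    return z, Z


# Any real-valued output of an existing layer can be mapped to a valid box this way.
z, Z = vector_to_box([0.0, 1.0, -3.0, 2.0])
print(z, Z)
```

Softplus is used instead of, say, `abs` or `exp` because it is smooth, positive, and roughly linear for large inputs, which keeps gradients well behaved during training.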



🌟 Features

  • Modular and reusable library that helps researchers study probabilistic box embeddings.
  • Extensive documentation and example code showing how to use the library, making it easy to adapt to existing codebases.
  • Rigorously unit-tested codebase with high coverage, providing an additional layer of reliability.
  • Customizable pipelines.
  • Actively maintained by IESL at UMass Amherst.

💻 Installation

Installing via pip

The preferred way to install Box Embeddings for regular usage, testing, or integration into an existing workflow is via pip:

pip install box-embeddings

Installing from source

You can also install Box Embeddings from source by cloning our git repository:

git clone https://github.com/iesl/box-embeddings
cd box-embeddings

Create a Python 3.7 or 3.8 virtual environment in the project directory, then install the core requirements and the Box Embeddings package in editable mode:

virtualenv box_venv
source box_venv/bin/activate
pip install -r core_requirements.txt
pip install --editable .

👟 Quick start

After installing Box Embeddings, a box can be initialized from a tensor as follows:

import torch
from box_embeddings.parameterizations.box_tensor import BoxTensor
data_x = torch.tensor([[1,2],[-1,5]])
box_x = BoxTensor(data_x)
box_x

BoxTensor(tensor([[ 1,  2],
        [-1,  5]]))

The result box_x is now a BoxTensor object. To view other examples, visit the examples section.
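The quick-start tensor is easier to read once you know how its two rows are interpreted. The plain-Python sketch below assumes (confirm against the BoxTensor documentation) that for data of shape (2, d) the first row is the minimum corner z and the second row is the maximum corner Z. Under that assumption the quick-start data describes z = [1, 2] and Z = [-1, 5], so the first dimension has a negative side length and the box is empty there. That is harmless for demonstrating construction but worth knowing before computing volumes. The helpers `corners`, `side_lengths`, and `is_nonempty` are hypothetical, not library functions.

```python
from typing import List, Tuple

Box = Tuple[List[float], List[float]]  # (min corner z, max corner Z)


def corners(data: List[List[float]]) -> Box:
    """Split 2 x d data into (z, Z): row 0 is the min corner, row 1 the max corner."""
    if len(data) != 2 or len(data[0]) != len(data[1]):
        raise ValueError("expected data of shape (2, d)")
    return list(data[0]), list(data[1])


def side_lengths(box: Box) -> List[float]:
    """Per-dimension Z - z; a non-positive entry means the box is empty there."""
    z, Z = box
    return [hi - lo for lo, hi in zip(z, Z)]


def is_nonempty(box: Box) -> bool:
    """A box has positive volume only if every side length is positive."""
    return all(s > 0 for s in side_lengths(box))


quick_start = corners([[1, 2], [-1, 5]])  # same numbers as data_x in the quick start
print(side_lengths(quick_start))  # [-2, 3]
print(is_nonempty(quick_start))   # False
print(is_nonempty(corners([[1, 2], [4, 5]])))  # True
```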

📖 Module Overview

Module                            Description
box_embeddings                    An open-source library for NLP or graph learning
box_embeddings.common             Utility modules that are used across the library
box_embeddings.initializations    Initialization modules
box_embeddings.modules            A collection of modules to operate on boxes
box_embeddings.parameterizations  A collection of modules to parameterize boxes
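As a rough illustration of what modules that operate on boxes compute, the plain-Python sketch below implements three flavors that match the papers in the Reference section: hard intersection and volume (Vilnis et al., 2018), a softplus-smoothed volume in the spirit of softboxes (Li et al., 2019), and a log-sum-exp Gumbel intersection (Dasgupta et al., 2020). All function names and the `temperature` and `beta` defaults are hypothetical; the library's own versions operate on BoxTensor objects and may differ in detail.

```python
import math
from typing import List, Tuple

Box = Tuple[List[float], List[float]]  # (min corner z, max corner Z)


def hard_intersection(a: Box, b: Box) -> Box:
    """Elementwise max of the min corners and min of the max corners."""
    z = [max(x, y) for x, y in zip(a[0], b[0])]
    Z = [min(x, y) for x, y in zip(a[1], b[1])]
    return z, Z


def hard_volume(box: Box) -> float:
    """Product of side lengths clamped at zero (0.0 for empty boxes)."""
    vol = 1.0
    for lo, hi in zip(box[0], box[1]):
        vol *= max(hi - lo, 0.0)
    return vol


def soft_volume(box: Box, temperature: float = 1.0) -> float:
    """Product of temperature-scaled softplus side lengths; always positive."""
    vol = 1.0
    for lo, hi in zip(box[0], box[1]):
        x = (hi - lo) / temperature
        # Stable softplus: log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|)).
        vol *= temperature * (max(x, 0.0) + math.log1p(math.exp(-abs(x))))
    return vol


def _logsumexp2(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def gumbel_intersection(a: Box, b: Box, beta: float = 0.1) -> Box:
    """Log-sum-exp replaces max/min; approaches hard_intersection as beta -> 0."""
    z = [beta * _logsumexp2(x / beta, y / beta) for x, y in zip(a[0], b[0])]
    Z = [-beta * _logsumexp2(-x / beta, -y / beta) for x, y in zip(a[1], b[1])]
    return z, Z


a: Box = ([0.0, 0.0], [2.0, 2.0])
b: Box = ([1.0, 1.0], [3.0, 3.0])
far: Box = ([5.0, 5.0], [6.0, 6.0])

inter = hard_intersection(a, b)
print(inter, hard_volume(inter))                     # ([1.0, 1.0], [2.0, 2.0]) 1.0
print(hard_volume(hard_intersection(a, far)))        # 0.0
print(soft_volume(hard_intersection(a, far)) > 0.0)  # True
```

The last two lines show the motivation for the smoothed variants: for disjoint boxes the hard volume is exactly zero, so it provides no gradient signal, while the softplus volume stays small but positive.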

📍 Navigating the codebase

Task                  Where to go
Contribution manual   Link
Source codes          Link
Usage documentation   Link
Training examples     Link
Unit tests            Link

📚 Reference

  1. If you use this library in your work, please cite the following arXiv version of the paper:
@article{chheda2021box,
  title={Box Embeddings: An open-source library for representation learning using geometric structures},
  author={Chheda, Tejas and Goyal, Purujit and Tran, Trang and Patel, Dhruvesh and Boratko, Michael
  and Dasgupta, Shib Sankar and McCallum, Andrew},
  journal={arXiv preprint arXiv:2109.04997},
  year={2021}
}
  2. If you use simple hard boxes with a surrogate loss, cite the following paper:
@inproceedings{vilnis2018probabilistic,
  title={Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures},
  author={Vilnis, Luke and Li, Xiang and Murty, Shikhar and McCallum, Andrew},
  booktitle={Proceedings of the 56th Annual Meeting of the Association for
  Computational Linguistics (Volume 1: Long Papers)},
  pages={263--272},
  year={2018}
}
  3. If you use softboxes without any regularization, cite the following paper:
@inproceedings{li2018smoothing,
  title={Smoothing the Geometry of Probabilistic Box Embeddings},
  author={Xiang Li and Luke Vilnis and Dongxu Zhang and Michael Boratko and Andrew McCallum},
  booktitle={International Conference on Learning Representations},
  year={2019},
  url={https://openreview.net/forum?id=H1xSNiRcF7}
}
  4. If you use softboxes with the regularizations defined in the Regularizations module, cite the following paper:
@inproceedings{patel2020representing,
  title={Representing Joint Hierarchies with Box Embeddings},
  author={Dhruvesh Patel and Shib Sankar Dasgupta and Michael Boratko and Xiang Li and Luke Vilnis
  and Andrew McCallum},
  booktitle={Automated Knowledge Base Construction},
  year={2020},
  url={https://openreview.net/forum?id=J246NSqR_l}
}
  5. If you use Gumbel boxes, cite the following paper:
@article{dasgupta2020improving,
  title={Improving Local Identifiability in Probabilistic Box Embeddings},
  author={Dasgupta, Shib Sankar and Boratko, Michael and Zhang, Dongxu and Vilnis, Luke
  and Li, Xiang Lorraine and McCallum, Andrew},
  journal={arXiv preprint arXiv:2010.04831},
  year={2020}
}

💪 Contributors

We welcome all contributions from the community to make Box Embeddings a better package. If you're a first-time contributor, we recommend starting with our CONTRIBUTING.md guide.

💡 News and Updates

Our library Box Embeddings will be officially introduced at EMNLP 2021!

🤗 Acknowledgments

Box Embeddings is an open-source project developed by the research team from the Information Extraction and Synthesis Laboratory at the College of Information and Computer Sciences (UMass Amherst).

More Repositories

1. dilated-cnn-ner (Python, 244 stars): Dilated CNNs for NER in TensorFlow
2. diora (Python, 87 stars): Deep Inside-Outside Recursive Autoencoder
3. xcluster (Scala, 66 stars): Algorithms and evaluation tools for extreme clustering
4. TypeNet (Python, 51 stars): A Hierarchical Type system for fine grained entity typing
5. metanlp (Python, 45 stars): Meta-learning for NLP
6. learned-string-alignments (Python, 38 stars): Learning String Alignments for Entity Aliases
7. stance (Python, 34 stars): Learned string similarity for entity names using optimal transport.
8. word2box (Python, 33 stars): Capturing Set-Theoretic Semantics of Words using Box Embeddings
9. watr-works (Scala, 33 stars)
10. protoqa-data (30 stars): Dataset for protoqa ("family feud") data
11. leopard (24 stars)
12. grinch (Python, 23 stars): Scalable Hierarchical Clustering with Tree Grafting
13. interactive_LM (Python, 20 stars)
14. CSFCube (Python, 18 stars): A Test Collection of Computer Science Papers for Faceted Query by Example
15. inventor-disambiguation (Scala, 16 stars)
16. Distributional-Inclusion-Vector-Embedding (Jupyter Notebook, 15 stars)
17. geometric-graph-embedding (Python, 12 stars)
18. conll2012-preprocess-parsing (Shell, 12 stars): Scripts for pre-processing the CoNLL-2012 dataset for syntactic dependency parsing.
19. fair-matching (Python, 11 stars): Fair paper matching
20. CE2ERE (Jupyter Notebook, 11 stars): Constrained learning using boxes for event-event relation extraction
21. gumbel-box-embeddings (Jupyter Notebook, 11 stars)
22. box-mlc-iclr-2022 (Jupyter Notebook, 11 stars): Official repository for the paper "Modeling Label Space Interactions in Multi-label Classification using Box Embeddings".
23. s-diora (Python, 10 stars)
24. expLinkage (Python, 9 stars): Supervised hierarchical clustering
25. Softmax-CPR (Python, 8 stars): Better output softmax alternatives for natural language generation
26. anncur (Python, 8 stars): Approximate Nearest Neighbor search using CUR Decomposition
27. ProtoQA_GPT2 (Python, 8 stars): This is the GPT2 baseline for ProtoQA
28. softmax_CPR_recommend (Python, 7 stars): The code repository for "To Copy, or not to Copy; That is a Critical Issue of the Output Softmax Layer in Neural Sequential Recommenders"
29. rexa1-metatagger (Java, 5 stars)
30. author_coref (Scala, 5 stars): Author Disambiguation
31. knnlm-retrieval-quality (Python, 4 stars)
32. protoqa-evaluator (Python, 4 stars): Evaluation functions for ProtoQA dataset
33. rexa1-pstotext (C, 3 stars)
34. paper-header (Scala, 3 stars)
35. institution_hierarchies (Python, 3 stars)
36. iesl-sbt-base (Scala, 3 stars): SBT plugin providing lots of boilerplate dependencies, IESL repos, etc., to provide simple and consistent configuration of IESL SBT projects.
37. bibie (Scala, 3 stars): Research paper header and references field extraction
38. pdf2meta (Scala, 3 stars)
39. namejuggler (Scala, 3 stars): Parsing, rearranging, and compatibility testing of person names (mostly in Western cultures). The problem is in general unsolvable due to cultural ambiguities, so we just make a simple heuristic attempt.
40. Boxes_for_Joint_hierarchy_AKBC_2020 (Python, 2 stars)
41. seal-neurips-2022 (Python, 2 stars): 🦭 This is the official implementation for the paper Structured Energy Network As a Loss (https://openreview.net/pdf?id=F0DowhX7_x).
42. structured_prediction_baselines (Jsonnet, 2 stars): Structure Prediction Baselines Using AllenNLP. Implements baselines for tasks like POS tagging, NER and SRL.
43. rpp (Scala, 2 stars): Research Paper Processor
44. distantly-supervised-diora (Python, 2 stars)
45. Multi_facet_recommendation (Jupyter Notebook, 1 star)
46. paper_coref (Scala, 1 star): Paper/Citation Coreference
47. neural_relation_extraction (Python, 1 star)
48. bibmogrify (Scala, 1 star): High-volume format translation and processing of scholarly citations and patents.
49. fuse_ttl (Scala, 1 star): Technical Term List Generation for Fuse
50. paper-header-annotator-2 (JavaScript, 1 star): A paper header annotation tool based on Fabric.js and Play Framework
51. paper-header-annotator (JavaScript, 1 star): A tool for annotating the headers of academic papers
52. score-paper-segmentation (Python, 1 star): A small utility which uses Jaccard Similarity of Bigrams to measure coarse paper segmentation
53. iesl-pdf-to-text (JavaScript, 1 star)