• Stars: 39
• Rank: 693,563 (top 14%)
• Language: Python
• License: MIT License
• Created: about 3 years ago
• Updated: 10 months ago

Repository Details

Language-Agnostic Website Embedding and Classification

More Repositories

1. aiflows (Jupyter Notebook, 235 stars): 🤖🌊 aiFlows: The building blocks of your collaborative AI
2. GoogleTrendsAnchorBank (HTML, 101 stars): Google Trends, made easy.
3. GenIE (Python, 98 stars): The autoregressive information extraction system GenIE (Generative Information Extraction) implemented in PyTorch.
4. transformers-CFG (Python, 62 stars): 🤗 A specialized library for integrating context-free grammars (CFG) in EBNF with the Hugging Face Transformers library.
5. SynthIE (Python, 57 stars): The data and the PyTorch implementation for the models and experiments in the paper "Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction".
6. llm-latent-language (Jupyter Notebook, 46 stars): Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
7. cc_flows (Python, 31 stars): The data and implementation for the experiments in the paper "Flows: Building Blocks of Reasoning and Collaborating AI".
8. Cr5 (Jupyter Notebook, 30 stars): Code and data for the WSDM '19 paper "Crosslingual Document Embedding as Reduced-Rank Ridge Regression (Cr5)"
9. GPTurk (Jupyter Notebook, 29 stars)
10. quootstrap (Java, 27 stars): Unsupervised method for extracting quotation-speaker pairs from large news corpora.
11. GCD (Python, 27 stars)
12. YouNiverse (Jupyter Notebook, 23 stars): Code for the dataset paper "YouNiverse: Large-Scale Channel and Video Metadata from English-Speaking YouTube"
13. Quotebank (Java, 17 stars): Code and data for the WSDM '21 paper "Quotebank: A Corpus of Quotations from a Decade of News"
14. WikiHist.html (PHP, 14 stars): Code and all steps taken to download the whole English Wikipedia history, set up the conversion process, and convert it from wikitext to HTML.
15. understanding-decoding (Python, 13 stars): The data and the PyTorch implementation for the models and experiments in the paper "Language Model Decoding as Likelihood–Utility Alignment".
16. LAMEN (Python, 12 stars)
17. entity-matchers (Python, 12 stars): Source code for "A Critical Re-evaluation of Neural Methods for Entity Alignment"
18. pairformance (Python, 11 stars): Tool to perform paired evaluation of automatic systems
19. WikiPDA (Jupyter Notebook, 10 stars): Crosslingual Topic Modeling with WikiPDA
20. unfun (HTML, 10 stars): Code and data for the AAAI'19 paper "Reverse-Engineering Satire, or 'Paper on Computational Humor Accepted Despite Making Serious Advances'"
21. invariant-language-models (Python, 8 stars): A framework to train language models to learn invariant representations.
22. eigenthemes (Python, 8 stars): Source code for "Low-rank Subspaces for Unsupervised Entity Linking"
23. causal-distances (Jupyter Notebook, 8 stars)
24. property-inference-attacks (Python, 6 stars): Modular framework for property inference attacks on deep neural networks
25. GraphCyclesRemoval (Java, 5 stars): Implementation of "A fast and effective heuristic for the feedback arc set problem"
26. secvm-server (Java, 4 stars): The server to collect user data and learn an SVM in the SecVM project
27. KLearn (Jupyter Notebook, 4 stars)
28. BT-eval (Jupyter Notebook, 4 stars): Code to reproduce experiments of the ACL 2021 publication on the evaluation of NLP systems with the BT mechanism
29. Negativity_in_2016_campaign (TeX, 4 stars): Code for the paper "United States Politicians' Tone Became More Negative with 2016 Primary Campaigns"
30. flows (Python, 3 stars): The flows library
31. amplification_paradox (Jupyter Notebook, 3 stars): Simulation code for the paper "The Amplification Paradox in Recommender Systems"
32. llm-grounding-analysis (Python, 3 stars)
33. distribution-inference-risks (Jupyter Notebook, 3 stars): Distribution Inference Risks: Identifying and Mitigating Sources of Leakage
34. wiki_pageviews_covid (Jupyter Notebook, 3 stars): Data and code for the paper "Sudden Attention Shifts on Wikipedia During the COVID-19 Crisis"
35. descartes (Python, 2 stars): The PyTorch implementation for the models in the paper "Descartes: Generating Short Descriptions of Wikipedia Articles"
36. nelight (Python, 2 stars)
37. laughing-head (Jupyter Notebook, 2 stars): Code for the laughing head paper
38. mdic (Jupyter Notebook, 2 stars): Code and data for the paper "Message Distortion in Information Cascades" (TheWebConf2019)
39. wiki_image_classification (Jupyter Notebook, 2 stars): Wikipedia Image Classification project
40. youtube-embeddings (HTML, 1 star): YouTube channel embeddings and social dimensions
41. 140_to_280 (1 star): Repository for the paper “How Constraints Affect Content: The Case of Twitter's Switch from 140 to 280” published at ICWSM’18
42. WikipediaAsWebGateway (Jupyter Notebook, 1 star)
43. CELMOC (Python, 1 star): Framework for Cost-Effective Language Model Choice
44. quotebank-toolkit (Python, 1 star): Scripts for cleaning and enriching Quotebank
45. post-mortem-memory (HTML, 1 star)
46. broccoli-plugin (Python, 1 star)
47. structuring-wikipedia-articles (1 star): Structuring Wikipedia Articles with Section Recommendations
48. foodle-trends (Jupyter Notebook, 1 star)
49. when_sheep_shop (Jupyter Notebook, 1 star): Repository for the article "When Sheep Shop: Measuring Herding Effects in Product Ratings with Natural Experiments" published at WWW2018
50. manosphere_to_altright (Jupyter Notebook, 1 star)
51. DIPPS (Python, 1 star)
52. anticipated-vs-actual (Jupyter Notebook, 1 star)
53. deplatforming_dataset (Jupyter Notebook, 1 star)
54. SpokespersonAttributionCOVID (R, 1 star): Repository of code and data for the paper "The effect of spokesperson attribution on public health message sharing during the COVID-19 pandemic".
55. WCNPruning (Java, 1 star): A framework to clean the Wikipedia category network.
56. wikipedia-citation-engagement (Jupyter Notebook, 1 star): Quantifying Engagement with Citations on Wikipedia https://arxiv.org/abs/2001.08614