Repository Details

Data and software for building the ACL Anthology.

ACL Anthology

These are basic instructions on generating the ACL Anthology website as seen on https://aclanthology.org/. The official home of this repository is https://github.com/acl-org/acl-anthology.

Generating the Anthology

Prerequisites

To build the Anthology website, you will need:

  • Python 3.8 or higher
  • Python packages listed in bin/requirements.txt; to install, run pip install -r bin/requirements.txt
  • Hugo 0.58.3 or higher (can be downloaded directly from their repo; the extended version is required!)
  • bibutils for creating non-BibTeX citation formats (not strictly required to build the website, but without it you need to invoke the build steps manually as laid out in the detailed README)
  • optional: If you install libyaml-dev and Cython before running make for the first time, the libyaml C library will be used instead of a pure-Python implementation, speeding up the build.
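
The version requirements above can be checked before starting a long build. The following is a minimal sketch, not part of the repository: the function names are hypothetical, probing for bib2xml (one of the converters shipped with bibutils) is an assumption about how to detect bibutils, and the parser assumes that the output of hugo version contains a token such as v0.58.3 plus the word "extended" for extended builds.

```python
import re
import shutil
import sys

MIN_PYTHON = (3, 8)     # "Python 3.8 or higher"
MIN_HUGO = (0, 58, 3)   # "Hugo 0.58.3 or higher", extended version


def python_is_suitable(version_info=sys.version_info):
    """True if the running interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def parse_hugo_version(output):
    """Return ((major, minor, patch), is_extended) from `hugo version` output.

    Assumes the output contains a token like "v0.58.3" and, for extended
    builds, the word "extended".
    """
    match = re.search(r"v(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    version = tuple(int(part or 0) for part in match.groups())
    return version, "extended" in output


def hugo_is_suitable(output):
    """True if the reported Hugo is new enough and is the extended build."""
    version, extended = parse_hugo_version(output)
    return version >= MIN_HUGO and extended


def missing_tools(which=shutil.which):
    """Return the required executables that are not found on PATH."""
    # Assumption: bibutils is considered installed if bib2xml is on PATH.
    required = ["hugo", "bib2xml"]
    return [tool for tool in required if which(tool) is None]
```

To use it, pass the text printed by hugo version to hugo_is_suitable and call missing_tools() with no arguments; an empty list means both executables were found.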

Building and deployment with GitHub

A GitHub Actions workflow deploys the site directly from GitHub. To use it, define the following secret in your repository settings (web interface: settings -> secrets):

  • PUBLISH_SSH_KEY: the private SSH key in standard PEM format used for authentication (without a passphrase)

GitHub will then automatically build and deploy the current master whenever the master branch changes. This is done via the upload target in the Makefile.

Cloning

Clone the Anthology repo to your local machine:

$ git clone https://github.com/acl-org/acl-anthology

Generating

Provided you have correctly installed all requirements, building the website should be as simple as running make from the directory to which you cloned the repo.

The fully generated website will be in build/anthology afterwards. If any errors occur during this step, you can consult the detailed README for more information on the individual steps performed to build the site. You can see the resulting website by launching a local webserver with make serve, which will serve it at http://localhost:8000.
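
If the build is driven from a script, the step above can be wrapped as follows. This is a minimal sketch under stated assumptions: build_site is a hypothetical helper, not something the repository provides, and it only relies on what is described here, namely that make is run from the clone and that the result lands in build/anthology.

```python
import subprocess
from pathlib import Path


def build_site(repo_dir, run=subprocess.run):
    """Run `make` inside the cloned repo and return the generated site path.

    `run` is injectable so the function can be exercised without invoking
    the real (slow, memory-hungry) build.
    """
    repo_dir = Path(repo_dir)
    # check=True raises CalledProcessError if make exits with an error.
    run(["make"], cwd=repo_dir, check=True)
    site_dir = repo_dir / "build" / "anthology"
    if not site_dir.is_dir():
        raise FileNotFoundError(f"expected generated site at {site_dir}")
    return site_dir
```

Calling build_site("acl-anthology") from the parent directory of the clone runs the full build; a failing make surfaces as an exception rather than a silently missing output directory.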

Note that building the website is quite a resource-intensive process; in particular, the last step, invoking Hugo, uses about 18 GB of system memory. Building the anthology takes about 10 minutes on a laptop with an SSD.

(Note: This does not mean you need this amount of RAM in your system; in fact, the website builds fine on a laptop with 8 GB of RAM. The system might temporarily slow down due to swapping, however. The figure of approx. 18 GB is the maximum RAM usage reported when running hugo --minify --stepAnalysis.)

The anthology can be viewed locally by running hugo server in the hugo/ directory. Note that it rebuilds the site and therefore takes about a minute to start.

Hosting a mirror of the ACL anthology

First, creating a mirror is slow and stresses the ACL anthology infrastructure because on initial setup you have to download every single file of the anthology from the official webserver. This can take up to 8 hours no matter how fast your connection is. So please don't play around with this just for fun.

If you want to host a mirror, you have to set two environment variables:

  • ANTHOLOGY_PREFIX: the HTTP prefix under which your mirror will be reachable, e.g. https://example.com/my-awesome-mirror or http://aclanthology.lst.uni-saarland.de (note that there is no slash at the end!)
  • ANTHOLOGYFILES: the directory under which papers, attachments etc. will reside on your webserver. This directory needs to be readable by your webserver but should not be a subdirectory of the anthology mirror directory.
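
Both constraints on these variables (no trailing slash, files directory outside the mirror directory) are easy to get wrong, so it can help to check them before building. The following is a minimal sketch: mirror_config_problems and its mirror_dir parameter (the directory from which your webserver serves the mirror) are hypothetical names introduced for illustration only.

```python
import os


def mirror_config_problems(environ, mirror_dir):
    """Return a list of human-readable problems with the mirror settings.

    `environ` is a mapping such as os.environ; `mirror_dir` is the directory
    the webserver serves the mirror from (a hypothetical parameter).
    """
    problems = []
    prefix = environ.get("ANTHOLOGY_PREFIX", "")
    files_dir = environ.get("ANTHOLOGYFILES", "")

    if not prefix:
        problems.append("ANTHOLOGY_PREFIX is not set")
    elif prefix.endswith("/"):
        problems.append("ANTHOLOGY_PREFIX must not end with a slash")

    if not files_dir:
        problems.append("ANTHOLOGYFILES is not set")
    else:
        files_abs = os.path.abspath(files_dir)
        mirror_abs = os.path.abspath(mirror_dir)
        # If the common path equals the mirror directory, the files
        # directory is the mirror directory itself or lies inside it.
        if os.path.commonpath([files_abs, mirror_abs]) == mirror_abs:
            problems.append("ANTHOLOGYFILES must not be inside the mirror directory")
    return problems
```

For example, mirror_config_problems(os.environ, "/srv/www/mirror") returns an empty list when both variables are set consistently. The path comparison assumes both directories are on the same drive, which always holds on POSIX systems.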

With these variables set, run make to create the pages and make mirror to mirror all additional files into the build/anthology-files directory. If you have already created a mirror, only the missing files will be downloaded.

If you want to mirror the papers but not all attachments, you can run make mirror-no-attachments instead.

You then rsync the build/website/ directory to your webserver or, if you serve the mirror in a subdirectory FOO, rsync build/website/FOO instead. The build/anthology-files directory needs to be rsync-ed to the ANTHOLOGYFILES directory of your webserver.
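
The mirroring steps described above can be summarised as a command sequence. The sketch below only assembles the commands as argument lists and does not run them; mirror_commands and its parameters are hypothetical, the remote destinations are placeholders you must supply, and the choice of rsync -a (archive mode) is an assumption rather than something this README prescribes.

```python
def mirror_commands(remote_site, remote_files, subdirectory=None, attachments=True):
    """Return the commands for building and uploading a mirror, in order.

    remote_site and remote_files are rsync destinations chosen by you, for
    example "user@example.com:/var/www/mirror/" (placeholder values).
    """
    source = "build/website/"
    if subdirectory:
        # Mirror served under a subdirectory FOO: upload build/website/FOO.
        source = f"build/website/{subdirectory}/"
    mirror_target = "mirror" if attachments else "mirror-no-attachments"
    return [
        ["make"],                 # create the pages
        ["make", mirror_target],  # fetch papers (and optionally attachments)
        # A trailing slash on the source copies the directory's contents.
        ["rsync", "-a", source, remote_site],
        ["rsync", "-a", "build/anthology-files/", remote_files],
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root, with ANTHOLOGY_PREFIX and ANTHOLOGYFILES set in the environment.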

As you probably want to keep the mirror up to date, you can modify the shell script bin/acl-mirror-cronjob.sh to your needs.

You will need this software on the server:

  • rsync
  • git
  • python3
  • hugo > 0.58
  • python3-venv

If you want the build process to be fast, install cython3 and libyaml-dev (see above).

Note that generating the anthology takes quite a bit of RAM, so make sure it is available on your machine.

Contributing

If you'd like to contribute to the ACL Anthology, please take a look at:

History

This repo was originally wing-nus/acl and has been transferred over to acl-org as of 5 June 2017.

License

The code for building the ACL Anthology is distributed under the Apache License, v2.0.
