  • Stars: 122
  • Rank: 292,031 (Top 6 %)
  • Language: TypeScript
  • License: Other
  • Created about 5 years ago
  • Updated about 1 year ago


Repository Details

Effortlessly and transparently add correctly styled citations to your markdown paper given only a URL
title: Automatic citation extraction from URLs
author: phiresky
date: 2019-12-13
link-citations: true
urlcolor: blue
url2cite: all-links

pandoc-url2cite allows you to instantly and transparently cite most papers directly given only a single URL.

You simply add the URL of a publication, and it is replaced with a real citation in whatever CSL style you want. This means you no longer have to deal with Mendeley or Zotero, or keep a reference manager database and a bibtex file in sync, which is especially helpful when collaborating with others.

Minimal Example

Here is a minimal example:

minimal.md

# Introduction

The GAN was first introduced in [@gan].

# References

[@gan]: https://papers.nips.cc/paper/5423-generative-adversarial-nets

Compile this file with the following command:

pandoc \
    --filter=pandoc-url2cite --citeproc \
    --csl ieee-with-url.csl \
    minimal.md -o minimal.pdf

This results in the following output:
minimal.pdf

For a longer example, you can look at the source of this file itself, which is at once a blog post, a GitHub README, and a LaTeX "paper":


Source README.md - Result README.pdf

How to Use

Install this package globally using npm install -g pandoc-url2cite.

Then, add --filter=pandoc-url2cite to your pandoc command (before --citeproc, see the minimal example above).

Alternatively, clone this repo somewhere, then install the dependencies using npm ci.

If you're not familiar with writing papers in pandoc, you can refer to e.g. this article. It's pretty flexible: you can use templates from whatever conference you want, and you can still use inline LaTeX code if you need it (as long as you are OK with no longer being able to convert your document to nice HTML or EPUB).

Citation Syntax

url2cite allows multiple ways to cite:

  1. (PREFERRED) Use the pandoc citation syntax for citations:

    The authors of [@alexnet] first introduced CNNs to the ImageNet challenge.

    More information about referencing specific pages etc. is in the pandoc manual.

    Then add the URLs with the usual "link reference" syntax to the bottom of your document in its own paragraph:

    [@alexnet]: https://...

    You can also use the URL directly inline by using the flexible citation syntax introduced in Pandoc 2.14:

    PPO [@{https://github.com/jgm/pandoc/issues/6026}] is a policy gradient method.

  2. Convert all links to citations

    Add url2cite: all-links to your yaml front matter. This will cause all links in the document to be converted to references.

    You can still exclude individual links by adding no-url2cite either to the CSS class of the link (pandoc-only):

    [foo](http://example.com){.no-url2cite}

    or to the link title:

    [foo](http://example.com "no-url2cite").
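The two mechanisms above can be illustrated with a small sketch. This is not the filter's actual code: pandoc-url2cite operates on pandoc's parsed document, whereas the hypothetical helpers below (`collectCiteUrls`, `shouldConvertLink`, and the `LinkInfo` shape) work on raw text and a simplified link record purely to show the idea. In particular, treating the link title as an exact match is an assumption.

```typescript
// Hypothetical helper: collect `[@key]: url` link reference definitions
// from raw markdown text into a key -> URL map.
function collectCiteUrls(markdown: string): Map<string, string> {
  const definitions = new Map<string, string>();
  // Matches whole lines such as `[@alexnet]: https://example.org/paper`
  const pattern = /^[ \t]*\[@([^\]]+)\]:[ \t]*(\S+)[ \t]*$/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    definitions.set(match[1], match[2]);
  }
  return definitions;
}

// Simplified, assumed stand-in for a parsed link.
interface LinkInfo {
  url: string;
  title: string;
  classes: string[];
}

// In all-links mode every link becomes a citation unless it opts out
// through the no-url2cite class or the no-url2cite link title.
function shouldConvertLink(link: LinkInfo): boolean {
  const optedOutByClass = link.classes.indexOf("no-url2cite") !== -1;
  const optedOutByTitle = link.title === "no-url2cite";
  return !(optedOutByClass || optedOutByTitle);
}
```

For the minimal example at the top of this document, `collectCiteUrls` would map `gan` to the NIPS paper URL, and the body citation `[@gan]` could then be resolved through that map.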

How it Works

The main idea is that usually every piece of research you might want to cite is fully identifiable by a URL, so there is no need to manually enter metadata like author, release date, journal, etc. Citation managers like Zotero already use this and let you automatically fetch metadata from a website. But then you still have a citation database somewhere that you may or may not be able to synchronize between computers, and that you probably won't be able to add to the version control of your paper. There are hacks such as better-bibtex that automatically generate and update diffable bibtex files, but that means you now have two sources of truth, and since the export is one-way, multiple contributors end up overriding each other's changes. pandoc-url2cite goes a step further: URLs are used directly as the cite keys, and the "bibliography file" is just an auto-generated intermediary artifact of those URLs.

pandoc-url2cite is based on the work of the Zotero developers. Zotero has a set of "Translators" that can extract citation info from a number of specific and general web pages. These translators are written in JavaScript and run within the context of the given web site. They are made to be used from the Zotero Connector browser extension, but thankfully there is a standalone Translation Server as well. To avoid the effort required to automatically start and manage this server locally, pandoc-url2cite instead uses a publicly accessible instance of this server provided by Wikipedia with a public REST API.
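As a rough sketch of what such a lookup could look like: the endpoint shape below (`/data/citation/{format}/{query}` on the Wikipedia REST API) is an assumption and should be checked against the current API documentation, and the names `CITOID_BASE`, `citationRequestUrl`, `lookupCitation`, and `HttpGet` are hypothetical, not taken from pandoc-url2cite. The HTTP client is passed in as a parameter so the sketch does not depend on any particular runtime.

```typescript
// Assumed base path of Wikipedia's public citation REST API; verify before use.
const CITOID_BASE = "https://en.wikipedia.org/api/rest_v1/data/citation";

// Response formats this sketch assumes the service offers.
type CitationFormat = "bibtex" | "zotero";

// Any function that performs an HTTP GET and resolves to the response body.
type HttpGet = (url: string) => Promise<string>;

// Build the request URL. The target URL travels as a single path segment,
// so it has to be percent-encoded.
function citationRequestUrl(target: string, format: CitationFormat): string {
  return `${CITOID_BASE}/${format}/${encodeURIComponent(target)}`;
}

// Fetch citation metadata for one URL using the injected HTTP client.
function lookupCitation(
  target: string,
  format: CitationFormat,
  httpGet: HttpGet
): Promise<string> {
  return httpGet(citationRequestUrl(target, format));
}
```

Injecting `httpGet` keeps the URL construction testable offline; in practice it could wrap whatever HTTP library the project already uses.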

All citation data is cached (permanently) as bibtex as well as CSL to citation-cache.json. This is both to improve performance and to make sure references stay the same forever after the initial fetch, as well as to avoid problems if the API might be down in the future. This also means that errors in the citation data can be fixed manually, although if you find you need to do a lot of manual tweaking you might again be better off with Zotero.
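The caching behaviour described above amounts to "fetch once, then always reuse". A minimal sketch follows, assuming a hypothetical entry shape (`CachedCitation`) that is not necessarily the real layout of citation-cache.json, and using a synchronous fetch callback for brevity where a real lookup would be asynchronous.

```typescript
// Assumed shape of one cache entry; the real file layout may differ.
interface CachedCitation {
  fetched: string; // timestamp of the initial fetch
  bibtex: string;
  csl: Record<string, unknown>;
}

// Return the cached entry if present; otherwise fetch once and remember it.
// Existing entries are never overwritten, so manual corrections survive.
function getOrFetch(
  cache: Map<string, CachedCitation>,
  url: string,
  fetchFresh: (url: string) => CachedCitation
): CachedCitation {
  const hit = cache.get(url);
  if (hit !== undefined) {
    return hit;
  }
  const fresh = fetchFresh(url);
  cache.set(url, fresh);
  return fresh;
}
```

Because a cache hit short-circuits the fetch, editing an entry by hand (for example, fixing a wrong title) is permanent: later runs return the edited entry rather than refetching it.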

Configuration / Special Cases

Advanced Configuration

You can see a list of all supported config options in config.d.ts.

Mixing manual references and generated URL-based ones

Right now there are four ways you can use url2cite in combination with "manual" citations:

  1. Prefix the cite key with raw:, e.g. [@raw:foobar]. These keys are ignored by url2cite, and you can add the reference however you want in your --bibliography= file.
  2. Set url2cite-allow-dangling-citations=true. That suppresses the Could not find URL for @foobar. error and makes url2cite ignore any cite keys that aren't aliased to a URL.
  3. Use a URL as a cite key. It doesn't need to have a DOI or be a paper; any website that's relevant to the work is fine as long as Zotero understands it. Then manually adjust the CSL entry url2cite generates in citation-cache.json.
  4. Use a URL as a cite key as in (3), but directly add the bibtex in a code block with language url2cite-bibtex anywhere:
    see also @{https://github.com/DLR-RM/stable-baselines3}.
    
    ```url2cite-bibtex
    @misc{https://github.com/DLR-RM/stable-baselines3,
    author = {Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
    title = {Stable Baselines3},
    year = {2019},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/DLR-RM/stable-baselines3}},
    }
    ```
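The first three options boil down to a decision made per cite key. The sketch below shows one way that decision could be expressed; the function `resolveCiteKey` and the `KeyResolution` type are hypothetical and the order of the checks is an assumption, with only the `raw:` prefix, the dangling-citations option, and the error message taken from the text above.

```typescript
// Possible outcomes for a single cite key (hypothetical type).
type KeyResolution =
  | { kind: "manual"; key: string }   // left for the --bibliography= file
  | { kind: "url"; url: string }      // handled by url2cite
  | { kind: "dangling"; key: string }; // ignored because dangling keys are allowed

function resolveCiteKey(
  key: string,
  aliases: Map<string, string>, // cite key -> URL, from `[@key]: url` definitions
  allowDangling: boolean        // mirrors url2cite-allow-dangling-citations
): KeyResolution {
  // Option 1: keys prefixed with raw: are not touched.
  if (key.startsWith("raw:")) {
    return { kind: "manual", key };
  }
  // A key aliased to a URL resolves to that URL.
  const aliased = aliases.get(key);
  if (aliased !== undefined) {
    return { kind: "url", url: aliased };
  }
  // Options 3 and 4: the key itself is a URL.
  if (/^https?:\/\//.test(key)) {
    return { kind: "url", url: key };
  }
  // Option 2: tolerate unknown keys only when explicitly allowed.
  if (allowDangling) {
    return { kind: "dangling", key };
  }
  throw new Error(`Could not find URL for @${key}.`);
}
```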

Using other kinds of unique IDs

pandoc-url2cite also supports ISBNs and DOIs:

The book [@isbn:978-0374533557, pp. 15-17] is interesting.

See this example.
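Since a mistyped ISBN leads to a failed or wrong lookup, it can be useful to check the key before fetching. The following is a sketch of the standard ISBN-13 check digit test applied to an `isbn:` cite key; it is not known to be part of pandoc-url2cite, and the helper name `isValidIsbn13Key` is hypothetical.

```typescript
// Hypothetical helper: verify the ISBN-13 check digit of a cite key such as
// "isbn:978-0374533557". Hyphens and spaces in the number are ignored.
function isValidIsbn13Key(key: string): boolean {
  if (!key.startsWith("isbn:")) {
    return false;
  }
  const digits = key.slice("isbn:".length).replace(/[-\s]/g, "");
  if (!/^\d{13}$/.test(digits)) {
    return false;
  }
  // ISBN-13 rule: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10.
  let sum = 0;
  for (let i = 0; i < 13; i++) {
    const weight = i % 2 === 0 ? 1 : 3;
    sum += weight * Number(digits[i]);
  }
  return sum % 10 === 0;
}
```

The ISBN from the example above passes this check, while changing its last digit makes it fail.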

Using without citeproc (with natbib/biblatex)

If you don't want to use citeproc, you can set url2cite-output-bib=foo.bib to make url2cite output a bibtex file for consumption by your preferred LaTeX tool.

Limitations

  1. Extracting metadata from direct URLs of full-text PDFs currently does not work, so you need to use the URL of an overview or abstract page instead. I'm not sure why, since this does work in Zotero. More info might be here.
  2. Some websites just have wrong meta information. For example, citationstyles.org has set "Your Name" as the website author in their Open Graph metadata. You can manually modify the citation-cache.json file to fix / change anything.

Related Projects

  • Manubot is a more integrated and opinionated tool for creating scientific documents that has a similar method for creating citations without the hassle.
  • pandoc-url2cite-hs is a Haskell port of this tool (mostly compatible)

Longer Example

AlexNet first introduced CNNs to the ImageNet challenge. [@vgg; @googlenet; @resnet] further improved on the results.

References
