  • Stars: 129
  • Language: Python
  • License: GNU General Publi...
  • Created almost 9 years ago
  • Updated about 1 year ago

Repository Details

Extract bibliographic references from (High-Energy Physics) articles.

refextract

About

A small library for extracting references used in scholarly communication.

Install

$ pip install refextract

Usage

To get structured information from a publication reference:

>>> from refextract import extract_journal_reference
>>> reference = extract_journal_reference('J.Phys.,A39,13445')
>>> print(reference)
{
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': u'',
    'page': u'13445',
    'title': u'J. Phys.',
    'type': 'JOURNAL',
    'volume': u'A39',
    'year': '',
}
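The dictionary returned by extract_journal_reference can be turned back into a normalised citation string. The sketch below relies only on the keys visible in the output above (title, volume, year, page). format_journal_reference is a hypothetical helper, not part of refextract, and treating a None result as "no match" is an assumption.

```python
def format_journal_reference(ref):
    """Build a 'Title Volume (Year) Page' string from a parsed journal reference.

    Hypothetical helper: it uses only the 'title', 'volume', 'year' and
    'page' keys shown in the README output and skips empty fields.
    """
    # Assumption: a missing or None result is treated as an empty reference.
    if not ref:
        return ""
    parts = [ref.get("title", ""), ref.get("volume", "")]
    year = ref.get("year", "")
    if year:
        parts.append("(%s)" % year)
    parts.append(ref.get("page", ""))
    # Drop empty components so a missing year does not leave stray spaces.
    return " ".join(p for p in parts if p)


# Sample data copied from the output above.
reference = {
    "extra_ibids": [],
    "is_ibid": False,
    "misc_txt": "",
    "page": "13445",
    "title": "J. Phys.",
    "type": "JOURNAL",
    "volume": "A39",
    "year": "",
}
print(format_journal_reference(reference))  # J. Phys. A39 13445
```

When a year is present, the result follows the same "Title Volume (Year) Page" layout as the journal_reference field in the PDF example, e.g. "Phys. Rev. Lett. 13 (1964) 321".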

To extract references from a PDF:

>>> from refextract import extract_references_from_file
>>> references = extract_references_from_file('1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}
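In the output above every field holds a list of strings, and DOIs carry a "doi:" prefix. A minimal sketch for pulling the unique DOIs out of the list returned by extract_references_from_file might look like this; collect_dois is a hypothetical helper, and the sample data is a trimmed copy of the README output plus one made-up entry without a DOI.

```python
def collect_dois(references):
    """Return the unique DOIs found in a list of reference dicts, in order.

    Hypothetical helper. Each field is assumed to be a list of strings,
    as in the README output; a leading 'doi:' prefix is stripped.
    """
    seen = set()
    dois = []
    for ref in references:
        # Not every reference has a DOI, so default to an empty list.
        for raw in ref.get("doi", []):
            doi = raw[len("doi:"):] if raw.lower().startswith("doi:") else raw
            if doi not in seen:
                seen.add(doi)
                dois.append(doi)
    return dois


references = [
    {
        "author": ["F. Englert and R. Brout"],
        "doi": ["doi:10.1103/PhysRevLett.13.321"],
        "journal_reference": ["Phys. Rev. Lett. 13 (1964) 321"],
        "linemarker": ["1"],
    },
    # Hypothetical reference without a DOI.
    {"linemarker": ["2"]},
]
print(collect_dois(references))  # ['10.1103/PhysRevLett.13.321']
```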

To extract directly from a URL:

>>> from refextract import extract_references_from_url
>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}
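When starting from an arXiv identifier rather than a full URL, the PDF address can be built with the same pattern as the example above. This is a sketch under the assumption that only new-style identifiers (such as 1503.07589, optionally with a version suffix) are needed; arxiv_pdf_url is a hypothetical helper, not part of refextract.

```python
import re

# New-style arXiv identifiers, e.g. 1503.07589 or 1503.07589v2.
# Assumption: old-style identifiers such as hep-th/9901001 are not handled.
_NEW_STYLE_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_pdf_url(arxiv_id):
    """Build the PDF URL for a new-style arXiv identifier (hypothetical helper).

    The URL pattern mirrors the one used in the README example.
    """
    arxiv_id = arxiv_id.strip()
    if not _NEW_STYLE_ID.match(arxiv_id):
        raise ValueError("unsupported arXiv identifier: %r" % arxiv_id)
    return "https://arxiv.org/pdf/%s.pdf" % arxiv_id


print(arxiv_pdf_url("1503.07589"))  # https://arxiv.org/pdf/1503.07589.pdf
```

The returned string can then be passed to extract_references_from_url.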

Notes

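refextract depends on pdftotext, an external command-line tool (shipped, for example, with Poppler) that pip does not install. Install it separately before extracting references from PDF files or URLs.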
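A quick pre-flight check can catch a missing pdftotext before a long extraction run. The sketch below assumes the tool is looked up on the PATH; refextract itself may locate it differently (for example through a configured path), so treat this as a convenience check only. pdftotext_available is a hypothetical helper.

```python
import shutil


def pdftotext_available():
    """Return True if a 'pdftotext' executable is found on the PATH.

    Hypothetical helper: it does not check the version, and it assumes
    PATH lookup matches how the tool will be invoked.
    """
    return shutil.which("pdftotext") is not None


if not pdftotext_available():
    print("pdftotext not found; install it (e.g. from Poppler) before extracting from PDFs.")
```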

Acknowledgments

refextract is based on code and ideas from the following people, who contributed to the docextract module in Invenio:

  • Alessio Deiana
  • Federico Poli
  • Gerrit Rindermann
  • Graham R. Armstrong
  • Grzegorz Szpura
  • Jan Aage Lavik
  • Javier Martin Montull
  • Micha Moskovic
  • Samuele Kaplun
  • Thorsten Schwander
  • Tibor Simko

License

GPLv2

More Repositories

  • magpie (Python, 683 stars): Deep neural network framework for multi-label text classification
  • beard (Python, 66 stars): Bibliographic Entity Automatic Recognition and Disambiguation
  • inspire-next (Python, 59 stars): The INSPIRE repo.
  • rest-api-doc (40 stars): Documentation of the INSPIRE REST API
  • hepcrawl (Python, 17 stars): Scrapy project for feeds into INSPIRE-HEP
  • inspire (Python, 17 stars): Official repo of the legacy INSPIRE-HEP overlay
  • impact-graphs (JavaScript, 16 stars): Creates graphs to show a publication's impact, and the impact of cited publications, and papers who've cited a publication of interest.
  • inspirehep (Python, 13 stars): Documentation: http://inspire.docs.cern.ch
  • inspire-schemas (Python, 8 stars): Inspire JSON schemas and utilities to use them.
  • jsonschema2rst (Python, 7 stars)
  • record-editor (TypeScript, 6 stars): Record editing tool used in http://inspirehep.net
  • inspire-query-parser (Python, 5 stars): A PEG-based query parser for INSPIRE.
  • author.xml (XSLT, 5 stars): Documentation of the author.xml format to describe author lists
  • invenio-grobid (Python, 4 stars): Invenio package for integration of the Grobid metadata extraction service
  • inspire-classifier (Python, 4 stars): INSPIRE text classification microservice
  • inspire-crawler (Python, 4 stars): Crawler integration with INSPIRE-HEP.
  • inspire-docker (Shell, 4 stars): Dockerfiles for inspirehep/inspire-next application
  • invenio-matcher-benchmark (Python, 4 stars): Test data for invenio-matcher
  • inspire-json-merger (Python, 3 stars): INSPIRE-specific configuration of the JSON Merger.
  • inspire-dojson (Python, 3 stars): INSPIRE-specific rules to transform from MARCXML to JSON and back.
  • inspire-matcher (Python, 3 stars): Find the records in INSPIRE most similar to a given record or reference.
  • plotextractor (Python, 3 stars): Extract images and captions from TeX files in a tar archive.
  • inspirehep-ui (JavaScript, 2 stars): UI for INSPIREHEP
  • curation-scripts (Python, 2 stars): Scripts for automated large-scale curation
  • inspire-utils (Python, 2 stars): INSPIRE-specific utils.
  • inspirehep-search-js (JavaScript, 2 stars): Angular JS application used in search results page
  • inspire-citesummary-js (JavaScript, 2 stars): INSPIRE HEP Citation Summary JS code
  • beard-server (Python, 2 stars): Application providing REST API over Beard
  • inspire-relations (Python, 1 star): Invenio module to integrate Neo4J graph database into INSPIRE and handle relations across records.
  • python-rt (Python, 1 star): Temporary clone of https://gitlab.labs.nic.cz/labs/python-rt
  • es-cli (Python, 1 star): Small cli tool to play with Elasticsearch indices (dump, load, reindex...)
  • isbnid (Python, 1 star): Python ISBN identifier library
  • images (Python, 1 star): Docker images for the inspire project
  • inspire-mitmproxy (Python, 1 star)
  • inspire-magpie (Python, 1 star): Wrapper around magpie for InspireHEP
  • relations (Python, 1 star): Neo4J-based module to handle INSPIRE specific relations
  • invenio-trends (Jupyter Notebook, 1 star): Trends Dashboard API for Invenio Installations