The ContentMine (@ContentMine)

Top repositories

1

quickscrape

A scraping command line tool for the modern web
JavaScript
259
star
2

getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
JavaScript
197
star
3

journal-scrapers

Journal scraper definitions for the ContentMine framework
Ruby
66
star
4

workshop-resources

This repository contains material to help you set up a ContentMine workshop. It also includes tutorials for learning the ContentMine tools on your own.
37
star
5

norma

Convert XML/SVG/PDF into normalised, sectioned, scholarly HTML
HTML
36
star
6

scraperJSON

The scraperJSON standard for defining web scrapers as JSON objects
33
star
7

thresher

Headless scraperJSON scraping for Node.js
JavaScript
27
star
8

ami

HTML
13
star
9

FutureTDM

Materials of FutureTDM project
Jupyter Notebook
11
star
10

cm-crawlerd

ContentMine crawler daemon: finds the latest articles in the journals we mine and stores them in our scraping queue
JavaScript
6
star
11

contentmine-app

The ContentMine ecosystem as a standalone app for OSX, Windows and Linux.
JavaScript
6
star
12

wikifactmine-api

The WikifactMine API Endpoint
JavaScript
5
star
13

cmbot

An autonomous bot for scraping the academic literature
JavaScript
5
star
14

canary

Canary is a UI for the ContentMine tools getpapers, quickscrape, norma, and ami.
HTML
5
star
15

NCBI2wikidata

Go
5
star
16

old_site

The ContentMine site, which (currently) includes the API
HTML
4
star
17

contentmine.github.io

ContentMine installation instructions website
HTML
4
star
18

canary-perch

ES Academic paper fact extraction - backend for canary
JavaScript
4
star
19

visualizations

Python
3
star
20

neuro

Neurophysiology, especially voltage traces
3
star
21

pyCProject

Provides basic functions to read a ContentMine CProject and its CTrees into Python data structures.
Python
3
star
22

vms

ContentMine virtual machines
3
star
23

node-journalTOCs

Node.js client for the JournalTOCs API
JavaScript
2
star
24

ebi_workshop_20141006

ContentMine workshop at EBI, October 6th 2014
HTML
2
star
25

scripts

Shell and Python scripts for utility activities
HTML
2
star
26

cm-ucl

A repository to openly track progress on table extraction.
HTML
2
star
27

releases

Release packages for ContentMine projects
Shell
2
star
28

wikibase

Simple Go library for interfacing with Wikibase.
Go
2
star
29

workshops

General materials for workshops
2
star
30

Chicago-20141114

ContentMine workshop in Chicago (US), November 14th 2014
2
star
31

nhtml

NHTML is a normalization of scholarly documents from {PDF, HTML, XML, SVG, PNG} into a single semantic format
Java
2
star
32

ScienceSourceReview

Go
1
star
33

CMServices

Web services layer for ContentMine text and data mining tools and utilities
JavaScript
1
star
34

dictionaries

Dictionaries for use with `ami`, including some management software
HTML
1
star
35

vt-open-data-week

Virginia Tech workshop
Jupyter Notebook
1
star
36

pdf2svg

ContentMine Fork of the WWMM pdf2svg Package
Java
1
star
37

contentmine.org

The static site
HTML
1
star
38

imageanalysis

ContentMine Fork of the WWMM imageanalysis Package
HTML
1
star
39

pyCMine

Python scripts for downstream analyses of ContentMine-extracted facts, mostly coming from pyCProject
1
star
40

cephis

Document processing including support libraries and PDFBox2
1
star
41

cm-uclii

Data and progress tracking for table extraction and semantically guided content enhancement
HTML
1
star
42

tilburg

Extraction of data from Vector-based Funnel Plots in the scholarly literature
Shell
1
star
43

JISC-Workshop-1Dec2014

Resources for a one-day workshop at JISC on 1 Dec 2014
1
star
44

2015-11-07-mozfest15

Python
1
star
45

ijsem

Computational results of PLUTo ami-phylo analysis of trees from Int. J. Syst. Evol. Microbiol.
HTML
1
star
46

amidemos

HTML
1
star
47

contentmine-gui

GUI for executing ContentMine commands: a browser SPA that runs locally on the user's machine.
JavaScript
1
star