• Stars: 180
• Rank 213,097 (Top 5 %)
• Language: Python
• License: Other
• Created almost 6 years ago
• Updated over 1 year ago

Repository Details

Ultimate Website Sitemap Parser

Website sitemap parser for Python 3.5+.

Features

Installation

pip install ultimate-sitemap-parser

Usage

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage('https://www.nytimes.com/')
print(tree)

sitemap_tree_for_homepage() returns a tree of AbstractSitemap subclass objects that represent the sitemap hierarchy found on the website; see the reference documentation for the AbstractSitemap subclasses.
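To get a feel for the hierarchy described above, the tree can be walked recursively. The following is a minimal sketch, not part of the library: format_sitemap_tree is a hypothetical helper, and it assumes that every sitemap object has a url attribute and that index-type sitemaps additionally expose their children in a sub_sitemaps list. Stand-in objects are used so that the sketch runs without network access.

```python
from types import SimpleNamespace


def format_sitemap_tree(sitemap, depth=0):
    """Return one indented line per sitemap, visiting the tree depth-first."""
    lines = ["  " * depth + f"{type(sitemap).__name__}: {sitemap.url}"]
    # Assumption: only index-type sitemaps carry a `sub_sitemaps` list;
    # leaf sitemaps are treated as having no children.
    for child in getattr(sitemap, "sub_sitemaps", []):
        lines.extend(format_sitemap_tree(child, depth + 1))
    return lines


# Stand-in objects (hypothetical URLs) instead of a real fetched tree.
leaf = SimpleNamespace(url="https://example.com/sitemap-news.xml")
root = SimpleNamespace(url="https://example.com/", sub_sitemaps=[leaf])

print("\n".join(format_sitemap_tree(root)))
```

With a real tree, the same helper would be called as format_sitemap_tree(tree), provided the assumptions about url and sub_sitemaps hold for the installed version.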

If you just want to list all the pages found in all of the website's sitemaps, use the all_pages() method:

# all_pages() returns an Iterator
for page in tree.all_pages():
    print(page)

The all_pages() method returns an iterator that yields SitemapPage objects; see the SitemapPage reference.
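Because the same page may be listed in more than one sitemap, it can be handy to deduplicate the URLs while iterating. The sketch below is an illustration under assumptions: unique_page_urls is a hypothetical helper, and it assumes each yielded page object exposes its address through a url attribute. Stand-in page objects replace a real iterator so that the example runs offline.

```python
from types import SimpleNamespace


def unique_page_urls(pages):
    """Lazily yield each page URL once, in first-seen order."""
    seen = set()
    for page in pages:
        url = page.url  # assumption: page objects expose a `url` attribute
        if url not in seen:
            seen.add(url)
            yield url


# Stand-in pages (hypothetical URLs) instead of tree.all_pages().
demo_pages = [
    SimpleNamespace(url="https://example.com/a"),
    SimpleNamespace(url="https://example.com/b"),
    SimpleNamespace(url="https://example.com/a"),  # duplicate entry
]

for url in unique_page_urls(demo_pages):
    print(url)
```

With a real tree, the call would be unique_page_urls(tree.all_pages()). The helper is a generator, so it keeps the lazy behaviour of the underlying iterator and only stores the set of URLs already seen.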

More Repositories

1

gate-core

The GATE Embedded core API and GATE Developer application
Java
78
star
2

broad_twitter_corpus

The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016)
Jupyter Notebook
65
star
3

python-gatenlp

Python text processing, pattern matching, and NLP framework
Jupyter Notebook
63
star
4

gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, scikit-learn, Weka, and DNNs through PyTorch and Keras.
Java
26
star
5

semeval2019-hyperpartisan-bertha-von-suttner

SemEval 2019 Hyperpartisan News Detection - team Bertha von Suttner contribution
Python
22
star
6

gateplugin-Python

Python integration for the GATE framework
Java
20
star
7

Bio-YODIE

Bio-YODIE is GATE's biomedical named entity linking pipeline.
Java
17
star
8

mimir

Multi-paradigm Information Management Index and Repository
Java
10
star
9

cluster-embeddings

Simple script to create clusters from embeddings in word2vec format
Python
10
star
10

CANTM

Python
8
star
11

jaspell

Fork of http://jaspell.sourceforge.net to allow control over the character encoding used for the dictionary files.
Java
6
star
12

gateplugin-Stanford_CoreNLP

GATE wrappers for the Stanford CoreNLP tool set
Java
5
star
13

StanceClassifier

Stance Classifier for the WeVerify project
Python
5
star
14

gate-teamware

A web application for collaborative document annotation.
Python
4
star
15

gate-lf-python-data

Python library for handling (dense) training/application data produced by the Learning Framework
Python
4
star
16

gateapplication-French

Processing pipeline for French, performing Tokenisation, POS Tagging and NER
Shell
3
star
17

emina

Emergent Informativeness and Actionability
Python
3
star
18

gcp

GATE Cloud Paralleliser
Java
3
star
19

wpextract

Create datasets from WordPress sites for research or archiving
Python
3
star
20

gateapplication-German

Processing pipeline for German, performing Tokenisation, POS Tagging and NER
Shell
3
star
21

gate-cloud-python-example

Example of using the GATE Cloud online API
Python
3
star
22

gateplugin-dict-lemmatizer

A plugin for the GATE language technology framework for finding lemmata of words.
Java
3
star
23

gateplugin-Tagger_SyntaxNet

A GATE plugin for using a Google TensorFlow Serving SyntaxNet server
Java
2
star
24

gateplugin-JdbcLookup

A plugin for the GATE language technology framework for adding and updating annotations from a JDBC table.
Java
2
star
25

Tweet-Network-GEXF-Generator

Tweet Network GEXF Generator
Groovy
2
star
26

gateplugin-Lang_German

German language support for GATE
HTML
2
star
27

corpusconversion-bnc

Tool to convert the British National Corpus to GATE format
Java
2
star
28

dont-waste-single-annotation

2
star
29

gateplugin-Lang_Chinese

Support for processing Chinese documents
Java
2
star
30

gateplugin-MetaMapLite

A GATE plugin wrapping MetaMapLite.
Java
2
star
31

VaxxHesitancy

2
star
32

gateplugin-Tools

A selection of processing resources commonly used to extend ANNIE
Java
2
star
33

bio-yodie-resource-prep

Scripts to prepare the informational resources required by GATE Bio-YODIE.
Scala
2
star
34

gateplugin-Tagger_GoogleNLP

GATE NLP plugin for the Google NLP
Java
2
star
35

gateplugin-ModularPipelines

A plugin for the GATE language technology framework that helps create modular pipelines and parametrize them
Java
2
star
36

SurveyKeywordsExtraction

Keywords extraction from survey questions
Python
2
star
37

gatelib-spring

Spring support for use with GATE
Java
2
star
38

gateplugin-JAPE_Plus

An alternative, usually more efficient and faster, JAPE implementation
Java
2
star
39

gateplugin-Gazetteer_Ontology_Based

An ontology based gazetteer for GATE
Java
2
star
40

tweet-rehydrater

Tool to take standoff annotations against a list of Tweets and merge them with the original text from Twitter
Java
2
star
41

CLEF2024_InCrediblAE_Manual_Evaluation_Dataset

Manual evaluation dataset of CheckThat! Lab at CLEF 2024 Task 6: Robustness of Credibility Assessment with Adversarial Examples (InCrediblAE)
2
star
42

gateplugin-Alignment

Java
1
star
43

gate-lf-keras-json

Keras wrapper for the LearningFramework GATE plugin
Python
1
star
44

gateplugin-LIWC

A GATE plugin to extract LIWC features
Java
1
star
45

gateplugin-Crowd_Sourcing

GATE plugin to interface with the CrowdFlower crowd sourcing platform
Java
1
star
46

gate-dsl

Write GATE applications in a Groovy DSL.
Groovy
1
star
47

gateplugin-Lang_Danish

Support for processing Danish documents
Java
1
star
48

userguide

The GATE user guide
TeX
1
star
49

gateplugin-Format_Twitter

Document Format plugin to support reading and writing Twitter style JSON files
Java
1
star
50

gateplugin-ANNIE

Java
1
star
51

gateplugin-Twitter

A suite of tools designed for processing Tweets
Java
1
star
52

gateplugin-Ontology_Tools

Java
1
star
53

gateplugin-Sentiment

Provides resources for Sentiment Analysis in GATE
Groovy
1
star
54

gateplugin-Ontology

Ontology support for GATE
Java
1
star
55

youtube-scraper

Scrape YouTube data
Python
1
star
56

UNGA-search

Exploration webapp for the UN GA Mímir index.
CSS
1
star
57

cloud-client

Client library for the GATE Cloud REST APIs
Java
1
star
58

cluster-brown4wikipedia

Tools to simplify creating brown clusters from Wikipedia dump files
Python
1
star
59

gateplugin-Groovy

Adds support for the Groovy scripting language to GATE as well as making GATE easier to use from Groovy scripts
Java
1
star
60

gate-lf-pytorch-json

PyTorch wrapper for the LearningFramework GATE plugin
Python
1
star
61

gateplugin-Java

A plugin for the GATE language technology framework that allows on-the-fly use of Java programs as Processing Resources
Java
1
star
62

gateplugin-UNGA

Information extraction for United Nations General Assembly Resolutions
Python
1
star
63

corpusconversion-conll2003

Tool/scripts to help convert the CoNLL 2003 corpora to GATE format
Scala
1
star
64

gateplugin-Tagger_TagMe

GATE NLP plugin for the TagMe service
Java
1
star
65

gateplugin-DocumentNormalizer

Tools for normalizing documents before processing
Java
1
star
66

sklearn-wrapper

A lightweight wrapper around scikit-learn for the GATE LearningFramework plugin
Python
1
star
67

weka-wrapper

A very lightweight wrapper around Weka
Java
1
star
68

TopicLLM_Granularity_Hallucination

Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling
Jupyter Notebook
1
star
69

gateplugin-Format_DataSift

Document Format plugin to support reading DataSift JSON files
Java
1
star
70

gateplugin-StringAnnotation

A plugin for the GATE language technology framework that provides gazetteer and regular expression annotator PRs for string annotation
Java
1
star
71

gateplugin-CISTEM

A GATE wrapper around the CISTEM German Stemmer (see https://github.com/LeonieWeissweiler/CISTEM)
Java
1
star
72

gate-lf-keras-sparse

A lightweight wrapper around keras mainly for use with the GATE LearningFramework plugin
Python
1
star