Awesome Python Web Content Extracting

  • lassie (588 stars)
    updated over 1 year ago, MIT License

    Web Content Retrieval for Humans™

  • updated 4 months ago, MIT License

    A small library for extracting rich content from URLs.

  • newspaper (13,723 stars)
    updated 26 days ago, MIT License

    newspaper3k provides news, full-text, and article metadata extraction in Python 3. Advanced docs:

  • updated 12 days ago, MIT License

    Pythonic HTML Parsing for Humans™

  • sumy (3,394 stars)
    updated 2 months ago, Apache License 2.0

    Module for automatic summarization of text documents and HTML pages.

  • textract (3,730 stars)
    updated about 2 months ago, MIT License

    Extract text from any document. No muss, no fuss.