  • Stars: 1,678
  • Rank: 26,786 (Top 0.6 %)
  • Language: Python
  • License: GNU General Publi...
  • Created about 10 years ago
  • Updated 24 days ago

Repository Details

Convert HTML to Markdown-formatted text.

html2text

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text [filename [encoding]]

Option              Description
--version           Show program's version number and exit
-h, --help          Show this help message and exit
--ignore-links      Don't include any formatting for links
--escape-all        Escape all special characters. Output is less readable, but avoids corner case formatting issues.
--reference-links   Use reference links instead of inline links to create markdown
--mark-code         Mark preformatted and code blocks with [code]...[/code]

For a complete list of options, see the docs.

Or you can use it from within Python:

>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.

Or with some configuration options:

>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!

>>> # Don't ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

How to install

html2text is available on PyPI: https://pypi.org/project/html2text/

$ pip install html2text

How to run unit tests

$ tox

To see the coverage results:

$ coverage html

then open the ./htmlcov/index.html file in your browser.

Documentation

Documentation lives here

More Repositories

1. stop-words (309 stars)
   List of common stop words in various languages.
2. python-short_url (Python, 178 stars)
   Python implementation for generating Tiny URL- and bit.ly-like URLs.
3. python-stop-words (Python, 155 stars)
   Get list of common stop words in various languages in Python
4. python-currencies (Python, 72 stars)
   Display money format and its filthy currencies, for all money lovers out there.
5. django-crequest (Python, 60 stars)
   django-crequest - Taking care of current request in silent way.
6. oss-wall-of-shame (44 stars)
   Companies that use open source and never bother to contribute back - Open Source Software Wall of Shame
7. django-databrowse (Python, 42 stars)
   Databrowse is a Django application that lets you browse your data.
8. django-markwhat (Python, 20 stars)
   A collection of template filters that implement common markup languages.
9. ansible-suricata (Jinja, 14 stars)
   An Ansible playbook for deploying the Suricata intrusion detection system and fetching Snort rules with Oinkmaster.
10. yoDownet (C++, 10 stars)
    yoDownet, The previous generation graphical download manager, built on Qt.
11. django-base64field (Python, 8 stars)
    A motherfucking django model field to bring base64 encoded key to models.
12. cmsplugin-simple-markdown (Python, 6 stars)
    A plugin for django-cms that provides just a markdown plugin and nothing more.
13. django-mongodb-cash-backend (Python, 6 stars)
    django-mongodb-cash-backend
14. python-gignore (Python, 4 stars)
    Get .gitignore files from github.com/github/gitignore
15. flask-microblog-sqlalchemy (Python, 3 stars)
    Based on Miguel Grinberg's Flask tutorial at http://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
16. negar-cli (Python, 3 stars)
    Negar Command Line Interface
17. php-solusvm (PHP, 3 stars)
    SolusVM API PHP Library
18. django-kewl (Python, 3 stars)
    Django Kewl - Set of Django kewl utilities & helpers & highly used/needed stuff.
19. nevis (C++, 2 stars)
    Nevis is only a simple fast text editor, implemented in Qt/C++. -- Currently under development.
20. python-cpanel (Python, 2 stars)
    Python cPanel - The snake ate cPanel API.
21. django-blog-book (2 stars)
    Developing A Blog Application In Django Python Web Framework.
22. markwhat (Python, 2 stars)
    Markwhat is a desktop cross-platform markup text editor with live preview feature. Written in pure python.
23. node-money-currencies (JavaScript, 2 stars)
    Display money format and its filthy currencies, for all money lovers out there.
24. markwhat-online (Python, 1 star)
    Is an online tool/api to parse markup data. It's available through a web interface and a delicious API which speaks JSON.
25. .dotfiles (Shell, 1 star)
    @Alir3z4's ~/.dotfiles all over the place!
26. aur-pkgs (Shell, 1 star)
    Alireza's ArchLinux User Repository PKGBUILDs, made with absolute rage!!!
27. WHMCS (PHP, 1 star)
28. python-simplerelevance (Python, 1 star)
    SimpleRelevance API Python Wrapper