• Stars: 495
• Rank 88,974 (Top 2 %)
• Language: Go
• License: MIT License
• Created over 9 years ago
• Updated about 1 year ago

Repository Details

Golang HTML to plaintext conversion library

html2text

Converts HTML into text of the markdown-flavored variety

Introduction

Ensure your emails are readable by all!

Turns HTML into raw text, useful for sending fancy HTML emails with an equivalently nicely formatted TXT document as a fallback (e.g. for people who don't allow HTML emails or have other display issues).

html2text is a simple golang package for rendering HTML into plaintext.

There are still lots of improvements to be had, but FWIW this has worked fine for my [basic] HTML-2-text needs.

It requires Go 1.x or newer ;)
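To make the email-fallback use case concrete, here is a minimal sketch of how a plaintext rendering might be paired with its HTML original in a multipart/alternative message body. It uses only the Go standard library and is not part of html2text; the helper name buildAlternative is hypothetical, and the plaintext argument stands in for whatever text the conversion produced.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/textproto"
)

// buildAlternative is a hypothetical helper (not part of html2text).
// It returns a multipart/alternative body plus the matching Content-Type
// header value. The plaintext part goes first and the HTML part last,
// since mail clients prefer the last alternative they can display.
func buildAlternative(plainText, htmlBody string) (string, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)

	parts := []struct{ mimeType, content string }{
		{"text/plain; charset=utf-8", plainText},
		{"text/html; charset=utf-8", htmlBody},
	}
	for _, p := range parts {
		h := textproto.MIMEHeader{}
		h.Set("Content-Type", p.mimeType)
		pw, err := w.CreatePart(h)
		if err != nil {
			return "", "", err
		}
		if _, err := pw.Write([]byte(p.content)); err != nil {
			return "", "", err
		}
	}
	// Close writes the terminating boundary line.
	if err := w.Close(); err != nil {
		return "", "", err
	}
	contentType := "multipart/alternative; boundary=" + w.Boundary()
	return buf.String(), contentType, nil
}

func main() {
	// In real use the first argument would be the converted plaintext.
	body, contentType, err := buildAlternative("hi", "<div>hi</div>")
	if err != nil {
		panic(err)
	}
	fmt.Println("Content-Type:", contentType)
	fmt.Println()
	fmt.Print(body)
}
```

The boundary is generated randomly on each run, so the exact bytes differ between invocations. A complete email would still need the usual From, To, Subject and MIME-Version headers.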

Download the package

go get jaytaylor.com/html2text

Example usage

Library

package main

import (
	"fmt"

	"jaytaylor.com/html2text"
)

func main() {
	inputHTML := `
<html>
  <head>
    <title>My Mega Service</title>
    <link rel="stylesheet" href="main.css">
    <style type="text/css">body { color: #fff; }</style>
  </head>

  <body>
    <div class="logo">
      <a href="http://jaytaylor.com/"><img src="/logo-image.jpg" alt="Mega Service"/></a>
    </div>

    <h1>Welcome to your new account on my service!</h1>

    <p>
      Here is some more information:

      <ul>
        <li>Link 1: <a href="https://example.com">Example.com</a></li>
        <li>Link 2: <a href="https://example2.com">Example2.com</a></li>
        <li>Something else</li>
      </ul>
    </p>

    <table>
      <thead>
        <tr><th>Header 1</th><th>Header 2</th></tr>
      </thead>
      <tfoot>
        <tr><td>Footer 1</td><td>Footer 2</td></tr>
      </tfoot>
      <tbody>
        <tr><td>Row 1 Col 1</td><td>Row 1 Col 2</td></tr>
        <tr><td>Row 2 Col 1</td><td>Row 2 Col 2</td></tr>
      </tbody>
    </table>
  </body>
</html>`

	text, err := html2text.FromString(inputHTML, html2text.Options{PrettyTables: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}

Output:

Mega Service ( http://jaytaylor.com/ )

******************************************
Welcome to your new account on my service!
******************************************

Here is some more information:

* Link 1: Example.com ( https://example.com )
* Link 2: Example2.com ( https://example2.com )
* Something else

+-------------+-------------+
|  HEADER 1   |  HEADER 2   |
+-------------+-------------+
| Row 1 Col 1 | Row 1 Col 2 |
| Row 2 Col 1 | Row 2 Col 2 |
+-------------+-------------+
|  FOOTER 1   |  FOOTER 2   |
+-------------+-------------+
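The conventions visible in the output above (a heading framed by rows of asterisks as wide as the heading, and links written as "label ( url )") can be illustrated with a short standalone sketch. This is not the library's actual code; renderHeading and renderLink are hypothetical names, and sizing the rule by rune count is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// renderHeading frames a heading with rows of asterisks whose width
// matches the heading text (hypothetical helper, for illustration only).
func renderHeading(text string) string {
	rule := strings.Repeat("*", utf8.RuneCountInString(text))
	return rule + "\n" + text + "\n" + rule
}

// renderLink writes a link as "label ( url )", mirroring the sample output.
func renderLink(label, href string) string {
	return fmt.Sprintf("%s ( %s )", label, href)
}

func main() {
	fmt.Println(renderHeading("Welcome to your new account on my service!"))
	fmt.Println()
	fmt.Println("* Link 1: " + renderLink("Example.com", "https://example.com"))
}
```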

Command line

echo '<div>hi</div>' | html2text

Unit-tests

Running the unit-tests is straightforward and standard:

go test

License

Permissive MIT license.

Contact

You are more than welcome to open issues and send pull requests if you find a bug or want a new feature.

If you appreciate this library please feel free to drop me a line and tell me! It's always nice to hear from people who have benefitted from my work.

Email: jay at (my github username).com

Twitter: @jtaylor

Alternatives

https://github.com/k3a/html2text - Lightweight
