  • Stars: 135
  • Rank: 260,881 (Top 6%)
  • Language: PHP
  • License: MIT License
  • Created 7 months ago
  • Updated about 1 month ago

Repository Details

Hypertext

A PHP HTML to pure text transformer that gracefully handles varied and malformed HTML.


Hypertext is excellent at pulling text content out of any HTML-based document. It automatically:

  • Removes CSS
  • Removes scripts
  • Removes headers
  • Removes non-HTML-based content
  • Preserves spacing
  • Preserves links (optional)
  • Preserves new lines (optional)

It is intended for producing text for LLM-related tasks, such as prompts and embeddings.
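To give a feel for the kind of cleanup listed above, here is a minimal sketch built only on core PHP functions. It is not Hypertext's implementation, and `naiveHtmlToText()` is a hypothetical helper named for this illustration. A regex-based pass like this is fragile on malformed markup, which is the case a dedicated transformer is meant to handle.

```php
<?php

// Illustrative only: a naive HTML-to-text pass using core PHP functions.
// naiveHtmlToText() is a hypothetical helper, not part of Hypertext.
function naiveHtmlToText(string $html): string
{
    // Drop elements whose contents should never appear in the text output.
    $html = preg_replace('#<(script|style|head)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Drop HTML comments.
    $html = preg_replace('/<!--.*?-->/s', ' ', $html);

    // Replace every remaining tag with a space so adjacent words do not merge.
    $text = preg_replace('/<[^>]+>/', ' ', $html);

    // Turn entities such as &amp; back into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<html><head><title>My Blog</title><style>h1 { color: red; }</style></head>'
      . '<body><h1>Welcome</h1><script>alert("hi");</script><p>Hello &amp; goodbye.</p></body></html>';

echo naiveHtmlToText($html), PHP_EOL; // Welcome Hello & goodbye.
```

The sketch collapses all whitespace, so it corresponds loosely to the default mode only; it makes no attempt to keep new lines or links.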

Installation

composer require stevebauman/hypertext

Usage

use Stevebauman\Hypertext\Transformer;

$transformer = new Transformer();

// (Optional) Retain new line characters.
$transformer->keepNewLines();

// (Optional) Retain anchor tags and their href attribute.
$transformer->keepLinks();

$text = $transformer->toText($html);

Example

For larger examples, please view the tests/Fixtures directory.

Input:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My Blog</title>
</head>
<body>
    <h1>Welcome to My Blog</h1>
    <p>This is a paragraph of text on my webpage.</p>
    <a href="https://blog.com/posts">Click here</a> to view my posts.
</body>
</html>

Output (Pure Text):

echo (new Transformer)->toText($html);

Welcome to My Blog This is a paragraph of text on my webpage. Click here to view my posts.

Output (Keep New Lines):

echo (new Transformer)->keepNewLines()->toText($html);

Welcome to My Blog
This is a paragraph of text on my webpage.
Click here to view my posts.

Output (Keep Links):

echo (new Transformer)->keepLinks()->toText($html);

Welcome to My Blog This is a paragraph of text on my webpage. <a href="https://blog.com/posts">Click here</a> to view my posts.

Output (Keep Both):

echo (new Transformer)
    ->keepLinks()
    ->keepNewLines()
    ->toText($html);

Welcome to My Blog
This is a paragraph of text on my webpage.
<a href="https://blog.com/posts">Click here</a> to view my posts.
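When links are kept, the anchors stay inline in the text. If the link targets are also needed separately (for example, to list sources alongside a prompt), they can be pulled back out of the output. The sketch below assumes anchors appear in the form shown in the example output, `<a href="...">label</a>`; `extractKeptLinks()` is a hypothetical helper, not part of Hypertext.

```php
<?php

// Hypothetical helper: collect href/label pairs from text produced with keepLinks().
// Assumes each anchor has the form <a href="...">label</a>, as in the example output.
function extractKeptLinks(string $text): array
{
    preg_match_all('#<a href="([^"]*)">(.*?)</a>#s', $text, $matches, PREG_SET_ORDER);

    return array_map(
        fn (array $m): array => ['href' => $m[1], 'label' => $m[2]],
        $matches
    );
}

// Sample text copied from the "Keep Links" output format.
$text = 'Welcome to My Blog This is a paragraph of text on my webpage. '
      . '<a href="https://blog.com/posts">Click here</a> to view my posts.';

$links = extractKeptLinks($text);

print_r($links);
```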

More Repositories

  1. location (PHP, 1,024 stars): Detect a user's location by their IP address.
  2. purify (PHP, 412 stars): A Laravel wrapper for HTMLPurifier by ezyang.
  3. showcode (Vue, 391 stars): Create beautiful images of code.
  4. unfinalize (PHP, 129 stars): Remove "final" keywords from classes and methods in vendor packages.
  5. curlwind (CSS, 115 stars): Generate Tailwind utility stylesheets on demand.
  6. laravel-husk (PHP, 89 stars): A thin and light scaffolded Laravel Dusk environment.
  7. autodoc-facades (PHP, 84 stars): Auto-generate PHP doc annotations for Laravel facades.
  8. translation (PHP, 75 stars): An easy database-driven automatic translator for Laravel 5.
  9. eloquent-table (PHP, 46 stars): An HTML table generator for Laravel collections.
  10. github-summarizer (PHP, 43 stars): A PHP GitHub summarizer using ChatGPT.
  11. revision (PHP, 39 stars): Revisions for Eloquent models.
  12. maintenance (PHP, 37 stars): A Preventative Maintenance Application (CMMS) for Laravel.
  13. log-reader (PHP, 23 stars): An easy log reader for Laravel 5.
  14. wmi (PHP, 10 stars): A package for WMI manipulation using PHP and COM.
  15. calendar-helper (PHP, 8 stars): A Laravel calendar implementation with Google Calendar.
  16. helpdesk (PHP, 6 stars): An IT helpdesk for managing issues and other related information.
  17. maintenance-app (JavaScript, 6 stars): The Maintenance Application.
  18. pdf (PHP, 5 stars): A Dompdf wrapper for Laravel.
  19. platform-logs (PHP, 4 stars): A log manager for Cartalyst's Platform.
  20. profilepicture-cli (PHP, 4 stars): Download all of your 4K images from https://ProfilePicture.ai automatically.
  21. platform-localization (PHP, 3 stars): A localization manager for Cartalyst's Platform.
  22. laravel-husk-gridsome (PHP, 3 stars)
  23. laravel-husk-nuxt (PHP, 3 stars): A Laravel Husk example using Nuxt.
  24. Corp (PHP, 2 stars): An AdLDAP helper package for Laravel 4/5.
  25. shiki (TypeScript, 2 stars): A beautiful syntax highlighter.
  26. administration (PHP, 2 stars): An administration backend scaffolding package for Laravel.
  27. active (PHP, 2 stars): An active HTML class helper that echoes strings based on the current route.
  28. flash (PHP, 1 star): Sweet Alert flash notifications in Laravel.
  29. viewer (PHP, 1 star): A presenter-like package for attaching modular views to a retrieved Eloquent record.
  30. WinSchedule (PHP, 1 star): Actual PHP task scheduling in Windows using COM.
  31. stevebauman-blog (1 star): A repository for hosting blog comments for stevebauman.ca.
  32. WinPerm (PHP, 1 star): A Windows file/folder permission parser in PHP.