  • Language
    PHP
  • License
    MIT License
  • Created about 12 years ago
  • Updated about 1 year ago

Get info from any web service or page

Embed

PHP library to get information from any web page (using oEmbed, Open Graph, Twitter Cards, HTML scraping, etc.). It's compatible with any web service (YouTube, Vimeo, Flickr, Instagram, etc.) and has adapters for some sites, such as archive.org, GitHub, Facebook, etc.

Requirements:

  • PHP 7.4 or later
  • The cURL extension
  • A PSR-7/PSR-17 implementation (see PSR below)

If you need PHP 5.5-7.3 support, use the 3.x version.

Online demo

http://oscarotero.com/embed/demo

Installation

This package is installable and autoloadable via Composer as embed/embed.

$ composer require embed/embed

Usage

use Embed\Embed;

$embed = new Embed();

//Load any url:
$info = $embed->get('https://www.youtube.com/watch?v=PP1xn5wHtxE');

//Get content info

$info->title; //The page title
$info->description; //The page description
$info->url; //The canonical url
$info->keywords; //The page keywords

$info->image; //The thumbnail or main image

$info->code->html; //The code to embed the image, video, etc
$info->code->width; //The exact width of the embed code (if exists)
$info->code->height; //The exact height of the embed code (if exists)
$info->code->ratio; //The aspect ratio (width/height)

$info->authorName; //The resource author
$info->authorUrl; //The author url

$info->cms; //The cms used
$info->language; //The language of the page
$info->languages; //The alternative languages

$info->providerName; //The provider name of the page (Youtube, Twitter, Instagram, etc)
$info->providerUrl; //The provider url
$info->icon; //The big icon of the site
$info->favicon; //The favicon of the site (an .ico file or a png with up to 32x32px)

$info->publishedTime; //The published time of the resource
$info->license; //The license url of the resource
$info->feeds; //The RSS/Atom feeds

Parallel multiple requests

use Embed\Embed;

$embed = new Embed();

//Load multiple urls asynchronously:
$infos = $embed->getMulti(
    'https://www.youtube.com/watch?v=PP1xn5wHtxE',
    'https://twitter.com/carlosmeixidefl/status/1230894146220625933',
    'https://en.wikipedia.org/wiki/Tordoia',
);

foreach ($infos as $info) {
    echo $info->title;
}

Document

The document is the object that stores the HTML code of the page. You can use it to extract extra information from the HTML:

//Get the document object
$document = $info->getDocument();

$document->link('image_src'); //Returns the href of a <link>
$document->getDocument(); //Returns the DOMDocument instance
$html = (string) $document; //Returns the html code

$document->select('.//h1'); //Search
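To make the link() call above more concrete, here is a minimal sketch in plain PHP (ext-dom only, no Embed classes). It parses HTML into a DOMDocument and reads the href of the first <link> whose rel contains a given token. The helper name linkHref() is hypothetical. Treating rel as a whitespace-separated token list is an assumption about how such a lookup should behave; it does not describe Embed's code.

```php
<?php
// Minimal sketch (not Embed's implementation): parse HTML into a DOMDocument
// and read the href of the first <link> with a given rel token, roughly what
// a call like $document->link('image_src') returns.

function linkHref(string $html, string $rel): ?string
{
    $dom = new DOMDocument();
    // Suppress warnings about non-standard or malformed HTML.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Match rel attributes that contain $rel as a whitespace-separated token.
    $query = sprintf(
        '//link[contains(concat(" ", normalize-space(@rel), " "), " %s ")]',
        $rel
    );
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    /** @var DOMElement $link */
    $link = $nodes->item(0);
    return $link->getAttribute('href');
}

$html = '<html><head><link rel="image_src" href="/thumb.jpg"></head><body></body></html>';
$href = linkHref($html, 'image_src'); // "/thumb.jpg"
```

The sketch returns the raw attribute value. A relative href such as /thumb.jpg still has to be resolved against the page URL before use.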

You can perform XPath queries to select specific elements. A search always returns an instance of Embed\QueryResult:

//Search the A elements
$result = $document->select('.//a');

//Filter the results
$result->filter(fn ($node) => $node->getAttribute('href'));

$id = $result->str('id'); //Return the id of the first result as string
$text = $result->str(); //Return the content of the first result

$ids = $result->strAll('id'); //Return an array with the ids of all results as string
$texts = $result->strAll(); //Return an array with the content of all results as string

$tabindex = $result->int('tabindex'); //Return the tabindex attribute of the first result as integer
$number = $result->int(); //Return the content of the first result as integer

$href = $result->url('href'); //Return the href attribute of the first result as url (converts relative urls to absolutes)
$url = $result->url(); //Return the content of the first result as url

$node = $result->node(); //Return the first node found (DOMElement)
$nodes = $result->nodes(); //Return all nodes found
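The select/filter/strAll sequence above can be approximated with plain DOMXPath. This sketch is not Embed\QueryResult. The helper linkTargets() is hypothetical: it selects <a> elements, drops those without an href, and returns the remaining href values as strings. Unlike url(), it leaves relative URLs unchanged.

```php
<?php
// Minimal sketch using plain DOMXPath (not Embed\QueryResult): select all <a>
// elements, keep only those with an href, and collect the attribute values,
// similar in spirit to select('.//a'), filter() and strAll('href').

function linkTargets(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // ".//a" is evaluated relative to the context node, here the whole document.
    $nodes = iterator_to_array($xpath->query('.//a', $dom));

    // Keep only elements whose href attribute is non-empty.
    $withHref = array_filter(
        $nodes,
        fn (DOMElement $node) => $node->getAttribute('href') !== ''
    );

    // Return the href values as a re-indexed list of strings.
    return array_values(array_map(
        fn (DOMElement $node) => $node->getAttribute('href'),
        $withHref
    ));
}
```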

Metas

For convenience, the Metas object stores the values of all <meta> elements in the HTML, so you can access them more easily. The key of each meta is taken from its name, property or itemprop attribute, and the value is taken from its content attribute.

//Get the Metas object
$metas = $info->getMetas();

$metas->all(); //Return all values
$metas->get('og:title'); //Return a key value
$metas->str('og:title'); //Return the value as string (remove html tags)
$metas->html('og:description'); //Return the value as html
$metas->int('og:video:width'); //Return the value as integer
$metas->url('og:url'); //Return the value as full url (converts relative urls to absolutes)
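The keying rule described above can be sketched with ext-dom alone: the key comes from name, property or itemprop, and the value from content. collectMetas() is a hypothetical helper, not Embed's Metas class. It keeps every value per key because a page may repeat a key such as og:image. Lower-casing the keys is an assumption.

```php
<?php
// Minimal sketch (not Embed's Metas class): collect <meta> values keyed by the
// name, property or itemprop attribute, taking the value from content.

function collectMetas(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $metas = [];
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        /** @var DOMElement $meta */
        // Use the first of these attributes that is non-empty as the key.
        $key = $meta->getAttribute('name')
            ?: $meta->getAttribute('property')
            ?: $meta->getAttribute('itemprop');

        if ($key === '' || !$meta->hasAttribute('content')) {
            continue; // e.g. <meta charset="utf-8"> has no key or content
        }

        // Keys are lower-cased here; this normalization is an assumption.
        $metas[strtolower($key)][] = $meta->getAttribute('content');
    }

    return $metas;
}
```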

OEmbed

In addition to the HTML and metas, this library uses oEmbed endpoints to get additional data. You can access this data as follows:

//Get the oEmbed object
$oembed = $info->getOEmbed();

$oembed->all(); //Return all raw data
$oembed->get('title'); //Return a key value
$oembed->str('title'); //Return the value as string (remove html tags)
$oembed->html('html'); //Return the value as html
$oembed->int('width'); //Return the value as integer
$oembed->url('url'); //Return the value as full url (converts relative urls to absolutes)
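oEmbed providers answer with a JSON (or XML) document whose fields, such as type, title, html, width and height, are defined by the oEmbed specification. This sketch gives a rough idea of what typed getters like str() and int() do with that raw data: it decodes a made-up JSON response and reads two values. oembedStr() and oembedInt() are hypothetical helpers, not Embed's OEmbed class.

```php
<?php
// Minimal sketch: decode an oEmbed JSON response and read typed values, in the
// spirit of str()/int(). The payload below is invented; the field names come
// from the oEmbed specification.

function oembedStr(array $data, string $key): ?string
{
    if (!isset($data[$key]) || !is_scalar($data[$key])) {
        return null;
    }
    // Strip HTML tags and surrounding whitespace to get a plain string.
    $value = trim(strip_tags((string) $data[$key]));
    return $value === '' ? null : $value;
}

function oembedInt(array $data, string $key): ?int
{
    // Accept integers or numeric strings such as "360".
    if (!isset($data[$key]) || !is_numeric($data[$key])) {
        return null;
    }
    return (int) $data[$key];
}

$json = '{"type":"video","version":"1.0","title":"<b>Demo</b> video",'
    . '"html":"<iframe src=\"https://example.com/embed/1\"></iframe>",'
    . '"width":640,"height":"360"}';

$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
$title = oembedStr($data, 'title');   // "Demo video"
$height = oembedInt($data, 'height'); // 360
```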

Additional oEmbed parameters (like Instagram's hidecaption) can also be provided:

use Embed\Embed;

$embed = new Embed();

$result = $embed->get('https://www.instagram.com/p/B_C0wheCa4V/');
$result->setSettings([
    'oembed:query_parameters' => ['hidecaption' => true]
]);
$oembed = $result->getOEmbed();
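The setting name oembed:query_parameters suggests that these extra parameters are added to the query string of the oEmbed request. The following is a minimal sketch of that URL building. The endpoint https://provider.example/oembed and the helper oembedRequestUrl() are hypothetical; this is not how Embed constructs its requests internally.

```php
<?php
// Minimal sketch: build an oEmbed request URL by merging the standard url and
// format parameters with extra query parameters such as hidecaption.
// The endpoint used below is a placeholder, not one Embed uses.

function oembedRequestUrl(string $endpoint, string $resourceUrl, array $extra = []): string
{
    // Standard oEmbed consumer parameters; extra ones are appended after them.
    $params = array_merge(['url' => $resourceUrl, 'format' => 'json'], $extra);

    // http_build_query() percent-encodes values and turns true into "1".
    $separator = strpos($endpoint, '?') === false ? '?' : '&';
    return $endpoint . $separator . http_build_query($params);
}

$requestUrl = oembedRequestUrl(
    'https://provider.example/oembed',
    'https://www.instagram.com/p/B_C0wheCa4V/',
    ['hidecaption' => true]
);
```

With these inputs the result is https://provider.example/oembed?url=https%3A%2F%2Fwww.instagram.com%2Fp%2FB_C0wheCa4V%2F&format=json&hidecaption=1.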

LinkedData

LinkedData is another API available by default. It extracts information from the JSON-LD data embedded in the page:

//Get the linkedData object
$ld = $info->getLinkedData();

$ld->all(); //Return all data
$ld->get('name'); //Return a key value
$ld->str('name'); //Return the value as string (remove html tags)
$ld->html('description'); //Return the value as html
$ld->int('width'); //Return the value as integer
$ld->url('url'); //Return the value as full url (converts relative urls to absolutes)
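JSON-LD data usually lives in <script type="application/ld+json"> elements. The sketch below collects and decodes every such block with DOMXPath and json_decode(); it is not Embed's LinkedData class. extractJsonLd() is a hypothetical helper, and it silently skips blocks that are not valid JSON.

```php
<?php
// Minimal sketch (not Embed's LinkedData class): find
// <script type="application/ld+json"> blocks and decode each one.

function extractJsonLd(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $scripts = $xpath->query('//script[@type="application/ld+json"]');

    $items = [];
    foreach ($scripts as $script) {
        $data = json_decode($script->textContent, true);
        // Skip blocks that do not decode to a JSON object or array.
        if (is_array($data)) {
            $items[] = $data;
        }
    }
    return $items;
}

$html = '<html><head><script type="application/ld+json">'
    . '{"@context":"https://schema.org","@type":"VideoObject","name":"Demo"}'
    . '</script></head><body></body></html>';

$ld = extractJsonLd($html);
$name = $ld[0]['name'] ?? null; // "Demo"
```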

Other APIs

Some sites, like Wikipedia or Archive.org, provide a custom API that is used to fetch more reliable data. You can get the API object with the getApi() method, but note that not all results have this method. The API object has the same methods as the oEmbed object:

//Get the API object
$api = $info->getApi();

$api->all(); //Return all raw data
$api->get('title'); //Return a key value
$api->str('title'); //Return the value as string (remove html tags)
$api->html('html'); //Return the value as html
$api->int('width'); //Return the value as integer
$api->url('url'); //Return the value as full url (converts relative urls to absolutes)

Extending Embed

Depending on your needs, you may want to extend this library with extra features or change the way it performs some operations.

PSR

Embed uses some PSR standards to be as interoperable as possible:

  • PSR-7: standard interfaces to represent HTTP requests, responses and URIs
  • PSR-17: standard factories to create PSR-7 objects
  • PSR-18: standard interface to send an HTTP request and return a response

Embed comes with a cURL client compatible with PSR-18, but you need to install a PSR-7/PSR-17 library. The following popular implementations are detected automatically, in this order: 'laminas\diactoros', 'guzzleHttp\psr7', 'slim\psr7', 'nyholm\psr7' and 'sunrise\http'. If you want to use a different PSR implementation, you can do it this way:

use Embed\Embed;
use Embed\Http\Crawler;

$client = new CustomHttpClient();
$requestFactory = new CustomRequestFactory();
$uriFactory = new CustomUriFactory();

//The Crawler is responsible for performing HTTP requests
$crawler = new Crawler($client, $requestFactory, $uriFactory);

//Create an embed instance passing the Crawler
$embed = new Embed($crawler);
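The automatic detection described above amounts to trying candidates in a fixed order and using the first one that is installed. This sketch shows that idea with class_exists(). firstAvailableClass() is hypothetical. The factory class names in the list are assumed names for each package, not names taken from Embed's source.

```php
<?php
// Minimal sketch of "use the first implementation that is installed": walk a
// list of candidate class names in priority order and return the first one
// that exists or can be autoloaded. Not Embed's actual detection code.

function firstAvailableClass(array $candidates): ?string
{
    foreach ($candidates as $class) {
        // class_exists() also triggers any registered autoloader (e.g. Composer).
        if (class_exists($class)) {
            return $class;
        }
    }
    return null;
}

// Assumed PSR-17 factory class names, listed in the detection order above.
$factoryClass = firstAvailableClass([
    'Laminas\Diactoros\RequestFactory',
    'GuzzleHttp\Psr7\HttpFactory',
    'Slim\Psr7\Factory\RequestFactory',
    'Nyholm\Psr7\Factory\Psr17Factory',
    'Sunrise\Http\Message\RequestFactory',
]);
```

If none of those packages is installed, $factoryClass is null. At that point a real implementation would need to report a helpful error or expect the caller to pass its own factories, as in the Crawler example.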

Adapters

Some sites have special needs. Some provide public APIs that allow extracting more information (like Wikipedia or Archive.org); on others, the data must be extracted in a different way. For these cases there are adapters: classes that extend the default classes to provide extra functionality.

Before creating an adapter, you need to understand how Embed works. When you execute this code, you get an Extractor instance:

//Get the Extractor with all info
$info = $embed->get($url);

//The extractor has a document and an oEmbed object:
$document = $info->getDocument();
$oembed = $info->getOEmbed();

The Extractor class has many detectors. Each detector is responsible for detecting a specific piece of information. For example, there is a detector for the title, another for the description, and others for the image, code, etc.

So an adapter is basically an extractor created specifically for a site. It can also contain custom detectors or APIs. You can find all the adapters in the src/Adapters folder.

If you create an adapter, you also need to register it in Embed so it knows which website to use it for. This is done with the ExtractorFactory object, which is responsible for instantiating the right extractor for each site.

use Embed\Embed;

$embed = new Embed();

$factory = $embed->getExtractorFactory();

//Use this MySite adapter for mysite.com
$factory->addAdapter('mysite.com', MySite::class);

//Remove the adapter for pinterest.com, so it will use the default extractor
$factory->removeAdapter('pinterest.com');

//Change the default extractor
$factory->setDefault(CustomExtractor::class);
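Conceptually, the ExtractorFactory maps hosts to extractor classes and falls back to a default. Below is a hypothetical sketch of such a registry; AdapterRegistry, classFor() and the class-name strings are illustrative. In this sketch, an adapter registered for mysite.com also applies to www.mysite.com. That matching rule is an assumption, not documented Embed behavior.

```php
<?php
// Hypothetical host-to-adapter registry mirroring addAdapter(),
// removeAdapter() and setDefault(). The matching rule (exact host or any
// subdomain of it) is an assumption, not Embed's code.

final class AdapterRegistry
{
    /** @var array<string, string> host => extractor class name */
    private array $adapters = [];
    private string $default;

    public function __construct(string $default)
    {
        $this->default = $default;
    }

    public function addAdapter(string $host, string $class): void
    {
        $this->adapters[strtolower($host)] = $class;
    }

    public function removeAdapter(string $host): void
    {
        unset($this->adapters[strtolower($host)]);
    }

    public function setDefault(string $class): void
    {
        $this->default = $class;
    }

    // Return the adapter registered for the URL's host (or a parent domain of
    // it), otherwise the default extractor class.
    public function classFor(string $url): string
    {
        $host = strtolower((string) parse_url($url, PHP_URL_HOST));

        foreach ($this->adapters as $adapterHost => $class) {
            $suffix = '.' . $adapterHost;
            if ($host === $adapterHost || substr($host, -strlen($suffix)) === $suffix) {
                return $class;
            }
        }

        return $this->default;
    }
}
```

Matching on a leading dot keeps notmysite.com from being treated as a subdomain of mysite.com.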

Detectors

Embed comes with several predefined detectors, but you may want to change them or add more. Just create a class extending the Embed\Detectors\Detector class and register it in the extractor factory. For example:

use Embed\Embed;
use Embed\Detectors\Detector;

class Robots extends Detector
{
    public function detect(): ?string
    {
        $response = $this->extractor->getResponse();
        $metas = $this->extractor->getMetas();

        //Prefer the X-Robots-Tag header, falling back to the robots meta
        return $response->getHeaderLine('x-robots-tag')
            ?: $metas->str('robots');
    }
}

//Register the detector
$embed = new Embed();
$embed->getExtractorFactory()->addDetector('robots', Robots::class);

//Use it
$info = $embed->get('http://example.com');
$robots = $info->robots;
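Registering a detector under a name is what makes $info->robots available. One common way to build that in PHP is a magic __get() that instantiates the detector on first access and caches its result. This is a hypothetical sketch of that pattern, not Embed's Extractor class. SketchExtractor, SketchDetector and SketchRobots are illustrative names, and headers and metas are reduced to plain arrays.

```php
<?php
// Hypothetical sketch: detectors registered by name back magic properties;
// each detector runs once on first access and its result is cached.

abstract class SketchDetector
{
    protected SketchExtractor $extractor;

    public function __construct(SketchExtractor $extractor)
    {
        $this->extractor = $extractor;
    }

    abstract public function detect();
}

final class SketchExtractor
{
    /** @var array<string, string> property name => detector class */
    private array $detectors;
    private array $cache = [];
    public array $headers;
    public array $metas;

    public function __construct(array $detectors, array $headers, array $metas)
    {
        $this->detectors = $detectors;
        $this->headers = $headers;
        $this->metas = $metas;
    }

    public function __get(string $name)
    {
        if (!array_key_exists($name, $this->cache)) {
            $class = $this->detectors[$name] ?? null;
            // Unknown names resolve to null instead of raising an error.
            $this->cache[$name] = $class ? (new $class($this))->detect() : null;
        }
        return $this->cache[$name];
    }
}

// Same fallback logic as the Robots detector above: header first, then meta.
final class SketchRobots extends SketchDetector
{
    public function detect(): ?string
    {
        return ($this->extractor->headers['x-robots-tag'] ?? '')
            ?: ($this->extractor->metas['robots'] ?? null);
    }
}

$info = new SketchExtractor(['robots' => SketchRobots::class], [], ['robots' => 'noindex']);
$robots = $info->robots; // "noindex"
```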

Settings

If you need to pass settings to the CurlClient to perform http queries:

use Embed\Embed;
use Embed\Http\Crawler;
use Embed\Http\CurlClient;

$client = new CurlClient();
$client->setSettings([
    'cookies_path' => $cookies_path, // file where cookies are stored
    'ignored_errors' => [18],        // cURL error codes to ignore (18 = CURLE_PARTIAL_FILE)
    'max_redirs' => 3,               // see CURLOPT_MAXREDIRS
    'connect_timeout' => 2,          // see CURLOPT_CONNECTTIMEOUT
    'timeout' => 2,                  // see CURLOPT_TIMEOUT
    'ssl_verify_host' => 2,          // see CURLOPT_SSL_VERIFYHOST
    'ssl_verify_peer' => 1,          // see CURLOPT_SSL_VERIFYPEER
    'follow_location' => true,       // see CURLOPT_FOLLOWLOCATION
    'user_agent' => 'Mozilla',       // see CURLOPT_USERAGENT
]);

$embed = new Embed(new Crawler($client));

If you need to pass settings to your detectors, you can add settings to the ExtractorFactory:

use Embed\Embed;

$embed = new Embed();
$embed->setSettings([
    'oembed:query_parameters' => [],  //Extra parameters sent to oembed
    'twitch:parent' => 'example.com', //Required to embed twitch videos as iframe
    'facebook:token' => '1234|5678',  //Required to embed content from Facebook
    'instagram:token' => '1234|5678', //Required to embed content from Instagram
    'twitter:token' => 'asdf',        //Improve the data from twitter
]);
$info = $embed->get($url);

Note: The built-in detectors do not require settings. This feature is only a convenience for when you create a specific detector that requires settings.
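A custom detector that needs configuration can read its own key from the settings array and fall back to a default when the key is missing. In this hypothetical sketch, the setting name excerpt:length and the ExcerptDetector class are invented. Settings are passed directly to the constructor here, rather than through Embed's ExtractorFactory.

```php
<?php
// Hypothetical custom detector that reads its own setting. The key
// "excerpt:length" and this class are invented for illustration; settings are
// passed to the constructor here instead of through Embed's ExtractorFactory.

final class ExcerptDetector
{
    private array $settings;
    private string $description;

    public function __construct(array $settings, string $description)
    {
        $this->settings = $settings;
        $this->description = $description;
    }

    public function detect(): string
    {
        // Use the configured length, or 100 when the setting is missing.
        $length = (int) ($this->settings['excerpt:length'] ?? 100);

        if (strlen($this->description) <= $length) {
            return $this->description;
        }

        // Byte-based truncation; multibyte text would need mb_substr().
        return rtrim(substr($this->description, 0, $length)) . '...';
    }
}

$excerpt = (new ExcerptDetector(['excerpt:length' => 5], 'Hello world'))->detect();
```

Reading the setting with a default keeps the detector usable when no settings are passed at all.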

