• This repository has been archived on 19/Jan/2023

[DISCONTINUED] Source code tokenizer

Nette Tokenizer [DISCONTINUED]

Introduction

Tokenizer is a tool that uses regular expressions to split a given string into tokens. What the hell is that good for, you might ask? Well, you can create your own languages!

Documentation can be found on the website. If you like it, please make a donation now. Thank you!

Installation:

composer require nette/tokenizer

It requires PHP 7.1 or newer and supports PHP up to 8.1.

Support Me

Do you like Nette Tokenizer? Are you looking forward to the new features?

Buy me a coffee

Thank you!

Usage

Let's create a simple tokenizer that splits a string into numbers, whitespace, and letters.

$tokenizer = new Nette\Tokenizer\Tokenizer([
	T_DNUMBER => '\d+',
	T_WHITESPACE => '\s+',
	T_STRING => '\w+',
]);

Hint: In case you are wondering where the T_ constants come from, they are the token types that PHP itself uses internally when parsing code. They cover most of the common token names we usually need. Keep in mind that their values are not guaranteed, so always compare against the constants, never against hard-coded numbers.
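To make the hint concrete, here is a small sketch that relies only on PHP's bundled tokenizer extension (assumed to be enabled, which is the default). The variable $type is a made-up stand-in for the type stored in a token.

```php
<?php
// T_* constants are defined by PHP's tokenizer extension.
// Their integer values may change between PHP versions, so compare
// against the constant itself and use token_name() for readable output.
$type = T_DNUMBER; // stand-in for a token's type

$isNumber = ($type === T_DNUMBER); // compare with the constant, not a literal integer
$label = token_name($type);        // human-readable name of the constant

echo $label, "\n"; // prints "T_DNUMBER"
```

Comparing with the constant keeps the code working even if the underlying integer changes in a future PHP release.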

Now, when we give it a string, it returns a Nette\Tokenizer\Stream containing Nette\Tokenizer\Token objects.

$stream = $tokenizer->tokenize("say \n123");

The resulting array of tokens, $stream->tokens, looks like this:

[
	new Token('say', T_STRING, 0),
	new Token(" \n", T_WHITESPACE, 3),
	new Token('123', T_DNUMBER, 5),
]

You can also access the individual properties of a token:

$firstToken = $stream->tokens[0];
$firstToken->value; // say
$firstToken->type; // value of T_STRING
$firstToken->offset; // position in string: 0

Simple, isn't it?

Processing the tokens

Now we know how to create tokens from a string. Let's process them efficiently using Nette\Tokenizer\Stream. It has a lot of really awesome methods for traversing tokens!

Let's try to parse a simple annotation from PHPDoc and create an object from it. What regular expressions do we need for the tokens? Every annotation starts with @, followed by a name, whitespace, and its value.

  • @ for the annotation start
  • \s+ for whitespaces
  • \w+ for strings

(Never use capturing subpatterns in Tokenizer's regular expressions like '(ab)+c', use only non-capturing ones '(?:ab)+c'.)
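One plausible reason for this rule can be sketched as follows. The sketch assumes that a regex tokenizer joins all patterns into a single alternation and identifies the token type by the index of the capturing group that matched. That is an assumption for illustration, not a statement about the library's actual source. The function sketchTokenize() and the type names 'word', 'number' and 'space' are hypothetical.

```php
<?php
// Hypothetical helper for illustration only; not part of Nette Tokenizer.
function sketchTokenize(array $patterns, string $input): array
{
	$types = array_keys($patterns);
	// Wrap each pattern in its own capturing group: (p1)|(p2)|(p3)
	$regex = '~(' . implode(')|(', $patterns) . ')~A';
	preg_match_all($regex, $input, $matches, PREG_SET_ORDER);

	$tokens = [];
	foreach ($matches as $match) {
		// The index of the first non-empty group is taken as the pattern index.
		for ($i = 1; $i < count($match); $i++) {
			if ($match[$i] !== '') {
				$tokens[] = [$match[0], $types[$i - 1]];
				break;
			}
		}
	}
	return $tokens;
}

// Non-capturing subpattern: group indexes line up with the pattern list.
$good = sketchTokenize(['word' => '(?:ab)+c', 'number' => '\d+', 'space' => '\s+'], 'abc12');
// $good is [['abc', 'word'], ['12', 'number']]

// Capturing subpattern: the extra group shifts every later index by one.
$bad = sketchTokenize(['word' => '(ab)+c', 'number' => '\d+', 'space' => '\s+'], 'abc12');
// $bad is [['abc', 'word'], ['12', 'space']], so the number is mislabeled
```

In the second call the inner (ab) group occupies index 2, so the match for \d+ lands in group 3 and is attributed to the third pattern instead of the second.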

This should work on simple annotations, right? Now let's look at the input string that we will try to parse.

$input = '
	@author David Grudl
	@package Nette
';

Let's create a Parser class that will accept the string and return an array of pairs [name, value]. It will be very naive and simple.

use Nette\Tokenizer\Tokenizer;
use Nette\Tokenizer\Stream;

class Parser
{
	const T_AT = 1;
	const T_WHITESPACE = 2;
	const T_STRING = 3;

	/** @var Tokenizer */
	private $tokenizer;

	/** @var Stream */
	private $stream;

	public function __construct()
	{
		$this->tokenizer = new Tokenizer([
			self::T_AT => '@',
			self::T_WHITESPACE => '\s+',
			self::T_STRING => '\w+',
		]);
	}

	public function parse(string $input): array
	{
		$this->stream = $this->tokenizer->tokenize($input);

		$result = [];
		while ($this->stream->nextToken()) {
			if ($this->stream->isCurrent(self::T_AT)) {
				$result[] = $this->parseAnnotation();
			}
		}

		return $result;
	}

	private function parseAnnotation(): array
	{
		$name = $this->stream->joinUntil(self::T_WHITESPACE);
		$this->stream->nextUntil(self::T_STRING);
		$content = $this->stream->joinUntil(self::T_AT);

		return [$name, trim($content)];
	}
}

$parser = new Parser;
$annotations = $parser->parse($input);

So what does the parse() method do? It iterates over the tokens and searches for @, the symbol that annotations start with. Calling nextToken() moves the cursor to the next token. The isCurrent() method checks whether the current token at the cursor is of the given type. If an @ is found, parse() calls parseAnnotation(), which expects the annotation to be in a very specific format.

First, using joinUntil(), the stream keeps moving the cursor and appending token values to a buffer until it finds a token of the required type; it then stops and returns the buffer. Because only a single T_STRING token, the annotation name, sits between the @ and the following whitespace, $name ends up holding that name, for example 'author'.

The nextUntil() method is similar to joinUntil(), but it does not concatenate the values into a string. It only moves the cursor until it finds the token. So this call simply skips all the whitespace after the annotation name.

Then there is another joinUntil(), which collects everything up to the next @. For the first annotation, this call returns "David Grudl\n " (the trailing whitespace is the indentation of the following line).

And there we go, we've parsed one whole annotation! The $content probably ends with whitespaces, so we have to trim it. Now we can return this specific annotation as pair [$name, $content].

Try copying the code and running it. If you dump the $annotations variable, you should see output similar to this:

array (2)
   0 => array (2)
   |  0 => 'author'
   |  1 => 'David Grudl'
   1 => array (2)
   |  0 => 'package'
   |  1 => 'Nette'

Stream methods

The stream can return the current token using the currentToken() method, or only its value using currentValue().

nextToken() moves the cursor and returns the token. If you give it no arguments, it simply returns the next token.

nextValue() is just like nextToken() but it only returns the token value.

Most of the methods also accept multiple arguments so you can search for multiple types at once.

// iterate until a string or a whitespace is found, then return the following token
$token = $stream->nextToken(T_STRING, T_WHITESPACE);

// give me next token
$token = $stream->nextToken();

You can also search by the token value.

// move the cursor until you find token containing only '@', then stop and return it
$token = $stream->nextToken('@');

nextUntil() moves the cursor and returns an array of all the tokens it passes until it finds the desired token, stopping just before that token. It can accept multiple arguments.

joinUntil() is similar to nextUntil(), but it concatenates the values of all the tokens it passes and returns them as a string.

joinAll() simply concatenates all the remaining token values and returns the result. It moves the cursor to the end of the token stream.

nextAll() is just like joinAll(), but it returns an array of the tokens.

isCurrent() checks if the current token or the current token's value is equal to one of the given arguments.

// is the current token '@' or type of T_AT?
$stream->isCurrent(T_AT, '@');
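The idea that a single argument list may mix token types and token values can be sketched with a few lines of plain PHP. The function matchesAny(), the [value, type] array shape and the constant SKETCH_T_AT are hypothetical and only illustrate the matching rule described above. They are not part of the library.

```php
<?php
// Hypothetical type constant for the illustration.
const SKETCH_T_AT = 1;

// Returns true when the token's value or its type equals any wanted argument.
// Strict comparison keeps the string '1' from matching the integer type 1.
function matchesAny(array $token, ...$wanted): bool
{
	[$value, $type] = $token;
	return in_array($value, $wanted, true) || in_array($type, $wanted, true);
}

$atToken = ['@', SKETCH_T_AT];
$nameToken = ['author', 3];

$first = matchesAny($atToken, SKETCH_T_AT, '@');    // true: matches by type and by value
$second = matchesAny($nameToken, SKETCH_T_AT, '@'); // false: neither type nor value matches
```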

isNext() is just like isCurrent() but it checks the next token.

isPrev() is just like isCurrent() but it checks the previous token.

Finally, reset() resets the cursor so that you can iterate over the token stream again.
