A Simple PHP Tokenizer

Learn how to tokenize a simple piece of PHP code, and practice implementing a cursor.

Tokenizing PHP code

Consider the example below. It uses PHP’s token_get_all function to run PHP’s tokenizer on an input string, and then uses Laravel’s collection features to attach a human-readable name to each returned token:

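<?php
// collect() comes from Laravel (illuminate/collections).
$code = <<<'PHP'
<?php
$value = 5 + (532_323) - $total;
PHP;

$tokens = collect(token_get_all($code))->map(function ($token) {
    // Single-character tokens such as "=" are plain strings;
    // all other tokens are [id, text, line] arrays.
    if (is_array($token)) {
        $token['name'] = token_name($token[0]);
    }

    return $token;
})->all();

print_r($tokens);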
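If you aren't working inside a Laravel application, you can sketch the same transformation with PHP's built-in array_map() instead of collect(). This is a minimal sketch, not part of the original example. It assumes PHP 7.4 or later, which is needed for the 532_323 numeric literal separator in the sample input:

```php
<?php

// Same transformation as above, using only built-in PHP functions.
$code = <<<'PHP'
<?php
$value = 5 + (532_323) - $total;
PHP;

$tokens = array_map(function ($token) {
    // Single-character tokens such as "=" or ";" come back as plain strings;
    // all other tokens are arrays of [token id, matched text, line number].
    if (is_array($token)) {
        $token['name'] = token_name($token[0]);
    }

    return $token;
}, token_get_all($code));

print_r($tokens);
```

array_map() keeps the original numeric keys, so the printed structure matches the collection-based version.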

Our example would produce the following output:

Array
(
    [0] => Array
        (
            [0] => 389
            [1] => <?php

            [2] => 1
            [name] => T_OPEN_TAG
        )

    [1] => Array
        (
            [0] => 266
            [1] => $value
            [2] => 2
            [name] => T_VARIABLE
        )

    [2] => Array
        (
            [0] => 392
            [1] =>  
            [2] => 2
            [name] => T_WHITESPACE
        )

    [3] => =
    [4] => Array
        (
            [0] => 392
            [1] =>  
            [2] => 2
            [name] => T_WHITESPACE
        )

    [5] => Array
        (
            [0] => 260
            [1] => 5
            [2] => 2
            [name] => T_LNUMBER
        )

    [6] => Array
        (
            [0] => 392
            [1] =>  
            [2] => 2
            [name] => T_WHITESPACE
        )

    [7] => +
    [8] => Array
        (
            [0] => 392
            [1] =>  
            [2] => 2
            [name] => T_WHITESPACE
        )

    [9] => (
    [10] => Array
        (
            [0] => 260
            [1] => 532_323
            [2] => 2
            [name] => T_LNUMBER
        )

    [11] => )
    [12] => Array
        (
            [0] => 392
            [1] =>  
            [2] => 2
            [name] => T_WHITESPACE
        )

    [13] => -
    [14] => Array
        (
            [0] => 392
            [1] =>  
            [2] => 2
            [name] => T_WHITESPACE
        )

    [15] => Array
        (
            [0] => 266
            [1] => $total
            [2] => 2
            [name] => T_VARIABLE
        )

    [16] => ;
)
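The numeric IDs in this output (such as 266 for T_VARIABLE) are not guaranteed to be the same across PHP versions. A more portable approach is to compare tokens against PHP's predefined T_* constants. The sketch below collects every variable name from the same input. The $variables name is only illustrative:

```php
<?php

// Compare against the T_VARIABLE constant rather than a hard-coded number,
// since the numeric token IDs can change between PHP versions.
$tokens = token_get_all('<?php $value = 5 + (532_323) - $total;');

$variables = [];
foreach ($tokens as $token) {
    // Single-character tokens are plain strings with no ID, so skip them.
    if (is_array($token) && $token[0] === T_VARIABLE) {
        $variables[] = $token[1];
    }
}

// $variables is ['$value', '$total']
print_r($variables);
```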

Each element in the resulting array corresponds to a piece of the input text. For instance, the second element corresponds to the $value variable on line 2. Single-character tokens such as =, ( and ; are returned as plain strings, while every other token is a nested array. The first value in each nested array is the token identifier, the second holds the matched text, and the third value includes ...
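Because single-character tokens come back as bare strings, any code that walks the token list has to handle two different shapes. One way to avoid this is to normalize every token into an array with the same keys. This is a minimal sketch under that assumption. The function normalizeTokens() and its 'type', 'text', and 'line' keys are hypothetical names. PHP does not report line numbers for string tokens, so the line tracking for them is our own addition:

```php
<?php

// Hypothetical helper: gives every token the same shape, including the
// single-character tokens that token_get_all() returns as plain strings.
function normalizeTokens(string $code): array
{
    $normalized = [];
    $line = 1;

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Array tokens: [token id, matched text, line number].
            [$id, $text, $line] = $token;
            $type = token_name($id);
        } else {
            // String tokens carry no ID or line, so reuse the current line.
            $type = $text = $token;
        }

        $normalized[] = ['type' => $type, 'text' => $text, 'line' => $line];

        // Advance past any newlines inside the token (e.g. "<?php\n").
        $line += substr_count($text, "\n");
    }

    return $normalized;
}

$tokens = normalizeTokens("<?php\n\$value = 5 + (532_323) - \$total;");
// $tokens[3] is ['type' => '=', 'text' => '=', 'line' => 2]
```

With this shape, a consumer can check $token['type'] for both T_* names and punctuation without first testing whether the element is an array.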