Parsing HTML Attributes and Parameters

Learn and practice how to implement a fragmented attribute parser.

Implementing the FragmentAttributeParser class

Parsing the parameters and attributes of our fragments is similar to extracting the HTML fragments themselves. Instead of looking for characters such as the less-than (<) and greater-than (>) characters, we will use whitespace to determine attribute and parameter boundaries. Because parameter values can legitimately contain whitespace, we will reuse the string-skipping technique from the fragment parser to advance over strings, so that whitespace inside a string is never treated as a boundary. Our implementation of this can be found below:
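Before the full class, the boundary rules described above can be tried out in isolation. The following is a minimal, standalone sketch in plain PHP under some simplifying assumptions: input is treated as single-byte text with byte offsets (instead of the Utf8StringIterator used in this lesson), strings are assumed to be wrapped in matching single or double quotes, and the helper names splitAttributes() and classifyToken() are hypothetical rather than part of the lesson's code. Whitespace ends a token, a quote makes the scanner jump to the matching closing quote, and each token is recorded together with the offset of its last character, much like the [buffer, $i] pairs collected by the class.

```php
<?php

/**
 * Hypothetical helper (not part of the lesson's classes): splits an
 * attribute string into whitespace-separated tokens, treating quoted
 * strings as opaque so their whitespace is not a boundary.
 * Assumes single-byte input; offsets are byte offsets.
 *
 * @return array<int, array{0: string, 1: int}> [token, endOffset] pairs
 */
function splitAttributes(string $input): array
{
    $tokens = [];
    $buffer = '';
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        // Whitespace ends the current token, if one has been started.
        if (ctype_space($char)) {
            if ($buffer !== '') {
                $tokens[] = [$buffer, $i - 1];
                $buffer = '';
            }
            continue;
        }

        // A quote starts a string: copy it through the matching closing
        // quote (or the end of the input, if the string is unterminated).
        if ($char === '"' || $char === "'") {
            $close = strpos($input, $char, $i + 1);
            $end = ($close === false) ? $length - 1 : $close;
            $buffer .= substr($input, $i, $end - $i + 1);
            $i = $end; // the loop's $i++ resumes scanning after the string
            continue;
        }

        $buffer .= $char;
    }

    // Flush the final token when the input does not end in whitespace.
    if ($buffer !== '') {
        $tokens[] = [$buffer, $length - 1];
    }

    return $tokens;
}

/**
 * Hypothetical helper: a token containing '=' is a name/value parameter;
 * anything else is a bare attribute. Only the first '=' splits the token.
 */
function classifyToken(string $token): array
{
    $parts = explode('=', $token, 2);

    if (count($parts) === 2) {
        return ['type' => 'parameter', 'name' => $parts[0], 'value' => $parts[1]];
    }

    return ['type' => 'attribute', 'name' => $token, 'value' => null];
}

$input = 'disabled title="Hello world" data-x=1';

foreach (splitAttributes($input) as [$token, $end]) {
    // Recover the start offset from the end offset and the token length.
    $start = $end - strlen($token) + 1;
    $info = classifyToken($token);
    echo "{$info['type']} {$info['name']} [{$start}, {$end}]\n";
}
// attribute disabled [0, 7]
// parameter title [9, 27]
// parameter data-x [29, 36]
```

Because the quotes stay inside each token in this sketch, a token's length still matches the span it covers in the input, so its start offset is simply end - length + 1; the class below performs the same arithmetic for startPosition. Splitting on the first = only means that a value such as "a=b" stays intact.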

<?php
class FragmentAttributeParser extends BaseFragmentParser
{
    public function parse($fragment)
    {
        $this->resetState();
        $this->string = new Utf8StringIterator(
            $fragment->innerContent->content
        );

        // Each entry is [attributeText, endOffset], where endOffset is the
        // position of the attribute's last character in the inner content.
        $tempAttributes = [];
        $attributes = [];

        for ($i = 0; $i < count($this->string); $i++) {
            $this->checkCurrentOffsets($i);

            // Whitespace marks an attribute boundary: clear the buffer
            // and move on to the next character.
            if (ctype_space($this->current)) {
                $this->buffer = '';
                continue;
            }

            if ($this->isStartOfString()) {
                // Advance past the entire string so whitespace inside a
                // parameter value is not treated as a boundary.
                $i = $this->scanToEndOfString($i);
                $this->checkCurrentOffsets($i);
            } else {
                $this->buffer .= $this->current;
            }

            // The attribute ends when the next character is whitespace
            // or there are no characters left.
            if ($this->next == null || ctype_space($this->next)) {
                $tempAttributes[] = [
                    $this->buffer, $i
                ];
                $this->buffer = '';
            }
        }

        foreach ($tempAttributes as $tempAttribute) {
            $attribute = new FragmentAttribute();
            $attribute->content = $tempAttribute[0];

            // Calculate the attribute's start and end
            // positions relative to the original doc.
            $attribute->endPosition = $tempAttribute[1] +
                $fragment->innerContent->startPosition;
            $attribute->startPosition = $attribute->endPosition -
                str($attribute->content)->length() + 1;

            // Extract name/values, if present. Splitting on the first
            // '=' only keeps any '=' inside the value intact.
            $parts = str($attribute->content)->explode('=', 2);

            if ($parts->count() == 2) {
                $attribute->type = AttributeType::Parameter;
                $attribute->name = $parts->first();
                $attribute->value = $parts->last();
            } else {
                $attribute->name = $attribute->content;
                $attribute->type = AttributeType::Attribute;
            }

            $attributes[] = $attribute;
        }

        return $attributes;
    }
}

Like our fragments parser, much of what is happening in the above code is similar to what we’ve seen before. The more notable differences begin with the whitespace check at the top of the loop; there, if the current character is whitespace, we clear the contents of our internal buffer and move to the next position within the ...