Working with Embedded Languages
Learn how to work with embedded languages using an HTML fragment parser.
We will build an HTML fragment and position analyzer. Our parser will not be concerned with creating associations between HTML elements, nor will it extract the content or sections of the document between the elements. Instead, we will focus entirely on where HTML elements appear and what features they can have, such as parameters or attributes. The intended use case for this parser is to help make decisions based on the results of other parsers.
For example, let’s look at the following Blade template, which dynamically constructs an HTML tag pair from a variable:
<{{ $element }} class="bg-white"></{{ $element }}>
It is relatively simple for us to read through and understand the intention behind the dynamic code. However, this task is much more difficult for HTML parsers or analysis systems. For instance, if we were to ignore the invalid characters at that location within an HTML document, would the tag name become {{ or {{ $element }}? Questions like this make working with HTML documents containing embedded languages particularly difficult, especially when we need to preserve or account for the embedded languages.
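To make the ambiguity concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the lesson's implementation language). The two regular expressions are hypothetical tag-name rules, not the behavior of any particular HTML parser. The first stops the name at whitespace, and the second treats a whole {{ ... }} echo as the name.

```python
import re

template = '<{{ $element }} class="bg-white"></{{ $element }}>'

# Hypothetical rule A: a tag name is everything after "<" up to
# the first whitespace character or ">".
naive_name = re.match(r"<([^\s>]+)", template).group(1)

# Hypothetical rule B: a tag name may also be an entire "{{ ... }}" echo.
echo_aware_name = re.match(r"<(\{\{.*?\}\}|[^\s>]+)", template).group(1)

print(naive_name)       # {{
print(echo_aware_name)  # {{ $element }}
```

Both answers are defensible, which is why the decision is better made with knowledge of where the embedded language starts and ends than guessed from the characters alone.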
The system we will work through will allow us to solve these problems reasonably quickly and provide a foundation for new features. It will accept an input document containing HTML with any number of embedded languages and produce a list of parsed HTML nodes that we can ask questions about to help with whatever our task is. Our HTML parser will also accept ranges within the document that it should ignore.
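One way to picture such a parser is the rough sketch below, again in Python for illustration only. Every name in it (TagNode, scan_tags, blade_echo_ranges) is hypothetical, and the regex-based blade_echo_ranges merely stands in for a dedicated Blade parser. The sketch does not handle quoted attribute values containing ">", comments, or stray "<" characters in text, and it assumes the ignored ranges are non-empty and do not overlap.

```python
import re
from dataclasses import dataclass


@dataclass
class TagNode:
    start: int        # offset of "<"
    end: int          # offset just past ">"
    name: str         # raw name text; may be an embedded-language region
    is_closing: bool
    is_dynamic: bool  # True when the name is an ignored range


def blade_echo_ranges(source):
    """Stand-in for a real Blade parser: (start, end) of each {{ ... }}."""
    return [(m.start(), m.end()) for m in re.finditer(r"\{\{.*?\}\}", source)]


def scan_tags(source, ignored):
    """Locate HTML tags while stepping over the ignored (start, end) ranges."""
    starts = {start: end for start, end in ignored}
    nodes, i, n = [], 0, len(source)
    while i < n:
        if i in starts:            # region owned by another parser: jump past it
            i = starts[i]
            continue
        if source[i] != "<":
            i += 1
            continue
        tag_start = i
        i += 1
        is_closing = i < n and source[i] == "/"
        if is_closing:
            i += 1
        name_start = i
        is_dynamic = i in starts
        if is_dynamic:             # the whole ignored region acts as the name
            i = starts[i]
        else:
            while (i < n and i not in starts
                   and not source[i].isspace() and source[i] not in "/>"):
                i += 1
        name = source[name_start:i]
        while i < n:               # consume the rest of the tag
            if i in starts:
                i = starts[i]      # a ">" inside an ignored range is never seen
            elif source[i] == ">":
                break
            else:
                i += 1
        i = min(i + 1, n)
        nodes.append(TagNode(tag_start, i, name, is_closing, is_dynamic))
    return nodes


template = '<{{ $element }} class="bg-white"></{{ $element }}>'
nodes = scan_tags(template, blade_echo_ranges(template))
for node in nodes:
    print(node)
```

Because the input string is never modified, every reported offset still points into the original template, and the text of each ignored region remains available to whichever parser owns it.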
When our specialized HTML parser encounters one of these regions, it will skip over it and continue with the rest of the source document. This is similar to our previous technique of filling a document with spaces and newlines, with the vital difference being that we do not remove information from the input document. These ranges will come from other parsers, such as a dedicated Blade parser, and allow our HTML parser to only concern itself with determining what is ...