
Reversing Double Encoded HTML Entities

Learn and practice how to work with HTML entities using a crawler instance.

We'll cover the following...

Creating a new crawler instance
- Try it out
Implementing the reverseHtmlEncoding helper method
- Try it out

Broadly speaking, questions about working with HTML usually fall into three categories: generating new HTML, manipulating existing HTML, and analyzing HTML-like documents with other languages embedded in them. The first category is one we are all generally familiar with because the output of most PHP applications is an HTML string, whether produced entirely in PHP or through a purpose-built templating engine, such as Blade. However, things get particularly interesting and nuanced in the last two categories.

An example of manipulating existing HTML might be adding CSS classes to specific elements. The goal here is usually to identify the targeted HTML elements using CSS’s selector syntax, which we will cover later in this chapter. Another example would be to reverse the double-encoding of HTML entities in the generated output, which commonly occurs by accident when content management systems pass HTML through many layers of output generation, each of which may encode it again.
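To make the double-encoding problem concrete, here is a minimal sketch in plain PHP. It shows how encoding an already-encoded string produces `&amp;amp;`, and one way to undo it by decoding repeatedly until the string stops changing. The function name `decodeRepeatedly` and its `$maxPasses` parameter are hypothetical, chosen for illustration only; this is not necessarily how the helper built later in this lesson works.

```php
<?php

// Hypothetical helper (an assumption for illustration, not the lesson's code):
// decode HTML entities repeatedly until the string stops changing
// or the pass limit is reached.
function decodeRepeatedly(string $text, int $maxPasses = 5): string
{
    $flags = ENT_QUOTES | ENT_HTML5;

    for ($i = 0; $i < $maxPasses; $i++) {
        $decoded = html_entity_decode($text, $flags, 'UTF-8');

        // Nothing left to decode, so we can stop early.
        if ($decoded === $text) {
            break;
        }

        $text = $decoded;
    }

    return $text;
}

$original = 'Fish & Chips';

// The first layer encodes the ampersand once.
$once = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');

// A second layer encodes the already-encoded string again,
// because htmlspecialchars() double-encodes by default.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

echo $once, PHP_EOL;                    // Fish &amp; Chips
echo $twice, PHP_EOL;                   // Fish &amp;amp; Chips
echo decodeRepeatedly($twice), PHP_EOL; // Fish & Chips
```

Note that this sketch decodes all the way down to raw text, so it would also decode entities that were written as literal text on purpose. Running it over an entire HTML document could turn escaped text into live markup, which is one reason to apply this kind of decoding to individual pieces of text rather than to the raw document.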

The third category, analyzing HTML documents with other embedded languages, is the most interesting of the three. The complexity within this category is that the embedded languages typically make the overall document invalid HTML. Let’s consider the code below, which is a snippet of Antlers, the templating language for Statamic, a Laravel-based content management system:

The exact semantics of the code above are unimportant, except that we have some embedded code that will dynamically generate an HTML element’s name. These scenarios are relatively straightforward for us to analyze visually, but they add significant complexity when we attempt to analyze the structure of our HTML documents using existing third-party libraries, which might not handle arbitrary embedded languages gracefully. This is not necessarily a problem with these libraries either; their goal is to analyze HTML and ...
