Special Characters: Matching the URL and Non-Capturing Groups
Let's continue our example of capturing the hostname of a URL. This lesson also introduces non-capturing groups at the end.
Regular expression to match sections of the URL
Finally, we’re looking for anything that comes after the first slash right after the hostname (i.e., after the .com). Again, we’re keeping it simple for now, so the expression is short:
/\.com(\/[a-z\.]+)?/g
And for the breakdown:
- The mandatory .com at the start has its dot escaped (\.) so that it matches a literal dot rather than any character.
- The path is optional, which is what the final ? after the capturing group expresses.
- The only characters we’re allowing in the path are the letters a to z and the literal dot, so you can match things like index.html. Yes, this is an oversimplification, but it should be enough to understand the concept.
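To see the optional group in action, here is a minimal sketch that runs the path expression in JavaScript. The sample strings are made up for illustration, and the g flag is dropped so that match() returns the captured groups.

```javascript
// Path expression from above, without the g flag so match() exposes groups.
const pathPattern = /\.com(\/[a-z\.]+)?/;

// Hypothetical sample strings, one with a path and one without.
const withPath = "example.com/index.html".match(pathPattern);
const withoutPath = "example.com".match(pathPattern);

console.log(withPath[0]);    // ".com/index.html" (the full match)
console.log(withPath[1]);    // "/index.html" (the captured path)
console.log(withoutPath[0]); // ".com"
console.log(withoutPath[1]); // undefined (the optional group did not take part)
```

Both strings match because the ? makes the whole path group optional; only the captured value differs.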
Now that we’ve individually matched all parts of the URL, let’s put them together and try to understand where the non-capturing groups come into play.
Using non-capturing groups
Concatenating all three expressions together looks like this:
/(https?|ftp):\/\/(www\.)?([a-z-]+)\.com(\/[a-z\.]+)?/g
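As a rough sketch, the same expression can be tried in JavaScript with matchAll(), which returns every match together with its capturing groups. The three URLs and the property names below are assumptions chosen for illustration, not the lesson's own test strings.

```javascript
// Full expression from above; the g flag is required by matchAll().
const urlPattern = /(https?|ftp):\/\/(www\.)?([a-z-]+)\.com(\/[a-z\.]+)?/g;

// Hypothetical test strings, one per line.
const sample = [
  "http://www.example.com/index.html",
  "https://my-site.com",
  "ftp://files.com/readme.txt",
].join("\n");

// Map each match to its four capturing groups, in order of appearance.
const matches = [...sample.matchAll(urlPattern)].map((m) => ({
  protocol: m[1],
  www: m[2],      // undefined when the optional "www." is absent
  hostname: m[3],
  path: m[4],     // undefined when the optional path is absent
}));

console.log(matches.length);      // 3
console.log(matches[0].hostname); // "example"
console.log(matches[1].hostname); // "my-site"
console.log(matches[2].path);     // "/readme.txt"
```

Notice that the hostname is the third group: the protocol and the www. prefix each take a group number of their own.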
Testing the full expression on regex101.com will allow you to see how all three test strings match! One of the great features of this tool is that, on the right side of the screen, you can see the three different matches.
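As a sketch of where non-capturing groups could come into play, assume we only care about the hostname. The protocol, www., and path groups are still needed for alternation and optionality, but not for their captured values. Writing them as (?:...) keeps the grouping and drops them from the result, so the hostname becomes group 1. Which groups to make non-capturing is an assumption made here for illustration.

```javascript
// Same structure as the full expression, but only the hostname is captured.
// (?:...) groups without capturing; the g flag is dropped so match() exposes groups.
const hostOnlyPattern = /(?:https?|ftp):\/\/(?:www\.)?([a-z-]+)\.com(?:\/[a-z\.]+)?/;

// Hypothetical sample URL.
const hostMatch = "https://www.example.com/index.html".match(hostOnlyPattern);

console.log(hostMatch.length); // 2 (the full match plus one capturing group)
console.log(hostMatch[0]);     // "https://www.example.com/index.html"
console.log(hostMatch[1]);     // "example"
```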