Basics of HTML & Document Tree
In this lesson, we will go through the basics of HTML and the document tree (DOM), which are needed to understand XPath expressions.
What is HTML?
HTML stands for HyperText Markup Language. It’s a markup language used to create and structure the content of a web page, such as text, links, headings, paragraphs, etc.
HTML files have the .html or .htm extension. You can view them in any web browser, which reads the file and renders its contents.
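HTML looks similar to XML because both use nested tags, but HTML is not an XML document: it has its own predefined set of tags and more lenient syntax rules (for example, some tags need no closing tag).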
HTML components
Let’s look at the three components (elements, tags, and attributes) using the HTML code example below.
<html>
  <head>
    <meta charset="UTF-8">
    <title>educative-xpath-demo</title>
  </head>
  <body>
    <div id="xpath-content">
      <h1> HTML Example </h1>
      <p>This HTML file is created to show the document tree example</p>
    </div>
    <div id="items">
      <ul style="list-style-type:circle">
        <li>Car</li>
        <li>Bus</li>
        <li>Truck</li>
      </ul>
    </div>
  </body>
</html>
Elements
Elements are the building blocks of an HTML page. In the document tree, each element is represented as a node, so the two terms are often used interchangeably. A few of the elements/nodes shown in the above code are html, head, body, title, div, p, etc.
We can have nested elements, like:
<div id="items">
  <ul style="list-style-type:circle">
    <li>Car</li>
    <li>Bus</li>
    <li>Truck</li>
  </ul>
</div>
Here, each li is inside ul, which is nested inside the div element.
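To make the nesting concrete, here is a minimal Python sketch that walks the snippet with the standard library's html.parser module and records how deep each element sits. The names SNIPPET, NestingTracker, and nesting_levels are made up for this illustration, and the depth counter assumes every opening tag has a matching closing tag, which holds for this snippet.

```python
from html.parser import HTMLParser

# The nested snippet from the lesson's example.
SNIPPET = """
<div id="items">
  <ul style="list-style-type:circle">
    <li>Car</li>
    <li>Bus</li>
    <li>Truck</li>
  </ul>
</div>
"""


class NestingTracker(HTMLParser):
    """Records each element's tag name together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.rows = []  # list of (depth, tag_name) in document order

    def handle_starttag(self, tag, attrs):
        self.rows.append((self.depth, tag))
        self.depth += 1  # children of this element sit one level deeper

    def handle_endtag(self, tag):
        # Assumes every opening tag is closed (true for SNIPPET).
        self.depth -= 1


def nesting_levels(html):
    """Return (depth, tag_name) pairs for every element in the markup."""
    tracker = NestingTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.rows


# Print the elements indented by their depth.
for depth, tag in nesting_levels(SNIPPET):
    print("  " * depth + tag)
```

The indented output places div at the outermost level, ul one level in, and the three li elements one level further in.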
Tags
A tag marks where an HTML element starts or ends, and it is written within angle brackets < >. As you can see in the example code above, there are three top-level structural tags, <html>, <head>, and <body>, and then there are other element tags, like <title>, <div>, <p>, <ul>, etc.
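In general, an opening tag is paired with a closing tag, like <head> ..... </head>. A few elements, such as <meta> in the example above, hold no content and therefore have no closing tag.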
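The pairing of opening and closing tags can be observed with a short Python sketch, again using the standard library's html.parser. It records every opening and closing tag found in the head section of the example and then reports which tags were opened but never closed. HEAD_SNIPPET, TagRecorder, tag_events, and unclosed_tags are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser

# The <head> section of the lesson's example, copied as-is.
HEAD_SNIPPET = (
    '<head><meta charset="UTF-8">'
    "<title>educative-xpath-demo</title></head>"
)


class TagRecorder(HTMLParser):
    """Records each opening and closing tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.events = []  # list of ("open" | "close", tag_name)

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))


def tag_events(html):
    """Return the sequence of tag events found in the markup."""
    recorder = TagRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.events


def unclosed_tags(events):
    """Return tag names that were opened but have no closing tag."""
    closed = {tag for kind, tag in events if kind == "close"}
    return [tag for kind, tag in events if kind == "open" and tag not in closed]


events = tag_events(HEAD_SNIPPET)
for kind, tag in events:
    print(kind, tag)
print("no closing tag:", unclosed_tags(events))
```

In this snippet, head and title each appear as an open/close pair, while meta is reported as having no closing tag.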
Attributes
An attribute defines a property of an HTML element and is written as a name="value" pair inside the opening tag. Here are some attribute examples from the above code:
- <div id="xpath-content">: In this case, id is an attribute of the div.
- <ul style="list-style-type:circle">: In this case, ul has an attribute style, which displays the list items with ‘circle’ bullet points.
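As a rough illustration, the following Python sketch collects every attribute in the example document using the standard library's html.parser, which passes the attributes of each opening tag as (name, value) pairs. HTML_DOC, AttributeCollector, and collect_attributes are names invented for this sketch.

```python
from html.parser import HTMLParser

# The lesson's example document.
HTML_DOC = """
<html><head><meta charset="UTF-8"><title>educative-xpath-demo</title></head>
<body>
<div id="xpath-content"><h1> HTML Example </h1>
<p>This HTML file is created to show the document tree example</p></div>
<div id="items"><ul style="list-style-type:circle">
<li>Car</li><li>Bus</li><li>Truck</li></ul></div>
</body></html>
"""


class AttributeCollector(HTMLParser):
    """Collects the attributes of every element that has at least one."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag_name, {attribute_name: value})

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for this opening tag.
        if attrs:
            self.found.append((tag, dict(attrs)))


def collect_attributes(html):
    """Return (tag_name, attributes) for each element with attributes."""
    collector = AttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.found


for tag, attributes in collect_attributes(HTML_DOC):
    for name, value in attributes.items():
        print(f"<{tag}> has attribute {name} = {value!r}")
```

Elements without attributes, such as h1, p, and li, are skipped, so only meta, the two div elements, and ul are listed.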
About HTML Document Tree
Each HTML document can be represented as a tree, where the elements are described in a family-like hierarchy of ancestors, descendants, parents, children, and siblings.
As you can see, we have marked the elements/nodes in the above tree based on their positioning within the hierarchy:
- Ancestors: An ancestor is a node connected at any higher level of the document tree with respect to the context node. Example: div is an ancestor of ul and li.
- Descendants: A descendant is a node connected at any lower level of the document tree with respect to the context node. Example: div is a descendant of the body node, and li is a descendant of the div node.
- Siblings: Nodes at the same level that share the same parent node. Example: the li nodes are siblings, and both div nodes are siblings.
- Parent and child: A parent is the node directly above the context node, and a child is a node directly below it. Example: ul is the parent of each li, and each li is a child of ul.
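The relationships above can be checked with a small Python sketch built on the standard library's xml.etree.ElementTree. Two assumptions to note: the example document is rewritten with a self-closing meta tag so that an XML parser accepts it, and the helper names (XML_DOC, ancestors, descendants, siblings) are invented for this illustration. ElementTree keeps no parent pointers, so the sketch first builds a child-to-parent map.

```python
import xml.etree.ElementTree as ET

# The lesson's example, with <meta/> self-closed so it parses as XML
# (an assumption made only for this sketch).
XML_DOC = (
    '<html><head><meta charset="UTF-8"/>'
    "<title>educative-xpath-demo</title></head>"
    '<body><div id="xpath-content"><h1> HTML Example </h1>'
    "<p>This HTML file is created to show the document tree example</p></div>"
    '<div id="items"><ul style="list-style-type:circle">'
    "<li>Car</li><li>Bus</li><li>Truck</li></ul></div></body></html>"
)

root = ET.fromstring(XML_DOC)

# Map every child element to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Tag names of all ancestors, from the parent up to the root."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node.tag)
    return chain


def descendants(node):
    """Tag names of all descendants, in document order."""
    return [el.tag for el in node.iter() if el is not node]


def siblings(node):
    """Elements that share the same parent as the given node."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    return [el for el in parent if el is not node]


ul = root.find(".//ul")
items_div = root.find(".//div[@id='items']")
first_li = root.find(".//li")

print("ancestors of ul:", ancestors(ul))
print("descendants of div#items:", descendants(items_div))
print("siblings of first li:", [el.text for el in siblings(first_li)])
print("siblings of div#items:", [el.get("id") for el in siblings(items_div)])
```

Here the context node changes from call to call: ul for the ancestor lookup, the second div for the descendant lookup, and the first li for the sibling lookup.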
The XPath Axes lesson covers more about the HTML document tree and its use in writing XPath expressions.