XML (Extensible Markup Language) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
XML was developed by a working group of the World Wide Web Consortium (W3C) to provide a common, platform-independent format for documents and data, so that applications written in different programming languages can exchange information. It is an open standard maintained by the W3C. XML was designed as a simplified subset of the Standard Generalized Markup Language (SGML), so that generic SGML documents could be served, received, and processed on the Web. XML 1.0 became a W3C Recommendation in February 1998.
XML is a text-based format with strong Unicode support, so a document can hold text in virtually any human language. Because it is plain text with a well-defined syntax, applications written in different programming languages can read and write the same XML data.
XML files are typically used to store and transport structured data: they capture both a document's content and the structure that organizes it. Today, XML parsers and tools are available for nearly every mainstream programming language, framework, and platform.
Below is a simple XML document.
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <!-- this is a comment -->
  <element>this is xml data</element>
</root>
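One way to confirm that a document like this is well-formed and to read its data back is Python's standard-library xml.etree.ElementTree module. The sketch below assumes Python 3 and writes the comment as <!-- this is a comment -->, the form XML requires; DOC and element are illustrative names only.

```python
import xml.etree.ElementTree as ET

# The sample document, passed as bytes so the parser honors the
# encoding named in the XML declaration.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
  <!-- this is a comment -->
  <element>this is xml data</element>
</root>"""

# fromstring() raises ET.ParseError if the document is not well-formed.
root = ET.fromstring(DOC)

print(root.tag)           # root
element = root.find("element")
print(element.text)       # this is xml data

# The default parser drops comments, so the only child is <element>.
print([child.tag for child in root])   # ['element']
```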
An XML file has two main parts: the prolog and the document elements.
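The prolog comes before the root element. It can contain an XML declaration, which states the XML version in use and, optionally, the character encoding and whether the document is standalone; a document type declaration (DOCTYPE), which can define an internal DTD subset or reference an external one; and comments or processing instructions.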
Below is a code snippet of an XML prolog:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (name, phone*, address*)>
]>
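In the prolog above, version="1.0" specifies that the document uses XML version 1.0, and encoding="UTF-8" specifies that the file is encoded in UTF-8. The DOCTYPE line declares contacts as the root element and defines an internal DTD subset: a contacts element must contain one name element, followed by zero or more phone elements and zero or more address elements.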
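To see what a parser actually extracts from a prolog, Python's low-level xml.parsers.expat module reports the XML declaration and the start of the DOCTYPE through callbacks. This is a sketch assuming Python 3; the function read_prolog and the minimal <contacts> body are hypothetical additions, and the DOCTYPE is written with a space between contacts and its content model, which the ELEMENT declaration syntax requires. Expat does not validate, so the DTD rules are recorded but not enforced.

```python
import xml.parsers.expat

# A prolog like the one above, followed by a minimal root element
# so that the input is a complete document.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (name, phone*, address*)>
]>
<contacts><name>Ada</name></contacts>"""


def read_prolog(data: bytes) -> dict:
    """Collect what the XML declaration and DOCTYPE say about a document."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # Expat passes standalone as 1, 0, or -1 (-1 means it was omitted).
        info["version"] = version
        info["encoding"] = encoding
        info["standalone"] = standalone

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # name is the declared root element; the flag is true when the
        # DOCTYPE contains an internal subset in [ ... ].
        info["doctype"] = name
        info["internal_subset"] = bool(has_internal_subset)

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)   # True marks this as the final chunk
    return info


print(read_prolog(DOC))
```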
The prolog, like the rest of the document, can also contain comments. Below is an example of the comment syntax:
<!-- this is a comment -->
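Comments may appear anywhere in a document outside other markup, for example in the prolog or between elements, but not inside a tag and not before the XML declaration. A comment must not contain the string --. Comments are not part of the document's character data, and a parser is not required to pass them on to the application.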
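The sketch below illustrates these rules with Python's xml.etree.ElementTree (assuming Python 3.8 or later for the insert_comments option). By default the parser discards comments; a TreeBuilder can be asked to keep them as nodes. The helper comment_is_well_formed is a hypothetical name used only for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = "<root><!-- this is a comment --><element>this is xml data</element></root>"

# Default behavior: comments are dropped, so only <element> remains.
default_children = [child.tag for child in ET.fromstring(DOC)]

# Keeping comments: each one becomes a node whose tag is ET.Comment.
keep = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_comments = [
    node.text for node in ET.fromstring(DOC, parser=keep) if node.tag is ET.Comment
]


def comment_is_well_formed(comment: str) -> bool:
    """Return True if a document containing this comment parses."""
    try:
        ET.fromstring(f"<root>{comment}</root>")
        return True
    except ET.ParseError:
        return False


print(default_children)                              # ['element']
print(kept_comments)                                 # [' this is a comment ']
print(comment_is_well_formed("<!-- ok -->"))         # True
print(comment_is_well_formed("<!-- not -- ok -->"))  # False
```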
The body of an XML document is built from elements, attributes, and character data.
Elements give an XML document its structure. A non-empty element has an opening tag (<tagname>) and a matching closing tag (</tagname>), both delimited by angle brackets; an element with no content may instead use a single empty-element tag (<tagname/>). Attributes, if any, go inside the opening tag, and the content between the tags can be character data, child elements, or a mix of both. Element names are case-sensitive, so <Name> and <name> are different elements. A name must begin with a letter (including non-ASCII letters), an underscore (_), or a colon (:), and may continue with letters, digits ([0-9]), hyphens (-), periods (.), underscores, or colons, although colons are in practice reserved for namespace prefixes. Names that begin with the letters xml, in any combination of upper and lower case, are reserved for current and future versions of the XML specification. A document type declaration may appear only in the prolog, never inside an element.
Below is the code snippet of an XML element:
<element>this is xml data</element>
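A quick way to see the element rules in action is to feed small snippets to a parser and check which ones are accepted. This sketch assumes Python 3 and xml.etree.ElementTree; is_well_formed and examples are hypothetical names. Note that ElementTree processes namespaces, so a prefixed name such as a:b is rejected unless its prefix is declared, which is why colon names are left out here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(snippet: str) -> bool:
    """Return True if the snippet parses as a complete XML document."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# Each snippet is paired with the result the element rules predict.
examples = {
    "<element>this is xml data</element>": True,   # matching tags
    "<element/>": True,                            # empty-element tag
    "<_item>value</_item>": True,                  # names may start with _
    "<Element>data</element>": False,              # names are case-sensitive
    "<1element>data</1element>": False,            # cannot start with a digit
    "<element>data": False,                        # missing closing tag
}

for snippet, expected in examples.items():
    print(f"{snippet!r}: parsed={is_well_formed(snippet)}, expected={expected}")
```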
Attributes provide metadata about an element; attributes on the root element can describe the document as a whole. Character data is the text inside an element's content, such as "this is xml data" in the example above.
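Character data cannot contain a raw < or &, so those characters are written as entity references such as &lt; and &amp;. The sketch below, assuming Python 3 and xml.etree.ElementTree, shows a parser turning entity references back into plain text and the serializer escaping them again; the note element is a made-up example.

```python
import xml.etree.ElementTree as ET

# Reading: entity references in character data become ordinary characters.
note = ET.fromstring("<note>5 &lt; 7 &amp;&amp; 7 &gt; 5</note>")
print(note.text)   # 5 < 7 && 7 > 5

# Writing: ElementTree escapes &, <, and > in text when serializing.
built = ET.Element("note")
built.text = "5 < 7 && 7 > 5"
serialized = ET.tostring(built, encoding="unicode")
print(serialized)  # <note>5 &lt; 7 &amp;&amp; 7 &gt; 5</note>
```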
Attributes are name-value pairs placed inside an element's opening tag. They describe the element rather than forming part of its content. Attribute names follow the same naming rules as element names, a given name may appear only once in a tag, and every value must be enclosed in single or double quotes. Values hold character data but cannot contain a literal < character.
Below is the code snippet for the XML attribute:
<element attribute="value"></element>
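Attributes can be read from a parsed element with get() or through the attrib dictionary. This sketch assumes Python 3 and xml.etree.ElementTree; the extra id attribute and the helper attributes_ok are hypothetical additions that show the quoting and uniqueness rules being enforced by the parser.

```python
import xml.etree.ElementTree as ET

# Values may use double or single quotes.
elem = ET.fromstring("""<element attribute="value" id='e1'></element>""")

print(elem.get("attribute"))         # value
print(elem.get("missing", "n/a"))    # n/a (default when absent)
print(sorted(elem.attrib.items()))   # [('attribute', 'value'), ('id', 'e1')]


def attributes_ok(start_tag: str) -> bool:
    """Return True if an element with this opening tag is well-formed."""
    try:
        ET.fromstring(start_tag + "</element>")
        return True
    except ET.ParseError:
        return False


print(attributes_ok('<element attribute="value">'))  # True
print(attributes_ok("<element attribute=value>"))    # False: unquoted value
print(attributes_ok('<element a="1" a="2">'))        # False: duplicate name
```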
XML remains a widely used data serialization standard, so it is worth learning how XML documents are structured, how to work with them, and which tools are available for processing them.