
What is XML?
XML does not DO anything

Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store, and send information. The following example is an address book entry for John Doe, stored as XML:

    <person>
      <firstname>John</firstname>
      <surname>Doe</surname>
      <age>24</age>
      <address type="work">
        <streetnumber>10</streetnumber>
        <street>The Street</street>
        <town>London</town>
      </address>
    </person>

The address book entry holds all the information relevant to John Doe in a defined structure. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive, understand, or display it.

XML is free and extensible

Opting for XML is a bit like choosing SQL for databases: you still have to build your own database and your own programs and procedures that manipulate it, but there are many tools available and many people who can help. And since XML is license-free, you can build your own software around it without paying anybody anything. The large and growing support means that you are also not tied to a single vendor.

XML tags are not predefined. You must "invent" your own tags.

The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of an HTML document can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.). XML allows authors to define their own tags and their own document structure. The tags in the example above (like <firstname> and <street>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.

XML is not a replacement for HTML

It is important to understand that XML is not a replacement for HTML. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data.
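As noted earlier, an XML document does nothing until someone writes software to read it. The following is a minimal sketch of such software, assuming Python's standard xml.etree.ElementTree module; the helper name describe is hypothetical and not part of the tutorial. The type attribute is quoted here so that the parser accepts the address book entry.

```python
import xml.etree.ElementTree as ET

# The address book entry from the text, with the attribute value quoted
# so that the document is well-formed.
ENTRY = """<person>
  <firstname>John</firstname>
  <surname>Doe</surname>
  <age>24</age>
  <address type="work">
    <streetnumber>10</streetnumber>
    <street>The Street</street>
    <town>London</town>
  </address>
</person>"""


def describe(xml_text):
    """Return a one-line, human-readable summary of a <person> entry."""
    person = ET.fromstring(xml_text)   # parse the text into an element tree
    address = person.find("address")   # first <address> child of <person>
    return "{} {} ({}), {} address: {} {}, {}".format(
        person.findtext("firstname"),
        person.findtext("surname"),
        person.findtext("age"),
        address.get("type"),           # attribute value, here "work"
        address.findtext("streetnumber"),
        address.findtext("street"),
        address.findtext("town"),
    )


print(describe(ENTRY))
```

The tags mean nothing to the parser itself; only the describe function, written by someone who knows this particular vocabulary, gives them a purpose.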
XML can be thought of as a cross-platform, software- and hardware-independent tool for transmitting information.

How can XML be used?

It is important to understand that XML was designed to store, carry, and exchange data. XML was not designed to display data.

XML can Separate Data from display

When HTML is used to display data, the data is stored inside your HTML. With XML, data can be stored in separate XML files. This way you can concentrate on using a dedicated display language for display, and be sure that changes in the underlying data will not require any changes to your display code. For example, you could use XML data inside HTML, using HTML only for formatting and displaying the data and XML to carry the data.

XML is used to Exchange Data

With XML, data can be exchanged between incompatible systems. In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet. Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications and computer systems.

XML can be used to Share Data

With XML, plain text files can be used to share data. Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data. This makes it much easier to create data that different applications can work with. It also makes it easier to expand or upgrade a system to new operating systems, servers, and applications.

XML can be used to Store Data

With XML, plain text files can be used to store data. XML data can also be stored in databases. Applications can be written to store and retrieve information from the store, and generic applications can be used to display the data.

XML can make your Data more Useful

With XML, your data is available to more users.
Since XML is independent of hardware, software, and application, you can make your data available to third parties with little intervention from the author of the XML data (if a standard XML DTD or Schema is used, no intervention from whoever generated the data is needed). Third-party applications and users can access your XML files as data sources, just as they would access databases. Your data can be made available to all kinds of "reading machines" (agents), and it is easier to make your data available to the blind or to people with other disabilities.

If Developers have Sense

The future might give us word processors, spreadsheet applications, and databases that can read each other's data in a pure text format, without any conversion utilities in between. Microsoft has started down this road with the release of its office suite XML format. This is not to say it is leading the field, though, as both StarOffice (a cross-platform, free, Microsoft-compatible office suite) and Apple's OS X depend heavily on XML as their preferred data format.

There is nothing special about XML

There is nothing special about XML. It is just plain text with the addition of some XML tags enclosed in angle brackets. Software that can handle plain text can also handle XML. In a simple text editor, the XML tags will be visible and will not be handled specially. In an XML-aware application, however, the XML tags can be handled specially. The tags may or may not be visible, or have a functional meaning, depending on the nature of the application.

XML Syntax

The syntax rules of XML are very simple and very strict. The rules are easy to learn and easy to use. Because of this, creating software that can read and manipulate XML is comparatively easy.

An example XML document

XML documents use a self-describing and simple syntax.
    <?xml version="1.0" encoding="utf-8"?>
    <person>
      <firstname>John</firstname>
      <surname>Doe</surname>
      <age>24</age>
      <address type="work">
        <streetnumber>10</streetnumber>
        <street>The Street</street>
        <town>London</town>
      </address>
    </person>

The first line in the document, the XML declaration, defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the UTF-8 character encoding (one of the encodings defined by the Unicode standard).

The next line describes the root element of the document:

    <person>

The next 8 lines describe 4 child elements of the root (firstname, surname, age, and address):

    <firstname>John</firstname>
    <surname>Doe</surname>
    <age>24</age>
    <address type="work">
      <streetnumber>10</streetnumber>
      <street>The Street</street>
      <town>London</town>
    </address>

In the example the address element also has three child elements of its own (streetnumber, street, town). This type of data structure is called a hierarchical or tree structure.

And finally the XML file ends with the closing of the root element:

    </person>

All XML elements must have a closing tag

With XML, it is illegal to omit the closing tag. In HTML some elements do not have to have a closing tag. The following code is legal in HTML:

    <p>This is a paragraph
    <p>This is another paragraph

In XML all elements must have a closing tag in order to be legal XML:

    <p>This is a paragraph</p>
    <p>This is another paragraph</p>

NOTE
You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not an element of the XML document; it is there as a hint to the parser about what is to follow.

XML tags are case sensitive

Unlike HTML tags, XML tags are case sensitive. With XML, the tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same case:

    <Message>This is incorrect</message>
    <message>This is correct</message>

All XML elements must be properly nested

Improper nesting of tags makes no sense in XML and will make the document illegal. In HTML some elements can be improperly nested within each other:

    <b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other:

    <b><i>This text is bold and italic</i></b>

All XML documents must have a root element

All XML documents must contain a single tag pair that defines a root element. All other elements must be within this root element. All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>

Attribute values must always be quoted

With XML, it is illegal to omit quotation marks around attribute values. XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:

    <?xml version="1.0" encoding="utf-8"?>
    <person>
      <address type=work>
      </address>
    </person>

    <?xml version="1.0" encoding="utf-8"?>
    <person>
      <address type="work">
      </address>
    </person>

The error in the first document is that the value of the type attribute in the address element is not quoted. This is correct: type="work". This is incorrect: type=work.

Comments in XML

The syntax for writing comments in XML is similar to that of HTML.

    <!-- This is a comment -->

XML Elements

The main carrier of information in an XML document is an element. An element is a single unit of storage that has a role to play in the overall document structure. An element can contain data or other elements. To understand XML terminology, you have to know how relationships between XML elements are named, and how element content is described.
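The syntax rules covered so far (closing tags, case sensitivity, proper nesting, a single root element, and quoted attribute values) are exactly what an XML parser enforces. As a rough sketch, assuming Python's standard xml.etree.ElementTree module, the hypothetical helper is_well_formed below reports whether a parser accepts a piece of text. The sample strings are adapted from the examples in this section, except the two-root sample, which is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser rejects any document that breaks a syntax rule.
        return False


# One sample per rule, plus one document that follows all of them.
samples = {
    "missing closing tag": "<p>This is a paragraph",
    "mismatched case": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "unquoted attribute": "<person><address type=work></address></person>",
    "two root elements": "<a></a><b></b>",
    "well-formed": '<person><address type="work"></address></person>',
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the last sample is accepted. Unlike a typical HTML browser, an XML parser does not try to guess what the author meant.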
Imagine that this is a table of contents:

    My First Anatomy Book
      Introduction to the body
        What does my blood do?
        What do my bones do?
      Putting it all together
        How do I breathe?
        How do I walk?

Imagine that this XML document describes the book:

    <toc>
      <title>My First Anatomy Book</title>
      <meta date="2004-10-01"></meta>
      <section>Introduction to the body
        <paragraph>What does my blood do?</paragraph>
        <paragraph>What do my bones do?</paragraph>
      </section>
      <section>Putting it all together
        <paragraph>How do I breathe?</paragraph>
        <paragraph>How do I walk?</paragraph>
      </section>
    </toc>

In the example toc is the root element; title, meta, and section are child elements of toc. toc is the parent element of title, meta, and section. title, meta, and section are siblings (or sister elements) because they have the same parent.

Elements have Content

Elements can have different content types. An XML element is everything from (including) the element's start tag to (including) the element's end tag. An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.

In the example above, toc has element content, because it contains other elements. section has mixed content, because it contains both text and other elements. paragraph has simple content (or text content), because it contains only text. meta has empty content, because there is nothing between its start and end tags; its information is carried in the date attribute.

Element Naming
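Take care when you "invent" element names, and follow these simple rules:

  - Names can contain letters, digits, and other characters such as hyphens, underscores, and periods.
  - Names must not start with a digit or with a punctuation character such as a hyphen or a period.
  - Names must not start with the letters xml (in any combination of upper and lower case).
  - Names cannot contain spaces.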
Non-English letters like éäé are perfectly legal in the XML specification, but not all software vendors support them. The colon (":") should not be used in element names because it is reserved for something called namespaces (more later).

XML Attributes

XML elements can have attributes in the start tag, just like HTML. Attributes are used to provide additional information about elements. In HTML you can create tags like this:

    <IMG SRC="toc.gif">

The SRC attribute provides additional information about the IMG element. Attributes play the same role in XML:

    <img src="computer.gif">
    <a href="demo.asp">

Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:

    <file type="gif">computer.gif</file>

Quote Styles, "female" or 'female'?

Attribute values must always be enclosed in quotes, but either single or double quotes can be used. For a person's sex, the person tag can be written like this:

    <person sex="female">

or like this:

    <person sex='female'>
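NOTE
If the attribute value itself contains double quotes, enclose it in single quotes (or write each double quote as the entity &quot;), like in this illustrative example:

    <person nickname='John "Johnny" Doe'>

NOTE
If the attribute value itself contains single quotes, enclose it in double quotes (or write each single quote as the entity &apos;), like in this illustrative example:

    <person nickname="John 'Johnny' Doe">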
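The quoting rules can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the nickname and name attributes and their values are made up purely for illustration. It shows that both quote styles yield the same attribute value, and that a value containing one kind of quote can be wrapped in the other kind or escaped with an entity.

```python
import xml.etree.ElementTree as ET

# Double and single quotes are interchangeable delimiters.
double_quoted = ET.fromstring('<person sex="female"/>')
single_quoted = ET.fromstring("<person sex='female'/>")
same_value = double_quoted.get("sex") == single_quoted.get("sex")

# A value containing double quotes, delimited by single quotes.
with_double = ET.fromstring("""<person nickname='John "Johnny" Doe'/>""")

# A value containing a single quote, delimited by double quotes.
with_single = ET.fromstring("""<shop name="John's Books"/>""")

# The same nickname, delimited by double quotes and escaped with &quot;.
escaped = ET.fromstring('<person nickname="John &quot;Johnny&quot; Doe"/>')

print(same_value)
print(with_double.get("nickname"))
print(with_single.get("name"))
print(escaped.get("nickname") == with_double.get("nickname"))
```

The quotes that delimit the value are not part of the value itself, so a program reading the document cannot tell which style the author used.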
Use of Elements vs. Attributes

Data can be stored in child elements or in attributes. Take a look at these examples:

    <person sex="female">
      <firstname>Anna</firstname>
      <surname>Smith</surname>
    </person>

    <person>
      <sex>female</sex>
      <firstname>Anna</firstname>
      <surname>Smith</surname>
    </person>

In the first example sex is an attribute. In the second, sex is a child element. Both examples provide the same information. There are no rules about when to use attributes and when to use child elements.

Summary

XML is not so much a language as a standardized set of rules for adding structure to any form of data using a system of markup tags. Anyone can create their own markup vocabulary (described in a document called an XML Schema), and XML ensures that the structure will be intelligible to anyone else who reads the XML Schema document. More importantly, referring to an XML Schema enables XML-aware software to automatically manipulate the data without needing advance knowledge of the structure.
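To make the comparison between elements and attributes concrete, here is a small sketch, again assuming Python's standard xml.etree.ElementTree module. The helper get_sex is hypothetical. It reads the same piece of information from either of the two person documents shown in the elements-versus-attributes discussion.

```python
import xml.etree.ElementTree as ET

# The person entry with sex stored as an attribute.
AS_ATTRIBUTE = """<person sex="female">
  <firstname>Anna</firstname>
  <surname>Smith</surname>
</person>"""

# The same entry with sex stored as a child element.
AS_ELEMENT = """<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <surname>Smith</surname>
</person>"""


def get_sex(person):
    """Return the person's sex from the attribute or, failing that, the child element."""
    value = person.get("sex")           # attribute lookup; None if absent
    if value is None:
        value = person.findtext("sex")  # text of the <sex> child; None if absent
    return value


for text in (AS_ATTRIBUTE, AS_ELEMENT):
    print(get_sex(ET.fromstring(text)))
```

Both documents carry the same information, but the reading code has to know where to look, which is one reason to pick a convention and apply it consistently.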