Introduction to XPath

XPath can be thought of as a query language, much like SQL. However, rather than extracting information from a database, it extracts information from an XML document. XPath is used to navigate through the elements and attributes of an XML document, and so it allows parts of the document to be identified.

XPath provides a common syntax that is shared by other XML languages, including:

XSLT: XSLT is a language for transforming XML documents into XML, HTML, or text.
XQuery: XQuery builds on XPath and is a language for extracting information from XML documents.

Benefits of XPath

XPath is designed for XML documents. It provides a single syntax that you can use for queries, addressing, and patterns. XPath is concise, simple, and powerful.

XPath has many benefits:

The syntax is simple for the simple and common cases.
Any path that can occur in an XML document and any set of conditions for the nodes in the path can be specified.
Any node in an XML document can be uniquely identified.

XPath is designed to be used in many contexts. It can be used to provide links to nodes, to search repositories, and in many other applications.
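
As a rough illustration of the benefits listed above, the sketch below runs a few path expressions in Python using the standard xml.etree.ElementTree module. Note that this module implements only a subset of XPath, and that the library document and its element names are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only for illustration.
LIBRARY_XML = """
<library>
  <book lang="en"><title>XML</title><year>2003</year></book>
  <book lang="fr"><title>XSLT</title><year>2005</year></book>
  <magazine lang="en"><title>Markup Monthly</title></magazine>
</library>
"""

library = ET.fromstring(LIBRARY_XML)

# A simple path for the simple case: every <title> directly under a <book>.
book_titles = [t.text for t in library.findall("book/title")]

# A condition on an attribute: books whose lang attribute equals "en".
english_books = [b.findtext("title") for b in library.findall("book[@lang='en']")]

# A condition on a child element's text: books whose <year> is 2005.
from_2005 = [b.findtext("title") for b in library.findall("book[year='2005']")]

# Any depth: every <title> element anywhere below the root.
all_titles = [t.text for t in library.findall(".//title")]

print(book_titles, english_books, from_2005, all_titles)
```

Each condition is written inside square brackets directly after the element name it applies to, so the path and the filter stay in a single expression.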

XML Using XPath Program

XML and XPath Program: XPath is used to find information within an XML document.

What is XPath?

XPath is a syntax for defining parts of an XML document.
XPath uses path expressions to navigate in XML documents.
XPath contains a library of standard functions.
XPath is a major element in XSLT.
XPath is also used in XQuery, XPointer and XLink.
XPath is a W3C Recommendation.

Software

XML Copy Editor, XEditor.

Program

<book>
<title>XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>300</price>
</book>

XPath

/book

Output

XML

Erik T. Ray

2003

300

XPath

/book/author

Output

Erik T. Ray
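
The two queries above can also be tried out from a program. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. This module supports only a subset of XPath and evaluates paths relative to an element rather than from the document root, so /book/author is written here as author relative to the book element.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <title>XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>300</price>
</book>
"""

# fromstring returns the root element, which is <book> here.
book = ET.fromstring(BOOK_XML)

# Counterpart of /book: the text of every child of the root element.
book_values = [child.text for child in book]

# Counterpart of /book/author: "author" relative to the root element.
author = book.findtext("author")

print(book_values)
print(author)
```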

XPath Expressions and Functions

XPath Expressions

XPath expressions are statements that can extract useful information from the XPath tree. Instead of just finding nodes, one can count them, add up numeric values, compare strings, and more. They are much like statements in a functional programming language. Every XPath expression evaluates to a single value.

The value of an XPath expression has one of four types. They are:

Node-set: A node-set is an unordered group of nodes from the input document that match an expression’s criteria.
Boolean: A Boolean has one of two values: true or false. XPath allows any kind of data to be converted into a Boolean. This is often done implicitly when a string, a number, or a node-set is used where a Boolean is expected.
Number: XPath numbers are numeric values useful for counting nodes and performing simple arithmetic. Numbers such as 43 or -7000 that look like integers are stored as doubles. Non-number values, such as strings and Booleans, are converted to numbers automatically as necessary.
String: A string is a sequence of zero or more Unicode characters. Other data types can be converted to strings using the string() function.
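
The four types can be mimicked in Python, although only loosely. The sketch below uses the standard xml.etree.ElementTree module, which does not implement XPath functions such as count(), sum(), boolean(), or string(), so plain Python stands in for them. The catalog document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
CATALOG_XML = """
<catalog>
  <book><title>XML</title><price>300</price></book>
  <book><title>XSLT</title><price>450</price></book>
</catalog>
"""

catalog = ET.fromstring(CATALOG_XML)

# Node-set: every <book> element that matches the path.
books = catalog.findall("book")

# Number: analogues of count(book) and sum(book/price); XPath numbers are doubles.
book_count = float(len(books))
total_price = sum(float(p.text) for p in catalog.findall("book/price"))

# Boolean: a non-empty node-set converts to true, an empty one to false.
has_books = bool(books)
has_magazines = bool(catalog.findall("magazine"))

# String: analogue of string(book[1]/title), the text of the first match.
first_title = catalog.findtext("book[1]/title")

print(book_count, total_price, has_books, has_magazines, first_title)
```

A full XPath processor would perform these conversions itself. The Python calls here only imitate the result of each conversion.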

XPath Functions

XPath defines various functions required for XPath 2.0, XQuery 1.0 and XSLT 2.0. The function categories are Accessor, AnyURI, Node, Error and Trace, Sequence, Context, Boolean, Duration/Date/Time, String, QName and Numeric.

XML Path Language (XPath) functions can be used to refine XPath queries and enhance the programming power and flexibility of XPath. Each function in the function library is specified using a function prototype that provides the return type, function name, and argument type. If an argument type is followed by a question mark, the argument is optional; otherwise, the argument is required. Function names are case-sensitive.

The default prefix for the function namespace is fn.

XML Document in XPath

In XPath, an XML document is viewed conceptually as a tree in which each part of the document is represented as a node.

XPath has seven types of nodes. They are:

Root: The XPath tree has a single root node, which contains all other nodes in the tree.
Element: Every element in a document has a corresponding element node that appears in the tree under the root node. Within an element node appear all of the other types of nodes that correspond to the element’s content. Element nodes may have a unique identifier associated with them that is used to reference the node with XPath.
Attribute: Each element node has an associated set of attribute nodes. The element is the parent of each of these attribute nodes; however, an attribute node is not a child of its parent element.
Text: Character data is grouped into text nodes. Characters inside comments, processing instructions, and attribute values do not produce text nodes. A text node has a parent node and it may be a child node too.
Comment: There is a comment node for every comment, except for any comment that occurs within the document type declaration. A comment node has a parent node and it may be a child node too.
Processing instruction: There is a processing instruction node for every processing instruction, except for any processing instruction that occurs within the document type declaration. A processing instruction node has a parent node and it may be a child node too.
Namespace: Each element has an associated set of namespace nodes. Although a namespace node has a parent node, it is not considered a child of that node, because namespace nodes are not contained in the parent node but provide descriptive information about it.
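
The difference between an element's children and its attributes can be seen in a small Python sketch. It uses the standard xml.etree.ElementTree module, which does not model attributes or character data as separate node objects the way the XPath data model does, so this is only an approximation. The note element is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with one attribute, some text, and one child element.
NOTE_XML = '<note id="n1">Call <b>Alice</b> today</note>'

note = ET.fromstring(NOTE_XML)

# Element children: only <b> is a child; the id attribute is not.
child_tags = [child.tag for child in note]

# Attributes belong to the element but are kept apart from its children.
attributes = dict(note.attrib)

# Character data: ElementTree stores it in .text and .tail, not as nodes.
leading_text = note.text        # text before <b>
trailing_text = note[0].tail    # text after </b>

print(child_tags, attributes, repr(leading_text), repr(trailing_text))
```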

Introduction To XML Trees

An XML document has a single root node.
The tree is a general ordered tree.
A parent node may have any number of children.
Child nodes are ordered and may have siblings.
Preorder traversals are usually used to get the information out of the tree.

Simple XML Document

<?xml version="1.0"?>
<address>
<name>
<first>Alice</first>
<last>Lee</last>
</name>
<email>alee@aol.com</email>
<phone>123-45-6789</phone>
<birthday>
<year>1983</year>
<month>07</month>
<day>15</day>
</birthday>
</address>
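
A preorder traversal of the address document above can be sketched in Python as follows. The preorder function is a hypothetical helper written for this example, and the document is parsed with the standard xml.etree.ElementTree module. A preorder traversal visits a node first and then each of its children from left to right.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """<?xml version="1.0"?>
<address>
  <name>
    <first>Alice</first>
    <last>Lee</last>
  </name>
  <email>alee@aol.com</email>
  <phone>123-45-6789</phone>
  <birthday>
    <year>1983</year>
    <month>07</month>
    <day>15</day>
  </birthday>
</address>
"""

address = ET.fromstring(ADDRESS_XML)


def preorder(element):
    """Yield the tag of element, then the tags of its subtrees, left to right."""
    yield element.tag
    for child in element:
        yield from preorder(child)


tags = list(preorder(address))
print(tags)
```

The built-in Element.iter() method walks the tree in the same document order, so in practice the helper is rarely needed.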

Program Demo

Write the code in XML Copy Editor.

Save it at any location (for example, Sample.xml).

Introduction To XML

What is XML?

XML stands for Extensible Markup Language.
XML is a markup language much like HTML.
XML was designed to carry data, not to display it.
XML tags are not predefined. You must define your own tags.
XML is designed to be self-descriptive.

Why is XML popular?

Machines have only recently become capable of meeting the processing requirements of this data format.
Data processing, data storage, and bandwidth can now support the exchange of XML documents.
A driving force for the use of a technology like XML is the desire to exchange information between open systems or open software.
Development of the internet.

XML

XML is text (Unicode) based, takes up less space, and can be transmitted efficiently.
One XML document can be displayed differently in different media, like HTML, video, CD, DVD. You only have to change the XML document in order to change all the rest.
XML documents can be modularized and their parts can be reused.

SGML (Standard Generalized Markup Language)

Forefather of all markup languages.
In 1969, it introduced the notion that data processing and document processing could be one and the same thing.
Introduced the notion of a generalized document format.
SGML specification can communicate between systems.
Provides DTD specification to improve the standard of the document.

Example of an HTML Document

<html>
<head><title>Example</title></head>
<body>
<h1>This is an example of a page.</h1>
<h2>Some information goes here.</h2>
</body>
</html>

Write the HTML code in Notepad and save it with the .html extension (for example, sample.html). Click the file to open it in the browser.

Example

<?xml version="1.0"?>
<mymessage>
<message> Welcome to XML </message>
</mymessage>

An XML document contains one root element and its child elements.

Example of an XML Document

<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>

Demo

Write the XML code in XML Copy Editor. Save it at any location (for example, text.xml). Click the file to open it in the browser.

Difference Between HTML and XML

HTML tags have a fixed meaning that browsers know, while XML tags differ from one application to another, and it is the users who know what they mean.
HTML tags are used for display while the XML tags are used to describe the documents and the data.

Benefits of XML

Simplifies Data Sharing.
Simplifies Data Transport.
Simplifies Platform Changes.
Separates Data from HTML.
Makes Your Data More Available.
Represents the information and the metadata about the information.
XML is referred to as future-proof or loosely coupled, since it has the capability of separating process and data content.
XML is used to create new internet languages.

Well-Formed Documents

An XML document is said to be well-formed if it follows all the XML syntax rules.
An XML parser is used to check that all the rules have been obeyed.
A parser is software that processes an XML document.
It reads the XML document, checks its syntax, reports errors, and allows programmatic access to the document’s contents.
An XML document is considered well-formed if its syntax is correct: a single root element, matching start and end tags, and attribute values in quotes.
Recent browsers such as Internet Explorer 5 and Netscape 7 come with XML parsers.
Parsers are also available for free download over the Internet. One is Xerces, from the Apache open-source project.
Java 1.4 also supports an open-source parser.
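
A well-formedness check like the one described above can be sketched with the parser in Python's standard xml.etree.ElementTree module. The is_well_formed function is a hypothetical helper written for this example, and the sample strings are made up to break one rule each.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<mymessage><message>Welcome to XML</message></mymessage>"
unclosed = "<mymessage><message>Welcome to XML</mymessage>"  # <message> is never closed
two_roots = "<a/><b/>"                                        # more than one root element
unquoted = "<message lang=en>Welcome to XML</message>"        # attribute value not in quotes

for sample in (good, unclosed, two_roots, unquoted):
    print(is_well_formed(sample), sample)
```

This only checks well-formedness. It does not check the document against a DTD or a schema.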

Advantages of XML over SGML

Though XML uses most of the functionality in SGML, it provides a number of distinct advantages.
XML permits well-formed documents to be parsed without the need for a DTD, whereas SGML implementations require some DTD for processing.
XML is much simpler than SGML, with a stricter and more regular syntax.
Implementing SGML over the Internet is more difficult than implementing XML.

Advantages of XML over HTML (and differences)

XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to transport and store data, with focus on what data is.
HTML was designed to display data, with focus on how data looks.
HTML is about displaying information, while XML is about carrying information.

Advantages of XML over EDI (Electronic Data Interchange)

XML requires less cost for data transaction and maintenance than EDI (which can cost millions of dollars for transactions).
XML uses the Internet for data exchange, whereas EDI over the Internet has not met with much success.
XML has many built-in components, such as validity checking, data mapping, and extensible style sheets.
XML supports internationalization and localization, whereas EDI does not.

Drawbacks of XML

XML is verbose: it takes a lot of space to represent data (3 to 20 times more than other file formats).
XML editors often lack the detail and helpfulness found in common EDI editors.

Validity

A well-formed document has a tree structure and obeys all the XML rules.
A particular application may add more rules in either a DTD (document type definition) or in a schema.
Many specialized DTDs and schemas have been created to describe particular areas.
These range from disseminating news bulletins (RSS) to chemical formulas.
DTDs were developed first, so they are not as comprehensive as schemas.

Thanks for reading.

XML Creation using LINQ to XML

There are many different techniques for creating an XML document in C#. One of them is LINQ to XML, which we are going to discuss in this article.

Let’s say we need to create an XML document as below:

<?xml version="1.0" encoding="UTF-8"?>
<Parent>
<Header>
<FileDetails>
<FileName>RandomFile</FileName>
<FileVersion>1.0</FileVersion>
</FileDetails>
</Header>
<Body>
<Infos>
<Info Type="Information1">This is Information1</Info>
<Info Type="Information2">This is Information2</Info>
</Infos>
<Users>
<UserDetails>
<Name>
<FirstName>Vipul</FirstName>
<MiddleName/>
<LastName>Malhotra</LastName>
</Name>
<DateOfBirth>12-Apr-1990</DateOfBirth>
</UserDetails>
</Users>
</Body>
</Parent>

Let’s break the creation of the file into two parts so as to be able to see more features.

We will first create the XML below:

<?xml version="1.0" encoding="UTF-8"?>
<Parent>
<Header>
<FileDetails>
<FileName>RandomFile</FileName>
<FileVersion>1.0</FileVersion>
</FileDetails>
</Header>
<Body>
<Infos>
<Info Type="Information1">This is Information1</Info>
<Info Type="Information2">This is Information2</Info>
</Infos>
</Body>
</Parent>

In order to create this, we will first define an XDocument with the parent root as below:

XDocument doc = new XDocument(new XElement("Parent"));

After this, we will use this “doc” as the root of the file and will add nested XElement objects to it.

Let’s first create the Header portion of the XML.
Header
Please notice that the XElement “Header” is added as a new element and the further elements are nested inside this “Header” element. This is because those elements are sub-elements of “Header”. Further, the “FileName” and “FileVersion” elements are sub-elements of “FileDetails”.

In the same way, we add another section to the root of the doc. This section is “Body”.

The code for this is:
code
This follows the same logic: “Body” is also a sub-node of the root “Parent”, so it is added directly to the root, whereas the element “Infos” is a sub-element of “Body” and so is added in the way shown above. The same goes for “Info”, which is a further sub-element of “Infos”.

Also notice how an attribute is added to each of the “Info” elements using XAttribute.

After this, we need to add the section below as a sub-node of “Body”, not of the root of the document:

<Users>
<UserDetails>
<Name>
<FirstName>Vipul</FirstName>
<MiddleName/>
<LastName>Malhotra</LastName>
</Name>
<DateOfBirth>12-Apr-1990</DateOfBirth>
</UserDetails>
</Users>

In order to do that, we need to make sure that the new elements are appended inside the “Body” element of the XML already created.

Using XDocument, we can search for the node “Body” and then start adding XElement nodes to it.
node
Searching for a node is done using:

Adding further elements to it is done using the code below:
code
The logic behind the hierarchy is the same as that discussed above.

The code can also be used inside a loop in case we need to add many similar sections to a particular node. In this case, for example, there can be many users, and the details of each would have to be added in a separate UserDetails section inside the “Body” node.
