How do I use the XML DOM API to visit every non-text node? – Education Career Blog

I am new to XML and the DOM. I think I need to use the DOM API to go through every non-text node once and output the node name.

Say I have this example XML from W3C:

<bookstore>

<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
 <page pagenumber="550"/>
</book>

<book category="children">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
 <page pagenumber="500"/>
</book>
</bookstore>

I need to find nodes such as <page pagenumber="500" />, which is a non-text node.

How can I do that? Pseudo-code would be fine too. Thanks.

Can I say:

 while (x.nodeValue == NULL) {
   read the next node ?
}

I guess I should make myself clear: there are no assumptions about any documents. This should work on any XML as long as there is a non-text node. I guess this should be done top-down and left to right, for every node. 🙁

,

XPath = "//*[not(text())]"
This will select all nodes which are non-text nodes.
In the given example, bookstore and book are also non-text nodes, as they do not have any text of their own (only whitespace between tags), though their children do have text.
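
For readers who want to do this with a DOM API rather than XPath, here is a minimal sketch using Python's standard-library xml.dom.minidom. It visits every element in document order (top-down, left to right) and also lists the elements that have no text of their own. The names SAMPLE, walk_elements and has_own_text are made up for this illustration, the XML is a trimmed copy of the example, and treating whitespace-only text as "no text" is an assumption about what the question wants.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Trimmed copy of the bookstore example from the question.
SAMPLE = """<bookstore>
<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <page pagenumber="550"/>
</book>
<book category="children">
 <title lang="en">Harry Potter</title>
 <page pagenumber="500"/>
</book>
</bookstore>"""


def walk_elements(node):
    """Yield every element below `node` in document order."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield child
            yield from walk_elements(child)


def has_own_text(element):
    """True if the element has a text-node child that is not only whitespace."""
    return any(
        c.nodeType == Node.TEXT_NODE and c.data.strip()
        for c in element.childNodes
    )


doc = parseString(SAMPLE)

# Every non-text (element) node, visited exactly once.
all_names = [e.nodeName for e in walk_elements(doc)]

# Only the elements without text of their own (whitespace ignored).
no_text_names = [e.nodeName for e in walk_elements(doc) if not has_own_text(e)]

print(all_names)
print(no_text_names)
```

If whitespace between tags should count as text, drop the strip() check; bookstore and book would then no longer appear in the second list.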

,

Your question basically seems to be: given an XML document, how do I find child nodes that do not have any text content?

A simple XPath expression such as:

/bookstore/book/*[count(child::text()) = 0]

or

/bookstore/book/*[not(text())]

will do it for you. Applying this XPath expression on the sample document will return a node-set containing both the page elements. You do not have to know the name of the page element beforehand, or even the names of all possible child elements of the book element, as you can see.

To explain: you need to query for child elements of the book element that do not contain ANY text-node children. The * step (short for child::*) selects all child elements of the current node, and the text() node test matches text-node children, so the not(text()) condition keeps only the elements that have none.
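
The same condition can be checked by hand with a DOM API. Below is a rough sketch using Python's xml.dom.minidom, which does not evaluate XPath itself, so the "no text-node children" test is written out explicitly. XML and textless_children are illustrative names, and the document is a trimmed copy of the sample.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Trimmed copy of the sample document.
XML = """<bookstore>
<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <page pagenumber="550"/>
</book>
<book category="children">
 <title lang="en">Harry Potter</title>
 <page pagenumber="500"/>
</book>
</bookstore>"""


def textless_children(parent):
    """Child elements of `parent` that have no text-node children at all."""
    result = []
    for child in parent.childNodes:
        if child.nodeType != Node.ELEMENT_NODE:
            continue  # skip the whitespace text nodes between tags
        if not any(c.nodeType == Node.TEXT_NODE for c in child.childNodes):
            result.append(child)
    return result


doc = parseString(XML)
bookstore = doc.documentElement

matches = []
for book in bookstore.childNodes:
    # Equivalent of the /bookstore/book step: direct children named "book".
    if book.nodeType == Node.ELEMENT_NODE and book.nodeName == "book":
        matches.extend(textless_children(book))

found = [(m.nodeName, m.getAttribute("pagenumber")) for m in matches]
print(found)
```

Nothing in the loop mentions page by name, which mirrors the point that the element names do not have to be known beforehand.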

Edit: Note that if you want to query for non-text nodes in any XML document (as per your latest edit to the question), you should choose the answer provided by nils_gate. My answer was given prior to your edit and illustrates the concept, rather than providing a generic solution.

,

What do you know about the node you need to find? If you know exactly that it’s:

  • A page element
  • It has a pagenumber attribute with value 500

then XPath is the way forward (assuming it’s available on your platform – you haven’t specified beyond “DOM”; most DOM implementations include XPath as far as I’ve seen).

In this case you’d use an XPath of:

//page[@pagenumber='500']

If you can’t use XPath, please explain which DOM API you’re using and we can try to come up with the best solution. Basically you’ll probably end up iterating over every element node, checking whether its name is page and then checking whether it has an appropriate pagenumber attribute value.
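
As one possible shape for that fallback, here is a small sketch that assumes Python's xml.dom.minidom as the DOM implementation (the question does not say which one is in use). find_pages and XML are hypothetical names for this example, and the document is a trimmed copy of the sample.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the sample document.
XML = """<bookstore>
<book category="cooking">
 <page pagenumber="550"/>
</book>
<book category="children">
 <page pagenumber="500"/>
</book>
</bookstore>"""


def find_pages(doc, number):
    """Return page elements whose pagenumber attribute equals `number`."""
    hits = []
    # "*" matches every element in the document, in document order.
    for element in doc.getElementsByTagName("*"):
        if element.tagName != "page":
            continue
        # getAttribute returns "" when the attribute is missing.
        if element.getAttribute("pagenumber") == number:
            hits.append(element)
    return hits


doc = parseString(XML)
pages = find_pages(doc, "500")
print(len(pages), pages[0].parentNode.getAttribute("category"))
```

Passing "page" instead of "*" to getElementsByTagName would skip the name check; the longer form is kept here to show the iteration over every element.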

,

Looks like you’ll be needing XPath. The W3Schools site has a good reference, but, assuming the page node always appears under a book node, the XPath /bookstore/book/page will return a node set with each page node in it. /bookstore/book/page[@pagenumber='500'] will get each page node where the pagenumber attribute has a value of 500.

The // syntax will find the node anywhere in the document without worrying about structure – this can be easier but is slower, especially with large documents. If you have a document with a known structure, it’s best to use the explicit XPath.
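
To make the two styles concrete, here is a small sketch using Python's xml.etree.ElementTree, which supports only a subset of XPath and evaluates paths relative to an element. Because root is already the bookstore element, the explicit path starts at ./book instead of /bookstore/book. The variable names are illustrative, and no timing is shown here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document.
XML = """<bookstore>
<book category="cooking">
 <page pagenumber="550"/>
</book>
<book category="children">
 <page pagenumber="500"/>
</book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# Search anywhere below the root, regardless of structure.
anywhere = root.findall(".//page[@pagenumber='500']")

# Follow the known structure: bookstore -> book -> page.
explicit = root.findall("./book/page[@pagenumber='500']")

print(len(anywhere), len(explicit))
print(explicit[0].attrib)
```

On this document both queries return the same single element; they differ only in how much of the tree has to be searched.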
