XML parsing in PHP

First, we need to create the parser:

$parser = xml_parser_create();

As the parser reads the XML, it calls functions that we write, so our own code can handle each part of the document. The first line below tells the parser which function to call when it finds the start of an element and which to call when it reaches the end of an element.

xml_set_element_handler($parser, "startElement", "stopElement"); 
xml_set_character_data_handler($parser, "characterData");

The second line registers characterData, the function that handles the text inside each XML element. We also switch off case folding:

xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

This option makes the parser pass element and attribute names to our functions exactly as they are written in the XML document. By default the parser case-folds them (converts them to upper case).
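
To see what this option changes, the sketch below parses the same small document twice, once with case folding on (the default) and once with it off. It collects the element and attribute names that the start handler receives. The helper collectNames() and the sample XML are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: parses $xml and returns the element and attribute
// names that the start element handler receives, in document order.
function collectNames(string $xml, bool $caseFolding): array
{
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
            foreach (array_keys($attrs) as $attrName) {
                $names[] = $attrName;
            }
        },
        function ($parser, $name) {
            // End tags are not needed for this comparison.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

// Made-up sample document with mixed-case names.
$sample = '<Item productId="1"><Title>Product 1</Title></Item>';

print_r(collectNames($sample, true));  // ITEM, PRODUCTID, TITLE
print_r(collectNames($sample, false)); // Item, productId, Title
```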

Callback functions

The parser calls the functions we registered while it parses the XML, so we now need to write those functions.

function startElement($parser, $name, $attrs)
{
   echo "Start [$name] -> Attributes = ";
   print_r($attrs);
}

function stopElement($parser, $name)
{
   echo "End [$name] \n";
}

function characterData($parser, $data)
{
   echo "Data: $data \n";
}

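The functions above simply print the arguments that the parser passes to them. Finally, we need to parse the document. The if statement prints an error message if parsing fails. Passing true as the third argument tells xml_parse() that this is the final piece of data, so errors such as an unclosed element are also reported.

// $xml holds the document; true marks it as the final chunk of data.
if (!xml_parse($parser, str_replace(array("\n", "\r", "\t"), '', $xml), true)) {
   echo xml_error_string(xml_get_error_code($parser));
}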
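
The message printed above does not say where the error is. If you also want the position, the hypothetical helper parseOrDescribeError() below is one way to get it. It adds the line and column from xml_get_current_line_number() and xml_get_current_column_number(). The malformed sample document is made up for this example.

```php
<?php
// Hypothetical helper: returns null if $xml parses cleanly, otherwise a
// description of the first error and where the parser stopped.
function parseOrDescribeError(string $xml): ?string
{
    $parser = xml_parser_create();
    // true marks this as the final chunk, so an unclosed element is an error.
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    return sprintf(
        '%s at line %d, column %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)
    );
}

var_dump(parseOrDescribeError('<item><id>Item 1</id></item>')); // NULL
// Made-up malformed input: the closing </id> tag is missing.
var_dump(parseOrDescribeError("<item>\n<id>Item 1</item>"));
```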
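
Beware of newline characters in the $data parameter of characterData. They count as character data but rarely need processing, because they usually come from the layout of the XML, such as putting each new tag on a new line. The following code removes every newline, carriage return and tab character from the document before it is parsed:

str_replace(array("\n", "\r", "\t"), "", $xml)

This also removes line breaks inside element text, and it can join words or attributes that were separated only by a newline.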

Changes to the code

The following code shows the changes needed to use the parser inside your own class. xml_set_object() makes the parser call the handler functions as methods of that object.

$this->parser = xml_parser_create(); 
xml_set_object($this->parser, $this);
xml_set_element_handler($this->parser, "startElement", "stopElement"); 
xml_set_character_data_handler($this->parser, "characterData"); 
xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);

The object stores the parser in one of its fields and refers to it throughout the rest of its code.

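$xml = new MyXMLParser();
// Put the XML document to parse in $text.
$text = '';
$xml->parse($text);

This example takes the XML from a variable and creates a MyXMLParser object (your own class, set up as above) to parse it. Do not name the class XMLParser: from PHP 8 that name belongs to the built-in class returned by xml_parser_create().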
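
The class behind $xml->parse() is not shown. A minimal sketch of what it might look like follows. The class name MyXMLParser, its parse() method and the $events field are assumptions made for this example. The class avoids the name XMLParser because PHP 8 defines a built-in class with that name. The handlers are registered as array($this, 'method') callables, which the xml_set_*_handler() functions accept, instead of using xml_set_object().

```php
<?php
// Hypothetical class wrapping the parser; not called XMLParser because PHP 8
// already has a built-in class of that name.
class MyXMLParser
{
    private $parser;
    public $events = array();

    public function __construct()
    {
        $this->parser = xml_parser_create();
        // array($this, 'method') callables point the parser at this object's methods.
        xml_set_element_handler($this->parser, array($this, 'startElement'), array($this, 'stopElement'));
        xml_set_character_data_handler($this->parser, array($this, 'characterData'));
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);
    }

    // Parses the whole document in one call; throws if it is not well-formed.
    public function parse($text)
    {
        if (!xml_parse($this->parser, $text, true)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($this->parser)));
        }
        return true;
    }

    public function startElement($parser, $name, $attrs)
    {
        $this->events[] = "start $name";
    }

    public function stopElement($parser, $name)
    {
        $this->events[] = "end $name";
    }

    public function characterData($parser, $data)
    {
        $this->events[] = "data $data";
    }
}

$xml = new MyXMLParser();
$xml->parse('<item><id>Item 1</id></item>');
print_r($xml->events);
```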

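The built-in function xml_parse_into_struct() takes a different approach. Instead of calling handlers, it parses the whole XML document into a flat array of arrays, with one entry for each opening tag, closing tag or complete element (plus entries for any text between child elements). Together, these entries represent the XML document.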
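
A minimal sketch of the call, parsing the example XML used on this page. xml_parse_into_struct() fills $values with the entries shown in the PHP array example. It fills $index with, for each tag name, the positions of that tag's entries in $values. Case folding is left at its default, so names come back in upper case. The variable names are arbitrary.

```php
<?php
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<item productid="Item 1">'
     . '<id>Item 1</id>'
     . '<title fulltitle="Product 1">Product 1</title>'
     . '<description>A description of product 1.</description>'
     . '</item>';

$parser = xml_parser_create();
$values = array();
$index = array();
// Fills $values and $index by reference; a falsy result means a parse error.
if (!xml_parse_into_struct($parser, $xml, $values, $index)) {
    echo xml_error_string(xml_get_error_code($parser)), "\n";
}

print_r($values);
// $index['TITLE'] lists where TITLE entries sit in $values.
echo $values[$index['TITLE'][0]]['value'], "\n"; // Product 1
```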

Examples

XML used in examples

The following XML will be used for the examples.


<?xml version="1.0" encoding="UTF-8"?>
<item productid="Item 1">
<id>Item 1</id>
<title fulltitle="Product 1">Product 1</title>
<description>A description of product 1.</description>
</item>

Basic parsing examples

Below is the output of the callback functions above when they parse this XML.

Start [item] -> Attributes = Array
(
   [productid] => Item 1
)
Start [id] -> Attributes = Array
(
)
Data: Item 1
End [id]
Start [title] -> Attributes = Array
(
   [fulltitle] => Product 1
)
Data: Product 1
End [title]
Start [description] -> Attributes = Array
(
)
Data: A description of product 1.
End [description]
End [item]

PHP array example

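The following array is produced by xml_parse_into_struct() for the same XML. Case folding was left enabled (the default), so the tag and attribute names are in upper case.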

Array
(
   [0] => Array
   (
      [tag] => ITEM
      [type] => open
      [level] => 1
      [attributes] => Array
                   (
                      [PRODUCTID] => Item 1
                   )
   )
   [1] => Array
   (
      [tag] => ID
      [type] => complete
      [level] => 2
      [value] => Item 1
   )
   [2] => Array
   (
      [tag] => TITLE
      [type] => complete
      [level] => 2
      [attributes] => Array
                   (
                      [FULLTITLE] => Product 1
                   )
      [value] => Product 1
   )
   [3] => Array
   (
      [tag] => DESCRIPTION
      [type] => complete
      [level] => 2
      [value] => A description of product 1.
   )
   [4] => Array
   (
      [tag] => ITEM
      [type] => close
      [level] => 1
   )
)

Extending the code

The built-in parser supports several other callback functions. Each pair below shows the function that registers a handler, followed by the signature that handler should have.

xml_set_default_handler($parser, 'functionName');
function functionName($parser, $data){}

This is the default handler. It receives any part of the document that no other registered handler deals with, such as comments.
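
As an illustration: PHP has no separate comment handler, so comments are among the data that reach the default handler. The sketch below registers empty element and character data handlers and collects everything else in $leftovers. The sample document and variable names are made up for this example.

```php
<?php
$leftovers = array();

$parser = xml_parser_create();
// Element and text handlers are registered (and do nothing) so that
// only the remaining data reaches the default handler.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) {},
    function ($parser, $name) {}
);
xml_set_character_data_handler($parser, function ($parser, $data) {});
xml_set_default_handler($parser, function ($parser, $data) use (&$leftovers) {
    $leftovers[] = $data;
});

// Made-up document containing a comment.
xml_parse($parser, '<item><!-- internal note --><id>Item 1</id></item>', true);

print_r($leftovers);
```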

xml_set_unparsed_entity_decl_handler($parser, 'functionName');
function functionName($parser, $entityName, $base, $systemId, $publicId, $notationName){}

This function is called when the parser finds an unparsed entity declaration (an entity declared with NDATA) in the DTD.

xml_set_start_namespace_decl_handler($parser, 'functionName');
function functionName($parser, $prefix, $uri){}

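Sets the function to be called when a namespace declaration is found. It is called before the start element handler of the element that declares the namespace. Namespace handlers are only used by parsers created with xml_parser_create_ns().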

xml_set_end_namespace_decl_handler($parser, 'functionName');
function functionName($parser, $prefix){}

This function is called when a namespace declaration goes out of scope, after the end element handler of the element that declared it.
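
A minimal sketch of the start handler on a namespace-aware parser from xml_parser_create_ns(). It records each prefix and URI as it is declared. Only the start handler is registered. The document, the urn:example:products URI and the $declared array are invented for this example.

```php
<?php
$declared = array();

// Namespace handlers are only called on a namespace-aware parser.
$parser = xml_parser_create_ns();
xml_set_start_namespace_decl_handler(
    $parser,
    function ($parser, $prefix, $uri) use (&$declared) {
        $declared[] = array($prefix, $uri);
    }
);

// Made-up document declaring the prefix "p".
xml_parse($parser, '<p:item xmlns:p="urn:example:products"><p:id>Item 1</p:id></p:item>', true);

print_r($declared);
```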

xml_set_processing_instruction_handler($parser, 'functionName');
function functionName($parser, $target, $data){}

This function is called for each processing instruction (<?target data?>) in the document.
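
A short sketch that collects each processing instruction's target and data. The render instruction, its mode pseudo-attribute and the $instructions array are made up for this example.

```php
<?php
$instructions = array();

$parser = xml_parser_create();
// Keep names exactly as written in the document.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$instructions) {
        // $target is the name right after "<?"; $data is the rest of the instruction.
        $instructions[] = array($target, $data);
    }
);

// Made-up document containing one processing instruction.
xml_parse($parser, '<item><?render mode="compact"?><id>Item 1</id></item>', true);

print_r($instructions);
```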

xml_set_external_entity_ref_handler($parser, 'functionName');
function functionName($parser, $openEntityNames, $base, $systemId, $publicId){}

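This function is called when the parser finds a reference to an external entity. If the handler returns false, the parser stops with the error XML_ERROR_EXTERNAL_ENTITY_HANDLING.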

xml_set_notation_decl_handler($parser, 'functionName');
function functionName($parser, $notationName, $base, $systemId, $publicId){}

This function is called when the parser finds a notation declaration (<!NOTATION ...>) in the DTD.
