When processing XML in PHP, xml_parse() is a commonly used function for parsing XML data into structured information. However, when the input is malformed, for example badly formatted or missing closing tags, parsing fails: xml_parse() returns 0, and if that result is not checked, the program may stop or produce unexpected output.
So how can we avoid parsing failures when using xml_parse(), especially with irregular XML? Here are some practical tips and code examples.
When calling xml_parse(), always check its return value so that specific error information is captured when parsing fails, rather than letting the program fail silently.
$xml = '<root><item>data1<item><item>data2</item></root>'; // malformed XML: the first <item> is never closed
$parser = xml_parser_create();
// The third argument (true) tells the parser this is the final chunk of data.
if (!xml_parse($parser, $xml, true)) {
    $errorCode = xml_get_error_code($parser);
    $errorMsg = xml_error_string($errorCode);
    $line = xml_get_current_line_number($parser);
    $column = xml_get_current_column_number($parser);
    echo "Parsing error: $errorMsg at line $line, column $column\n";
}
xml_parser_free($parser);
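The same checks can be wrapped in a small helper so that callers receive a structured result instead of echoed text. This is only a minimal sketch: the function name `parseXmlOrDescribeError` and the shape of the returned array are illustrative assumptions, not part of PHP's XML Parser extension.

```php
<?php
/**
 * Hypothetical helper: parse a complete XML string with xml_parse() and
 * return null on success, or an array describing the first error.
 */
function parseXmlOrDescribeError(string $xml): ?array
{
    $parser = xml_parser_create();
    $error = null;

    // The third argument (true) marks $xml as the final chunk of data.
    if (xml_parse($parser, $xml, true) !== 1) {
        $code = xml_get_error_code($parser);
        $error = [
            'code'    => $code,
            'message' => xml_error_string($code),
            'line'    => xml_get_current_line_number($parser),
            'column'  => xml_get_current_column_number($parser),
        ];
    }

    // On PHP 8+ the parser object is released when it goes out of scope;
    // on PHP 7 you may want to call xml_parser_free($parser) here.
    return $error;
}

$error = parseXmlOrDescribeError('<root><item>data1<item><item>data2</item></root>');
if ($error !== null) {
    echo "Parsing error: {$error['message']} at line {$error['line']}, column {$error['column']}\n";
}
```

Returning the error instead of printing it lets the caller decide whether to log it, retry with repaired input, or reject the payload.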
To handle irregular XML more safely, we can first load the data with libxml (for example through DOMDocument), collecting its errors internally instead of emitting warnings, and only hand the data to functions such as xml_parse() or simplexml_load_string() once it loads cleanly.
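libxml_use_internal_errors(true); // collect libxml errors instead of emitting PHP warnings
$xml = '<root><item>data1<item><item>data2</item></root>';
$dom = new DOMDocument();
// loadXML() returns false on failure, so check its return value rather than $dom itself.
// LIBXML_NOERROR and LIBXML_NOWARNING are left out so the errors stay available for reporting.
$loaded = $dom->loadXML($xml, LIBXML_NONET | LIBXML_COMPACT | LIBXML_NOCDATA);
if (!$loaded) {
    foreach (libxml_get_errors() as $error) {
        echo "libxml error: " . trim($error->message) . "\n";
    }
    libxml_clear_errors();
} else {
    echo "XML loaded successfully; parsing can continue.\n";
}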
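If the goal is to repair the input rather than only report its errors, libxml's recovery option is one way to attempt it. The sketch below assumes that the tree produced by `LIBXML_RECOVER` is acceptable for your data: libxml has to guess where missing end tags belong, so the repaired structure may differ from what the sender intended. The function name `repairXml` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: ask libxml to recover from well-formedness errors and
 * return a re-serialized document, or null if nothing usable was recovered.
 */
function repairXml(string $xml): ?string
{
    if ($xml === '') {
        return null; // loadXML() rejects empty input
    }

    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // LIBXML_RECOVER: keep parsing after errors; LIBXML_NONET: no network access.
    $loaded = $dom->loadXML($xml, LIBXML_RECOVER | LIBXML_NONET);

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    if (!$loaded || $dom->documentElement === null) {
        return null;
    }

    $repaired = $dom->saveXML();
    return $repaired === false ? null : $repaired;
}

$repaired = repairXml('<root><item>data1<item><item>data2</item></root>');
if ($repaired !== null) {
    $parser = xml_parser_create();
    // The re-serialized document should now be accepted by xml_parse().
    echo xml_parse($parser, $repaired, true) === 1 ? "Repaired XML parsed\n" : "Still invalid\n";
}
```

Because recovery changes the document silently, it is worth logging both the original and the repaired XML whenever this path is taken.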
Sometimes the data source is unreliable, such as an external API that returns malformed XML (for example http://api.m66.net/data.xml ). In this case, we can first fix common problems such as unclosed tags or illegal characters with regular expressions or manual string handling.
$xml = file_get_contents('http://api.m66.net/data.xml');
if ($xml === false) {
    exit("Failed to fetch the XML data.\n");
}
// Simple fix: close an <item> that is directly followed by another <item>
$xml = preg_replace('/<item>([^<]*)<item>/', '<item>$1</item><item>', $xml);
// Then parse again
$parser = xml_parser_create();
if (!xml_parse($parser, $xml, true)) {
    echo "The XML still contains errors; try another way to handle it.\n";
}
xml_parser_free($parser);
Note: This approach only suits scenarios where the malformation is predictable. Regex-based hard fixes are not recommended for XML with a complex structure.
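For the illegal-character case, a character filter is usually less risky than structural regex fixes because it leaves the markup untouched. A minimal sketch, assuming the payload is UTF-8 and that silently dropping the offending characters is acceptable; `stripInvalidXmlChars` is a hypothetical name:

```php
<?php
/**
 * Hypothetical helper: remove characters that XML 1.0 does not allow
 * (most control characters, for example).
 */
function stripInvalidXmlChars(string $xml): string
{
    // Keep only characters permitted by the XML 1.0 "Char" production.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $xml
    );

    // preg_replace() returns null if the input is not valid UTF-8;
    // in that case return the input unchanged and let the parser report it.
    return $clean ?? $xml;
}

$dirty = "<root><item>data\x01\x08</item></root>"; // contains two control characters
$clean = stripInvalidXmlChars($dirty);

$parser = xml_parser_create();
echo xml_parse($parser, $clean, true) === 1 ? "Parsed after cleanup\n" : "Still invalid\n";
```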
Sometimes switching tools makes the code more stable. For example, SimpleXML combined with libxml's internal error handling reports every parsing error in a structured way, and the resulting code is easier to maintain.
$xml = file_get_contents('http://api.m66.net/data.xml');
libxml_use_internal_errors(true); // collect errors instead of emitting warnings
$simpleXml = simplexml_load_string($xml);
if ($simpleXml === false) {
    echo "SimpleXML parsing failed with the following errors:\n";
    foreach (libxml_get_errors() as $error) {
        echo $error->message;
    }
    libxml_clear_errors();
} else {
    echo "SimpleXML parsed the data successfully!\n";
}
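The `LibXMLError` objects returned by `libxml_get_errors()` carry more than the message text. As a sketch, the hypothetical helper below (the name `describeLibxmlErrors` is an assumption for illustration) turns them into one-line descriptions that include severity, line, and column:

```php
<?php
/**
 * Hypothetical helper: format LibXMLError objects as readable one-liners.
 *
 * @param LibXMLError[] $errors
 * @return string[]
 */
function describeLibxmlErrors(array $errors): array
{
    $levels = [
        LIBXML_ERR_WARNING => 'warning',
        LIBXML_ERR_ERROR   => 'error',
        LIBXML_ERR_FATAL   => 'fatal',
    ];

    $lines = [];
    foreach ($errors as $error) {
        $level = $levels[$error->level] ?? 'unknown';
        $lines[] = sprintf(
            '[%s] %s (line %d, column %d)',
            $level,
            trim($error->message),
            $error->line,
            $error->column
        );
    }
    return $lines;
}

$previous = libxml_use_internal_errors(true);
$result = simplexml_load_string('<root><item>data1<item><item>data2</item></root>');
if ($result === false) {
    foreach (describeLibxmlErrors(libxml_get_errors()) as $line) {
        echo $line, "\n";
    }
}
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the previous setting
```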
Best practices for handling irregular XML when using xml_parse():
Check the return value and report errors so that problems are easy to locate;
Use libxml with internal error handling to pre-check the data;
Use regular expressions or DOMDocument to repair the XML if necessary;
If the data format is outside your control, consider a parser that is easier to use defensively, such as SimpleXML.
Calling xml_parse() only after confirming that the XML is well-formed makes the code more robust and keeps a formatting problem from bringing down the entire service.
I hope these tips help you handle all kinds of "tricky" XML data! If you run into a specific parsing error, post the XML content and I can help analyze it.