在使用 PHP 处理 XML 数据时,DOMDocument 和 DOMXPath 是两个非常强大的工具。尤其是在需要进行复杂 XML 查询的时候,利用命名空间与 XPath 表达式可以大大提升灵活性与准确性。本文将介绍如何结合 registerXPathNamespace 方法和自定义 XPath 函数,来扩展和增强你的 XML 查询功能。
在许多 XML 文档中,元素和属性通常带有命名空间前缀。例如:
<span><span><span class="hljs-tag"><<span class="hljs-name">root</span></span></span><span> </span><span><span class="hljs-attr">xmlns:h</span></span><span>=</span><span><span class="hljs-string">"http://www.w3.org/TR/html4/"</span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">h:table</span></span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">h:tr</span></span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">h:td</span></span></span><span>>Apples</span><span><span class="hljs-tag"></<span class="hljs-name">h:td</span></span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">h:td</span></span></span><span>>Bananas</span><span><span class="hljs-tag"></<span class="hljs-name">h:td</span></span></span><span>>
</span><span><span class="hljs-tag"></<span class="hljs-name">h:tr</span></span></span><span>>
</span><span><span class="hljs-tag"></<span class="hljs-name">h:table</span></span></span><span>>
</span><span><span class="hljs-tag"></<span class="hljs-name">root</span></span></span><span>>
</span></span>
要正确查询如 <h:td> 这样的节点,XPath 表达式需要知道前缀 h 对应的命名空间 URI。这就是 registerXPathNamespace 派上用场的时候。
<span><span><span class="hljs-variable">$xml</span></span><span> = </span><span><span class="hljs-keyword">new</span></span><span> </span><span><span class="hljs-title class_">DOMDocument</span></span><span>();
</span><span><span class="hljs-variable">$xml</span></span><span>-></span><span><span class="hljs-title function_ invoke__">loadXML</span></span><span>(</span><span><span class="hljs-variable">$yourXmlString</span></span><span>);
</span><span><span class="hljs-variable">$xpath</span></span><span> = </span><span><span class="hljs-keyword">new</span></span><span> </span><span><span class="hljs-title class_">DOMXPath</span></span><span>(</span><span><span class="hljs-variable">$xml</span></span><span>);
</span><span><span class="hljs-variable">$xpath</span></span><span>-></span><span><span class="hljs-title function_ invoke__">registerNamespace</span></span><span>(</span><span><span class="hljs-string">"h"</span></span><span>, </span><span><span class="hljs-string">"http://www.w3.org/TR/html4/"</span></span><span>);
</span><span><span class="hljs-variable">$tds</span></span><span> = </span><span><span class="hljs-variable">$xpath</span></span><span>-></span><span><span class="hljs-title function_ invoke__">query</span></span><span>(</span><span><span class="hljs-string">"//h:td"</span></span><span>);
</span></span>
虽然 XPath 语言本身功能强大,但在某些业务逻辑场景下,内建的 XPath 函数可能无法满足需求。例如,我们可能想通过一个函数来判断某个元素的值是否在数据库或数组中出现,或者进行更复杂的字符串处理。
PHP 原生并不支持向 XPath 注入自定义函数,但可以通过变通方式,例如在遍历查询结果时调用 PHP 函数实现逻辑,或者结合 libxml 和 XSLTProcessor 来部分模拟该功能。
但更常见的做法是,在使用 XPath 查询前,先注册命名空间并预处理 XML,再配合自定义函数操作节点集。
假设你有如下 XML 文档,包含多个带命名空间的 <item> 节点:
<span><span><span class="hljs-tag"><<span class="hljs-name">catalog</span></span></span><span> </span><span><span class="hljs-attr">xmlns:bk</span></span><span>=</span><span><span class="hljs-string">"http://example.com/book"</span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">bk:item</span></span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">bk:title</span></span></span><span>>PHP 编程</span><span><span class="hljs-tag"></<span class="hljs-name">bk:title</span></span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">bk:price</span></span></span><span>>45</span><span><span class="hljs-tag"></<span class="hljs-name">bk:price</span></span></span><span>>
</span><span><span class="hljs-tag"></<span class="hljs-name">bk:item</span></span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">bk:item</span></span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">bk:title</span></span></span><span>>Java 编程</span><span><span class="hljs-tag"></<span class="hljs-name">bk:title</span></span></span><span>>
</span><span><span class="hljs-tag"><<span class="hljs-name">bk:price</span></span></span><span>>55</span><span><span class="hljs-tag"></<span class="hljs-name">bk:price</span></span></span><span>>
</span><span><span class="hljs-tag"></<span class="hljs-name">bk:item</span></span></span><span>>
</span><span><span class="hljs-tag"></<span class="hljs-name">catalog</span></span></span><span>>
</span></span>
你希望查询所有价格低于 50 元的图书标题。XPath 本身支持数字比较:
<span><span><span class="hljs-variable">$xpath</span></span><span>-></span><span><span class="hljs-title function_ invoke__">registerNamespace</span></span><span>(</span><span><span class="hljs-string">"bk"</span></span><span>, </span><span><span class="hljs-string">"http://example.com/book"</span></span><span>);
</span><span><span class="hljs-variable">$nodes</span></span><span> = </span><span><span class="hljs-variable">$xpath</span></span><span>-></span><span><span class="hljs-title function_ invoke__">query</span></span><span>(</span><span><span class="hljs-string">"//bk:item[bk:price < 50]/bk:title"</span></span><span>);
</span><span><span class="hljs-keyword">foreach</span></span><span> (</span><span><span class="hljs-variable">$nodes</span></span><span> </span><span><span class="hljs-keyword">as</span></span><span> </span><span><span class="hljs-variable">$node</span></span><span>) {
</span><span><span class="hljs-keyword">echo</span></span><span> </span><span><span class="hljs-variable">$node</span></span><span>->nodeValue . </span><span><span class="hljs-string">"\n"</span></span><span>;
}
</span></span>
如果你要实现更复杂的逻辑(如价格低于 50 且标题包含“PHP”),可以结合多个 XPath 条件:
<span><span><span class="hljs-variable">$nodes</span></span><span> = </span><span><span class="hljs-variable">$xpath</span></span><span>-></span><span><span class="hljs-title function_ invoke__">query</span></span><span>(</span><span><span class="hljs-string">"//bk:item[bk:price < 50 and contains(bk:title, 'PHP')]/bk:title"</span></span><span>);
</span></span>
但若逻辑更加复杂,如“只保留标题包含关键词表中任意一个词的节点”,则需在查询后配合自定义函数过滤结果:
<span><span><span class="hljs-variable">$keywords</span></span><span> = [</span><span><span class="hljs-string">'PHP'</span></span><span>, </span><span><span class="hljs-string">'MySQL'</span></span><span>, </span><span><span class="hljs-string">'Laravel'</span></span><span>];
</span><span><span class="hljs-variable">$nodes</span></span><span> = </span><span><span class="hljs-variable">$xpath</span></span><span>-></span><span><span class="hljs-title function_ invoke__">query</span></span><span>(</span><span><span class="hljs-string">"//bk:item/bk:title"</span></span><span>);
</span><span><span class="hljs-keyword">foreach</span></span><span> (</span><span><span class="hljs-variable">$nodes</span></span><span> </span><span><span class="hljs-keyword">as</span></span><span> </span><span><span class="hljs-variable">$node</span></span><span>) {
</span><span><span class="hljs-keyword">foreach</span></span><span> (</span><span><span class="hljs-variable">$keywords</span></span><span> </span><span><span class="hljs-keyword">as</span></span><span> </span><span><span class="hljs-variable">$word</span></span><span>) {
</span><span><span class="hljs-keyword">if</span></span><span> (</span><span><span class="hljs-title function_ invoke__">stripos</span></span><span>(</span><span><span class="hljs-variable">$node</span></span><span>->nodeValue, </span><span><span class="hljs-variable">$word</span></span><span>) !== </span><span><span class="hljs-literal">false</span></span><span>) {
</span><span><span class="hljs-keyword">echo</span></span><span> </span><span><span class="hljs-variable">$node</span></span><span>->nodeValue . </span><span><span class="hljs-string">"\n"</span></span><span>;
</span><span><span class="hljs-keyword">break</span></span><span>;
}
}
}
</span></span>
虽然这是在 XPath 查询结果外部进行的逻辑处理,但本质上就是将 XPath 查询能力和 PHP 自定义逻辑结合。
如果你需要真正将 PHP 函数嵌入 XPath 逻辑中,可以考虑通过 XSLTProcessor 和 PHP 扩展函数结合。例如,通过注册扩展函数实现逻辑判断(如 php:functionString('your_function'))——这属于更高阶的用法,需要在 PHP 中启用 xsl 扩展,并注意安全性。