2008/12/16

XML find and replace - xmlsed how to

Sed is a great tool for string replacement. A simple command like:

$ sed -i -e 's/log4j\.rootLogger=debug/log4j.rootLogger=warn/' log4j.properties

changes the appropriate value in the log4j.properties file. This is especially useful in automated scripts that customize configuration.

The problem I ran into recently was using sed for string replacement in XML files. I tried different regular expressions and multi-line matching, and it was a real pain. What I really needed was a kind of "xmlsed". Then I realized that even though I don't know sed script syntax well enough, I do know a language tailored to manipulating XML: XSLT. :)

For example, given an XML log4j configuration like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
  <appender name="console" class="org.apache.log4j.ConsoleAppender"> 
    <param name="Target" value="System.out"/>
    <layout class="org.apache.log4j.PatternLayout"> 
      <param name="ConversionPattern" value="%-5p %c{1} - %m%n"/> 
    </layout>
  </appender>

  <root>
    <priority value ="debug" />
    <appender-ref ref="console" />
  </root>
  
</log4j:configuration>

we have to prepare an appropriate XSLT filter file, filter.xsl:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

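  <!-- The only task-specific rule: rewrite the value attribute of root/priority. -->
  <xsl:template match="root/priority/@value">
    <xsl:attribute name="value">warn</xsl:attribute>
  </xsl:template>

  <!-- Copy every element and attribute unchanged (text is copied by the built-in rules). -->
  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Keep comments, which the built-in rules would otherwise drop. -->
  <xsl:template match="comment()">
    <xsl:copy/>
  </xsl:template>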

</xsl:stylesheet>

and then call:

$ xsltproc -o log4j.xml filter.xsl log4j.xml 

The XSLT file may look a bit verbose, but only the first template is specific to the task. The rest just copies the XML from input to output. Using this technique we can also strip attributes or replace more structured XML fragments in a structure-aware way. And all of this without losing XML comments.
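
As a rough sketch of the stripping case: an empty template produces no output, so whatever it matches disappears from the result. The example assumes we want to drop the Target parameter from the log4j file above; the debug attribute is a made-up name, used only to show the attribute form.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Empty template: the matched element is dropped from the output. -->
  <xsl:template match="appender/param[@name='Target']"/>

  <!-- Same idea for an attribute ("debug" is a hypothetical attribute name). -->
  <xsl:template match="@debug"/>

  <!-- Identity rule: copy everything else (elements, attributes, text, comments). -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The whitespace around a removed element stays in place, so the output may contain a blank line where the element used to be.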

The only drawback is that the DOCTYPE declaration is not copied to the output.
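
One possible workaround, which I have not needed in the recipe above, is to have the stylesheet write a fresh declaration through the doctype-system attribute of xsl:output. This is a sketch that assumes the system identifier is log4j.dtd, as in the sample file. The declaration is generated rather than copied, so an internal DTD subset would still be lost.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Emit a DOCTYPE with this system identifier before the root element. -->
  <xsl:output method="xml" encoding="UTF-8" doctype-system="log4j.dtd"/>

  <!-- Task-specific rule, same as in filter.xsl. -->
  <xsl:template match="root/priority/@value">
    <xsl:attribute name="value">warn</xsl:attribute>
  </xsl:template>

  <!-- Identity rule: copy everything else unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```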

If you want to match a specific attribute value:

  <xsl:template match="Connector/@port[.='8080']">
    <xsl:attribute name="port">8180</xsl:attribute>
  </xsl:template>
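
To make this closer to a reusable "xmlsed", the new value could come from a stylesheet parameter instead of being hard-coded. The following is only a sketch: the parameter name port and its default are my own choices. XSLT 1.0 does not allow variable references in match patterns, so the old value 8080 has to stay literal.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical parameter; 8180 is used when nothing is passed in. -->
  <xsl:param name="port" select="'8180'"/>

  <!-- Rewrite only port attributes that currently hold 8080. -->
  <xsl:template match="Connector/@port[.='8080']">
    <xsl:attribute name="port">
      <xsl:value-of select="$port"/>
    </xsl:attribute>
  </xsl:template>

  <!-- Identity rule: copy everything else unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Assuming the stylesheet is saved as port.xsl and the target file is called server.xml (both names are assumptions), it could be called with:

$ xsltproc --stringparam port 9090 -o server.xml port.xsl server.xml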