Creating a Custom XSLT Function in Saxon HE

Posted by Tejus Parikh on July 23, 2010

In our XSLT workflow we make use of a lot of XPath 2.0 features, such as its built-in regex support. Unfortunately, the default Java 6 XML parsers only support XPath 1.0. The library we settled on is Saxon HE, since it was free, supported the features we needed, and could be extended with Java functions.

One of my tasks was to convert all relative paths in an XHTML document to absolute paths. The server prefix was set as a variable in the stylesheet. The transformer would have to determine whether the selected path is a relative URL, then resolve the absolute path based on the root passed into the page. It could be done with an advanced XSLT template, but since we already had the resolution function written in Java, it made more sense to write a Java plugin for Saxon.

One of the missing features of Saxon HE is the seamless, reflection-based integration of plugins. However, one can use the Extension Function API to achieve the same results. First on the agenda is creating an extension point:


package net.vijedi.saxon.extensions;

import net.sf.saxon.expr.StaticProperty;
import net.sf.saxon.expr.XPathContext;
import net.sf.saxon.functions.ExtensionFunctionCall;
import net.sf.saxon.functions.ExtensionFunctionDefinition;
import net.sf.saxon.om.*;
import net.sf.saxon.trans.XPathException;
import net.sf.saxon.value.SequenceType;
import net.sf.saxon.value.StringValue;

public class AbsolutizeUrl extends ExtensionFunctionDefinition {

    /**
     * The function will need a name you can call
     */
    private static final StructuredQName qName =
            new StructuredQName("",
                    "http://vijedi.net/",
                    "absolutizeUrl");

    @Override
    public StructuredQName getFunctionQName() {
        return qName;
    }
}



The extension point extends ExtensionFunctionDefinition. I went ahead and created a constant that holds the qualified name of the function (its namespace URI and local name), along with the method that returns it. You will use this name to access the function from inside your XSLT.

Now it’s time to think about the interface of this function. The function can take up to two string parameters: the absolute URL base and an optional URL to process. The URL to process is optional since an <a> is not required to have an href attribute. The function will return either a string or, if the second parameter does not exist, the empty sequence (the XPath equivalent of null). This is how you define this interface in the code.

private final static SequenceType[] argumentTypes = new SequenceType[] {
        SequenceType.SINGLE_STRING,
        SequenceType.OPTIONAL_STRING
};

@Override
public int getMinimumNumberOfArguments() {
    return 1;
}

@Override
public int getMaximumNumberOfArguments() {
    return 2;
}

@Override
public SequenceType[] getArgumentTypes() {
    return argumentTypes;
}

@Override
public SequenceType getResultType(SequenceType[] suppliedArgumentTypes) {
    return SequenceType.makeSequenceType(
            suppliedArgumentTypes[0].getPrimaryType(), StaticProperty.ALLOWS_ZERO_OR_ONE);
}

Once the interface is defined, it’s time to define the actual work. The actual call is handled by a class that extends ExtensionFunctionCall. I like to define these as inner classes of the ExtensionFunctionDefinition. The pattern for this class is pretty simple. You need to process the arguments to the function. Saxon will give you wrapped arguments that you will need to unwrap. Then you need to call the actual logic (which should be in a separate class for re-usability) and finally wrap and return the value. Just as crucially, you need to override the function that tells the Saxon parser to use your implementation of ExtensionFunctionCall for this definition.

    @Override
    public ExtensionFunctionCall makeCallExpression() {
        return new AbsolutizeUrlCall();
    }

    private static class AbsolutizeUrlCall extends ExtensionFunctionCall {

        @Override
        public SequenceIterator call(SequenceIterator[] arguments, XPathContext xPathContext) throws XPathException {

            // Unwrap the first argument: the absolute URL base
            StringValue pageUrlSV = (StringValue) arguments[0].next();
            if(null == pageUrlSV) {
                return EmptyIterator.getInstance();
            }

            // The second argument is optional; with no URL to process, return the empty sequence
            if(arguments.length < 2) {
                return EmptyIterator.getInstance();
            }
            StringValue hrefUrlSV = (StringValue) arguments[1].next();
            if(null == hrefUrlSV) {
                return EmptyIterator.getInstance();
            }

            String pageUrl = pageUrlSV.getStringValue();
            String hrefUrl = hrefUrlSV.getStringValue();

            // Url transformation magic goes here; it should assign the result to a String named fullUrl

            // Wrap the result so Saxon can consume it
            Item item = new StringValue(fullUrl);
            return SingletonIterator.makeIterator(item);
        }
    }
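The URL transformation step is left out of the call above. As a rough illustration only, here is a minimal sketch of what such a resolver might look like using the JDK’s java.net.URI. The class name UrlAbsolutizer and the method absolutize are hypothetical and are not the resolution function referred to in this post; the sketch also assumes the base URL includes a path (at least a trailing slash).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the original code: resolves an href against a base URL.
final class UrlAbsolutizer {

    private UrlAbsolutizer() {
    }

    /**
     * Returns hrefUrl unchanged if it is already absolute (has a scheme),
     * otherwise resolves it against pageUrl. Returns null for a null href.
     */
    static String absolutize(String pageUrl, String hrefUrl) {
        if (hrefUrl == null) {
            return null;
        }
        try {
            URI href = new URI(hrefUrl.trim());
            if (href.isAbsolute()) {
                // Already has a scheme (http:, mailto:, ...), so leave it alone
                return href.toString();
            }
            // Relative reference: resolve it against the base URL
            return new URI(pageUrl).resolve(href).toString();
        } catch (URISyntaxException e) {
            // Assumption: malformed values are passed through untouched
            return hrefUrl;
        }
    }
}
```

Inside the call method, the result of something like UrlAbsolutizer.absolutize(pageUrl, hrefUrl) would then be the value wrapped in the StringValue.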

That completes the definition of the function. You can find the full example code on GitHub. Now that you’ve written an extension, you need to tell Saxon that this function exists. For this, you will need to add the following to wherever you are currently accessing the TransformerFactory.

// TransformerFactoryImpl below is Saxon's net.sf.saxon.TransformerFactoryImpl
private TransformerFactory getTransformerFactory() throws net.sf.saxon.trans.XPathException {
    TransformerFactory tFactory = TransformerFactory.newInstance();
    if(tFactory instanceof TransformerFactoryImpl) {
        TransformerFactoryImpl tFactoryImpl = (TransformerFactoryImpl) tFactory;
        net.sf.saxon.Configuration saxonConfig = tFactoryImpl.getConfiguration();
        saxonConfig.registerExtensionFunction(new AbsolutizeUrl());
    }
    return tFactory;
}

This code checks whether you’re using a Saxon processor and, if so, registers your new function with it. Finally, it’s time to update the stylesheet to use the new function. You’ll need to declare the function’s namespace in the stylesheet, using the same URI that was passed as the second argument of the StructuredQName constructor.

Now you can use it like any other function:
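As a hedged sketch of how the stylesheet side might look, the snippet below keeps a small stylesheet in a Java string so the namespace URI can be shared with the StructuredQName. The prefix vj, the variable serverPrefix, its value, and the class StylesheetSketch are all assumptions made for illustration, and a real stylesheet would also need templates that copy the rest of the document. The helper only checks, with the JDK’s own XML parser, which URI the prefix is bound to; it does not run the transformation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical holder for an example stylesheet; names and values are illustrative.
final class StylesheetSketch {

    // Must match the second argument of the StructuredQName constructor
    static final String EXTENSION_NS = "http://vijedi.net/";

    static final String STYLESHEET =
            "<xsl:stylesheet version=\"2.0\"\n"
          + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"\n"
          + "    xmlns:vj=\"" + EXTENSION_NS + "\">\n"
          + "  <xsl:variable name=\"serverPrefix\" select=\"'http://example.com/docs/'\"/>\n"
          + "  <!-- Rewrite every href attribute through the extension function -->\n"
          + "  <xsl:template match=\"@href\">\n"
          + "    <xsl:attribute name=\"href\" select=\"vj:absolutizeUrl($serverPrefix, .)\"/>\n"
          + "  </xsl:template>\n"
          + "</xsl:stylesheet>\n";

    private StylesheetSketch() {
    }

    /** Parses the stylesheet and returns the namespace URI bound to the given prefix. */
    static String boundNamespace(String prefix) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(STYLESHEET)));
            return doc.getDocumentElement().lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new IllegalStateException("Stylesheet is not well-formed XML", e);
        }
    }
}
```

The prefix itself is arbitrary; what matters is that the URI it is bound to is identical to the one registered with Saxon.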
