Posts Tagged ‘xml’
  • Use local resources when validating XML

    Depending on your enterprise security policy, some - if not most - of your middleware servers have no access to the Internet. It’s even worse when your development infrastructure is isolated from the Internet (as in banks or security companies). In this case, validating your XML against schemas becomes a real nightmare.

    Of course, you could set the XML schema location to a location on your hard drive. But what about your co-workers then? They would have to have the schema in exactly the same filesystem hierarchy, and that wouldn’t solve your problem in the production environment…

    XML catalogs

    XML catalogs are the nominal solution for your quandary. In effect, you plug in a resolver that knows how to map an online location to a location on your filesystem (or more precisely a location to another one). This resolver is set through the xml.catalog.files system property that may be different for each developer or in the production environment:

    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
        <!-- the systemId values below are illustrative -->
        <system systemId="http://example.com/schemas/person.xsd" uri="person.xsd" />
        <system systemId="http://example.com/schemas/order.xsd" uri="order.xsd" />
    </catalog>

    Now, when validation happens, instead of looking up the online schema URL location, it looks up the local schema.


    XML catalogs can be used with JAXP, whether SAX, DOM, or StAX. The following details how to use them with SAX.

    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    XMLReader reader = factory.newSAXParser().getXMLReader();
    // Set XSD as the schema language
    reader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
        "http://www.w3.org/2001/XMLSchema");
    // Use XML catalogs
    reader.setEntityResolver(new CatalogResolver());
    InputStream stream = getClass().getClassLoader().getResourceAsStream("person.xml");
    reader.parse(new InputSource(stream));

    Two lines are important:

    • The setEntityResolver() call tells the reader to use XML catalogs
    • Almost as important: we absolutely have to use the XMLReader, not the SAXParser, to parse the XML, or validation will be done online!

    Use cases

    There are basically two use cases for XML catalogs:

    1. As seen above, the first use case is to validate XML files against cached local schema files
    2. In addition to that, XML catalogs can also be used as an alternative to LSResourceResolver (as seen previously in XML validation with imported/included schemas)

    Important note

    Beware: the CatalogResolver class bundled with the JDK lives in an internal com.sun package, but the Apache XML Resolver library does the job. In fact, the internal class is just a repackaging of the Apache one.

    If you use Maven, the dependency is the following:
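A sketch of that dependency, using the Apache XML Resolver coordinates (the version is an example):

```xml
<dependency>
    <groupId>xml-resolver</groupId>
    <artifactId>xml-resolver</artifactId>
    <version>1.2</version>
</dependency>
```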


    Beyond catalogs

    Catalogs are a standard, filesystem-based way to map an online schema location to a local one.

    However, this isn’t always the best strategy. For example, Spring provides its different schemas inside JARs, alongside the classes, and validation is done against those schemas.

    In order to do the same, replace the CatalogResolver in the code above with your own implementation of EntityResolver.
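As a minimal sketch of such an implementation (the class name is mine, and it assumes the schemas sit at the root of the classpath):

```java
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class ClasspathEntityResolver implements EntityResolver {

    // Map the online system id to a resource located at the classpath root
    public InputSource resolveEntity(String publicId, String systemId) {
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        InputStream stream = getClass().getClassLoader().getResourceAsStream(name);
        // Returning null falls back to the default (online) resolution
        return stream == null ? null : new InputSource(stream);
    }
}
```

Passing an instance to reader.setEntityResolver() then keeps validation offline while shipping the schemas inside the JAR.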

    You can find the source for this article here (note that associated tests use a custom security policy file to prevent network access, and if you want to test directly inside your IDE, you should reuse the system properties used inside the POM).

    Categories: Java Tags: xml
  • XML validation with imported/included schemas

    Recently, I tried to help a teammate design a WSDL file. I gently drove him toward separating the interface itself in the WSDL file and the domain objects in an XML Schema file. One thing leading to another, I also made him split this XSD into two separate files, one including the other, for design purposes. Alas, tests were already present, and they failed miserably after my refactoring, complaining about a type in the included file not being found. The situation was extremely unpleasant, not only because I looked a little foolish in front of one of my co-workers, but also because despite my best efforts, I couldn’t achieve validation.

    I finally found the solution, and I hope to spread it as much as I can so that other developers stop losing time on this issue. The root of the problem is that the Java XML validation API cannot resolve included XML schemas (nor imported ones), period. However, it allows for registering a (crude) resolver that can provide the content of the included/imported XSD. So, the solution is to implement your own resolver and your own content holder (there’s none provided in JDK 6).

    1. Create an “input” implementation. This class is responsible for holding the content of the resolved schema.
    public class LSInputImpl implements LSInput {
        private Reader characterStream;
        private InputStream byteStream;
        private String stringData;
        private String systemId;
        private String publicId;
        private String baseURI;
        private String encoding;
        private boolean certifiedText;
        // Getters and setters here
    }
    2. Create a resolver implementation. This one is based on the premise that the included/imported schemas lie at the root of the classpath, and is relatively simple. More complex implementations can provide for a variety of locations (filesystem, Internet, etc.).
    public class ClasspathResourceResolver implements LSResourceResolver {
        public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
            LSInputImpl input = new LSInputImpl();
            InputStream stream = getClass().getClassLoader().getResourceAsStream(systemId);
            input.setCharacterStream(new InputStreamReader(stream));
            return input;
        }
    }
    3. Finally, just set the resolver on the schema factory:
    SchemaFactory schemaFactory = SchemaFactory.newInstance(W3C_XML_SCHEMA_NS_URI);
    schemaFactory.setResourceResolver(new ClasspathResourceResolver());

    These 3 steps will go a long way toward cleanly splitting your XML schemas.

    Categories: Java Tags: validation, xml
  • Discover Spring authoring

    In this article, I will describe a useful but much underused feature of Spring, the definition of custom tags in the Spring beans definition files.

    Spring namespaces

    I will begin with a simple example taken from Spring’s documentation. Before version 2.0, only a single XML schema was available. So, in order to make a constant available as a bean, and thus inject it in other beans, you had to define the following:

    <bean id="java.sql.Connection.TRANSACTION_SERIALIZABLE"
      class="org.springframework.beans.factory.config.FieldRetrievingFactoryBean" />

    Spring made it possible, but it’s only a trick to expose the constant as a bean. Since Spring 2.0, however, the framework lets you use the util namespace, so that the previous example becomes:

    <util:constant static-field="java.sql.Connection.TRANSACTION_SERIALIZABLE"/>

    In fact, there are many namespaces now available:

    Prefix  Namespace                                      Description
    bean    http://www.springframework.org/schema/beans    Original bean schema
    util    http://www.springframework.org/schema/util     Utilities: constants, property paths and collections
    jee     http://www.springframework.org/schema/jee      JNDI lookup
    lang    http://www.springframework.org/schema/lang     Use of other languages
    tx      http://www.springframework.org/schema/tx       Transactions
    aop     http://www.springframework.org/schema/aop      AOP
    context http://www.springframework.org/schema/context  ApplicationContext manipulation

    Each of these is meant to reduce verbosity and increase readability, as the first example showed.


    What is still unknown by many is that this feature is extensible: the Spring API provides you with the means to write your own. In fact, many framework providers should take advantage of this and provide their own namespaces so that integrating their product with Spring becomes easier. Some already do: CXF with its many namespaces comes to mind, but there should be others I don’t know of.

    Creating your own namespace is a 4-step process: two steps for the XML validation, the other two for creating the bean itself. In order to illustrate the process, I will use a simple example: I will create a schema for EhCache, Hibernate’s default caching engine.

    The underlying bean factory will be the existing EhCacheFactoryBean. As such, it won’t be as useful as a real feature but it will let us focus on the true authoring plumbing rather than EhCache implementation details.

    Creating the schema

    Creating the schema is about describing XML syntax and more importantly restrictions. I want my XML to look something like the following:

    <ehcache:cache id="myCache" eternal="true" cacheName="foo"
        maxElementsInMemory="5" maxElementsOnDisk="2" overflowToDisk="false"
        diskExpiryThreadIntervalSeconds="18" diskPersistent="true" timeToIdle="25" timeToLive="50">
        <ehcache:manager ref="someManagerRef" />
    </ehcache:cache>

    Since I won’t presume to teach anyone about XML, here’s the schema. Just notice the namespace declaration:

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/schema/ehcache"
        xmlns="http://example.com/schema/ehcache"
        elementFormDefault="qualified">
        <!-- the target namespace URI is illustrative -->
        <xsd:complexType name="cacheType">
            <xsd:sequence maxOccurs="1" minOccurs="0">
                <xsd:element name="manager" type="managerType" />
            </xsd:sequence>
            <xsd:attribute name="id" type="xsd:string" />
            <xsd:attribute name="cacheName" type="xsd:string" />
            <xsd:attribute name="diskExpiryThreadIntervalSeconds" type="xsd:int" />
            <xsd:attribute name="diskPersistent" type="xsd:boolean" />
            <xsd:attribute name="eternal" type="xsd:boolean" />
            <xsd:attribute name="maxElementsInMemory" type="xsd:int" />
            <xsd:attribute name="maxElementsOnDisk" type="xsd:int" />
            <xsd:attribute name="overflowToDisk" type="xsd:boolean" />
            <xsd:attribute name="timeToLive" type="xsd:int" />
            <xsd:attribute name="timeToIdle" type="xsd:int" />
            <xsd:attribute name="memoryStoreEvictionPolicy" type="memoryStoreEvictionPolicyType" />
        </xsd:complexType>
        <xsd:simpleType name="memoryStoreEvictionPolicyType">
            <xsd:restriction base="xsd:string">
                <xsd:enumeration value="LRU" />
                <xsd:enumeration value="LFU" />
                <xsd:enumeration value="FIFO" />
            </xsd:restriction>
        </xsd:simpleType>
        <xsd:complexType name="managerType">
            <xsd:attribute name="ref" type="xsd:string" />
        </xsd:complexType>
        <xsd:element name="cache" type="cacheType" />
    </xsd:schema>

    And for those, like me, that prefer a graphic display:

    Mapping the schema

    The schema creation is only the first part. Now, we have to make Spring aware of it. Creating the file META-INF/spring.schemas with the following line is enough:
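Such a line could look like the following, with an illustrative namespace on the left and a hypothetical classpath location on the right:

```properties
http\://example.com/schema/ehcache/ehcache.xsd=com/example/spring/ehcache.xsd
```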


    Just take care to insert the backslash (the colon has to be escaped in a properties file), otherwise it won’t work. This maps the schema declaration in the XML to the real file that will be used from inside the jar!

    Before going further, and for the more curious, just notice that spring-beans.jar (v3.0) contains such a file. Here is its content:
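An abbreviated excerpt, quoted from memory (the real file maps every schema version):

```properties
http\://www.springframework.org/schema/beans/spring-beans-2.0.xsd=org/springframework/beans/factory/xml/spring-beans-2.0.xsd
http\://www.springframework.org/schema/beans/spring-beans-2.5.xsd=org/springframework/beans/factory/xml/spring-beans-2.5.xsd
http\://www.springframework.org/schema/beans/spring-beans-3.0.xsd=org/springframework/beans/factory/xml/spring-beans-3.0.xsd
http\://www.springframework.org/schema/beans/spring-beans.xsd=org/springframework/beans/factory/xml/spring-beans-3.0.xsd
```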


    It brings some remarks:

    • Spring eats its own dogfood (that’s nice to know)
    • I didn’t look into the code, but I think that’s why XML validation of Spring’s bean files never complains about not finding the schema over the Internet (a real pain in production environments because of firewall security rules): the XSDs are looked up inside the jar
    • If you don’t specify the version of the Spring schema you use (2.0, 2.5, 3.0, etc.), Spring will automatically upgrade it for you with each major/minor version of the jar. If you want this behaviour, fine; if not, you’ll have to specify the version

    Creating the parser

    The previous steps are only meant to validate the XML, so that the eternal attribute takes a boolean value, for example. We still haven’t wired our namespace into the Spring factory. This is the goal of this step.

    The first thing to do is to create a class that implements org.springframework.beans.factory.xml.BeanDefinitionParser. Looking at its hierarchy, org.springframework.beans.factory.xml.AbstractSimpleBeanDefinitionParser seems a good entry point since:

    • the XML is not overly complex
    • there will be a single bean definition

    Here’s the code:

    public class EhCacheBeanDefinitionParser extends AbstractSimpleBeanDefinitionParser {

      private static final List<String> PROP_TAG_NAMES;

      static {
        PROP_TAG_NAMES = new ArrayList<String>();
        // fill in the names of the simple attributes: "cacheName", "eternal", etc.
      }

      protected Class getBeanClass(Element element) {
        return EhCacheFactoryBean.class;
      }

      protected boolean shouldGenerateIdAsFallback() {
        return true;
      }

      protected void doParse(Element element, ParserContext parserContext, BeanDefinitionBuilder builder) {
        for (String name : PROP_TAG_NAMES) {
          String value = element.getAttribute(name);
          if (StringUtils.hasText(value)) {
            builder.addPropertyValue(name, value);
          }
        }
        NodeList nodes = element.getElementsByTagNameNS("", "manager");
        if (nodes.getLength() > 0) {
          // add the manager as a reference to another bean, not as a value
        }
        String msep = element.getAttribute("memoryStoreEvictionPolicy");
        if (StringUtils.hasText(msep)) {
          MemoryStoreEvictionPolicy policy = MemoryStoreEvictionPolicy.fromString(msep);
          builder.addPropertyValue("memoryStoreEvictionPolicy", policy);
        }
      }
    }

    That deserves some explanation. The static block lists the valid attributes. The getBeanClass() method returns the class that will be used, either directly as a bean or as a factory. The shouldGenerateIdAsFallback() method tells Spring to generate an id when none is supplied in the XML. That makes it possible to create pseudo-anonymous beans (no bean is really anonymous in the Spring factory).

    The real magic happens in the doParse() method: it just adds every simple property it finds to the builder. There are two interesting properties though: cacheManager and memoryStoreEvictionPolicy.

    The former, should it exist, is a reference to another bean. Therefore, it should be added to the builder not as a value but as a reference. Of course, the code doesn’t check whether the developer declared the cache manager as an anonymous bean inside the ehcache element, but the schema validation already took care of that.

    The latter just uses the string value to get the real object behind it and adds it as a property to the builder. Likewise, since the value is enumerated in the schema, exceptions caused by bad syntax cannot happen.

    Registering the parser

    The last step is to register the parser in Spring. First, create a class that extends org.springframework.beans.factory.xml.NamespaceHandlerSupport and register the parser under the XML tag name in its init() method:

    public class EhCacheNamespaceHandler extends NamespaceHandlerSupport {

      public void init() {
        registerBeanDefinitionParser("cache", new EhCacheBeanDefinitionParser());
      }
    }

    Should you have more parsers, just register them in the same method under each tag name.

    Second, just map the formerly created namespace to the newly created handler in a file META-INF/spring.handlers:
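With the illustrative namespace used earlier and a hypothetical handler package, the line could look like:

```properties
http\://example.com/schema/ehcache=com.example.spring.EhCacheNamespaceHandler
```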


    Notice that you map the declared schema file to the real schema but the namespace to the handler.


    Now, when faced with overly verbose bean configurations, you have the option to use this nifty 4-step technique to simplify them. This technique is of course more oriented toward product providers, but it can be used by projects, provided the time taken to author a namespace is a real gain over normal bean definitions.

    You will find the sources for this article here, in Maven/Eclipse format.

    To go further:

    • Spring authoring: version 2.0 is enough since nothing changed much (at all?) with following versions
    • Spring's Javadoc relative to authoring
  • Customize your JAXB bindings

    JAXB is a bridge between the Java and XML worlds, enabling your code to transparently marshall and unmarshall your Java objects to and from XML. In order to do this, you should have a class representing your XML Schema. This class is created by xjc. In most cases, xjc creates a class that won’t suit your needs. In this article, we’ll see why it does so and what we can do to customize this behaviour.


    This article assumes you use Java Development Kit 6.0. Be wary that different updates of this same JDK 6 use different JAXB versions:

    Update JAXB version
    3 2.0.3
    4 2.1.3

    Concurrent technologies

    Before diving into JAXB, one should ask oneself whether using JAXB is necessary. There’s a plethora of frameworks whose goal is to serialize/deserialize Java objects to and from XML:

    • Historically speaking (and to my humble knowledge), Castor XML was the first widely used framework to manage Java XML serialization.
    • Since Java 1.4, two classes, XMLEncoder and XMLDecoder, are available. They are respectively equivalent to ObjectOutputStream and ObjectInputStream, only they produce XML instead of bytes.
    • Finally, XStream is a 3rd party framework that is fast to run and easy to use.

    All these solutions are very good at reading/producing XML, yet they completely ignore the binding part. Binding is the creation of the Java class from the schema. JAXB, of course, has this feature.


    xjc is the executable used to create Java classes from XML schemas. Available syntaxes are DTD, XML Schema, RELAX NG, RELAX NG Compact, and WSDL. Since I now exclusively use Maven to build my projects, I won’t use xjc directly; I’ll configure the Maven POM to use it instead. The first thing to do is to add the repository to your POM (or to your settings file, but I prefer the former):

            <repository>
                <name>Java.net Maven 2 Repository</name>
                <url>http://download.java.net/maven/2</url>
            </repository>
            <pluginRepository>
                <name>Java.net Maven 2 Repository</name>
                <url>http://download.java.net/maven/2</url>
            </pluginRepository>

    Now, you’re ready to use the plugin:
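The declaration, with the coordinates taken from the command line below:

```xml
<plugin>
    <groupId>org.jvnet.jaxb2.maven2</groupId>
    <artifactId>maven-jaxb2-plugin</artifactId>
</plugin>
```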


    Use the plugin with the following command line: mvn org.jvnet.jaxb2.maven2:maven-jaxb2-plugin:generate.

    Common configuration

    The previous command line will fail since we didn’t specify any configuration yet. The following can be assumed a good default:
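A sketch of such a configuration, assuming the maven-jaxb2-plugin’s configuration element names:

```xml
<configuration>
    <schemaDirectory>src/main/schema</schemaDirectory>
    <generateDirectory>src/main/generated</generateDirectory>
    <removeOldOutput>false</removeOldOutput>
</configuration>
```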


    This tells xjc to look for all xsd files under the src/main/schema directory and generate the classes under src/main/generated. It will use the binding files (*.xjb) under the former directory.

    I sincerely advise you to set removeOldOutput to false, since xjc would otherwise erase the whole directory content. If you use a source code management tool, this includes the directories used by your SCM (CVS or .svn), a very bad idea.

    You still need two things. The first is to set the compiler level to at least 1.5 in order for annotations to work:
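A standard way to do it:

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
        <source>1.5</source>
        <target>1.5</target>
    </configuration>
</plugin>
```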


    Maven only accepts a single source directory. The second thing to do is to make the src/main/generated directory known to Maven when building the project. It can be done with the help of the build helper plugin:
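A sketch with the Codehaus build-helper-maven-plugin, whose add-source goal attaches an extra source directory:

```xml
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>build-helper-maven-plugin</artifactId>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>add-source</goal>
            </goals>
            <configuration>
                <sources>
                    <source>src/main/generated</source>
                </sources>
            </configuration>
        </execution>
    </executions>
</plugin>
```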


    Now, using the previous command line will produce something useful! Try it and you will have a nice surprise.


    In order to customize your bindings, you basically have 2 options:

    • polluting your XML Schema with foreign namespaces,
    • or keeping your XML Schema pure and putting all your bindings into an external file.

    From the terms I used, you can guess I prefer option 2. This is the right time to use a binding file. Create a file named binding.xjb under src/main/resources/schema.

    Package name

    Of course, the default package name is rarely something desirable: by default, xjc derives it from the XML Schema namespace (reversing the domain and prefixing number-led segments with an underscore). The first customization is to change this behaviour. Put the following content into your binding file:

    <?xml version="1.0" encoding="UTF-8"?>
    <bindings xmlns="http://java.sun.com/xml/ns/jaxb" version="2.1">
        <bindings schemaLocation="schema.xsd">
            <schemaBindings><package name="" /></schemaBindings>
        </bindings>
    </bindings>

    If you get the following error, you should use update 14 of JDK 6:

    java.lang.LinkageError: JAXB 2.0 API is being loaded from the bootstrap classloader, but this RI (from jar:file:/C:/${user.home}/.m2/repository/com/sun/xml/bind/jaxb-impl/2.1.10/jaxb-impl-2.1.10.jar!/com/sun/xml/bind/v2/model/impl/ModelBuilder.class) needs 2.1 API. Use the endorsed directory mechanism to place jaxb-api.jar in the bootstrap classloader. (See

    If not, chances are your classes will have been generated under the suitable package name.


    Managing your entities will probably require them to be serializable. In order to achieve this, you’ll have to modify your binding file to add the interface to your classes, along with a serialVersionUID. Every generated class will now be serializable and have the specified uid; a limitation of this process is that each generated class will have the same uid.

    <?xml version="1.0" encoding="UTF-8"?>
    <bindings xmlns="http://java.sun.com/xml/ns/jaxb" version="2.1">
        <bindings schemaLocation="schema.xsd">
            <globalBindings><serializable uid="100" /></globalBindings>
            <schemaBindings><package name="" /></schemaBindings>
        </bindings>
    </bindings>

    It is sometimes beneficial for all your bound classes to inherit from a common class, one that is not described in the XSD. This may happen if this superclass is entirely for technical purposes (such as strong typing) and shouldn’t clutter the schema.

    In order to do so, two actions are necessary:

    • enable the extension mode of the xjc compiler: just add <extension>true</extension> under your configuration in the POM
    • use the superClass tag from the xjc namespace

    This class won’t be generated by xjc so you are free to create it as you please (making it abstract, adding methods and so on).

    <?xml version="1.0" encoding="UTF-8"?>
    <bindings xmlns="http://java.sun.com/xml/ns/jaxb"
        xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
        extensionBindingPrefixes="xjc" version="2.1">
        <bindings schemaLocation="schema.xsd">
            <globalBindings><serializable uid="100" /><xjc:superClass name="" /></globalBindings>
            <schemaBindings><package name="" /></schemaBindings>
        </bindings>
    </bindings>

    Data type

    By default, xjc will bind the schema to the most precise data type available. For example, an XML-Schema date type will be bound to a javax.xml.datatype.XMLGregorianCalendar. This has the drawback of coupling your bound classes to the JAXB API and forcing you to use classes you don’t really need. As an example, check how to convert a java.sql.Date from the database to a javax.xml.datatype.XMLGregorianCalendar in your entity: have fun!

    To ease your development, you can provide adapters that convert to and from XML; they are used both during binding and during marshalling/unmarshalling. This is declared as such in the bindings file:

    <?xml version="1.0" encoding="UTF-8"?>
    <bindings xmlns="http://java.sun.com/xml/ns/jaxb"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
        extensionBindingPrefixes="xjc" version="2.1">
        <bindings schemaLocation="schema.xsd">
            <globalBindings>
                <javaType name="java.util.Date" xmlType="xs:date"
                    printMethod="" />
                <serializable uid="100" />
                <xjc:superClass name="" />
            </globalBindings>
            <schemaBindings><package name="" /></schemaBindings>
        </bindings>
    </bindings>

    Notice you’ve got another namespace to manage. This will generate an adapter class under the org.w3._2001.xmlschema package, which you’ll have to use during the marshalling/unmarshalling process.

    Tweaking the output

    Not every piece of information you’ll need in Java will be provided by the XML Schema. What about comparing our objects with equals()?

    Not every schema you’ll use will have been designed by you or your team. Some will be legacy, some will be standards… That doesn’t mean you should suffer for other’s lack of good practice. What if the schema uses uppercase tags? Surely, you want your generated classes to follow Sun coding conventions.

    The Sun JAXB team provides plugins to the XJC compiler that resolve the above problems. Have a look at them.

    For our example, we choose to add hashCode(), equals() and toString() methods to our generated class. A plugin exists to do just that. You only have to configure this generation in the POM, adding the wanted arguments to the compiler and adding the plugins needed to manage these arguments:
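As a sketch, assuming the JAXB2 Basics plugin and its -XtoString/-Xequals/-XhashCode arguments (coordinates and version are to be verified against the jaxb2-basics project):

```xml
<configuration>
    <extension>true</extension>
    <args>
        <arg>-XtoString</arg>
        <arg>-Xequals</arg>
        <arg>-XhashCode</arg>
    </args>
    <plugins>
        <plugin>
            <groupId>org.jvnet.jaxb2_commons</groupId>
            <artifactId>jaxb2-basics</artifactId>
            <version>0.5.3</version>
        </plugin>
    </plugins>
</configuration>
```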


    In this specific case, you’ll have to add two more dependencies (Commons Lang and JAXB Basics runtime) to your POM since the generated classes will depend on them:
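A sketch of these dependencies (versions are examples):

```xml
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.4</version>
</dependency>
<dependency>
    <groupId>org.jvnet.jaxb2_commons</groupId>
    <artifactId>jaxb2-basics-runtime</artifactId>
    <version>0.5.3</version>
</dependency>
```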


    JAXB Persistence Bindings

    To go even further and have a single entity able to be marshalled by JAXB and managed by your persistence layer, you can use the HyperJaxb product.

    HyperJaxb2 cares about Hibernate managed persistence whereas HyperJaxb3 focuses on JPA managed persistence. The latter seems to be suffering from a lack of documentation, but the objective looks promising.


    When reading from and writing to XML, one may be provided with an XML Schema. In this case, it is good practice to bind the schema to Java. JAXB is the standard solution to achieve this. Yet, in many cases, the generated Java classes will lack some features: this shouldn’t be considered the end of the world, since the binding process can be customized to take many parameters into account.

    You’ll find the sources for this article here.

    To go further: