XML validation with imported/included schemas

Recently, I tried to help a teammate design a WSDL file. I gently steered him toward separating the interface itself, in the WSDL file, from the domain objects, in an XML Schema file. One thing leading to another, I also made him split this XSD into two separate files, one including the other, for design purposes. Alas, tests were already present, and they failed miserably after my refactoring, complaining that a type defined in the included file could not be found. The situation was extremely unpleasant, not only because I looked a little foolish in front of one of my co-workers, but also because, despite my best efforts, I couldn’t get validation to work.

I finally found the solution, and I hope to spread it as widely as I can so that other developers stop losing time on this issue. The root of the problem is that the Java XML validation API cannot resolve included XML schemas (and imported ones as well), period. However, it allows you to register a (crude) resolver that can provide the content of the included/imported XSD. So, the solution is to implement your own resolver and your own content holder (the JDK 6 provides none).

  1. Create an LSInput implementation. This class is responsible for holding the content of the resolved schema.
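    import java.io.InputStream;
    import java.io.Reader;
    import org.w3c.dom.ls.LSInput;
    
    public class LSInputImpl implements LSInput {
        private Reader characterStream;
        private InputStream byteStream;
        private String stringData, systemId, publicId, baseURI, encoding;
        private boolean certifiedText;
        public Reader getCharacterStream() { return characterStream; }
        public void setCharacterStream(Reader characterStream) { this.characterStream = characterStream; }
        public InputStream getByteStream() { return byteStream; }
        public void setByteStream(InputStream byteStream) { this.byteStream = byteStream; }
        public String getStringData() { return stringData; }
        public void setStringData(String stringData) { this.stringData = stringData; }
        public String getSystemId() { return systemId; }
        public void setSystemId(String systemId) { this.systemId = systemId; }
        public String getPublicId() { return publicId; }
        public void setPublicId(String publicId) { this.publicId = publicId; }
        public String getBaseURI() { return baseURI; }
        public void setBaseURI(String baseURI) { this.baseURI = baseURI; }
        public String getEncoding() { return encoding; }
        public void setEncoding(String encoding) { this.encoding = encoding; }
        public boolean getCertifiedText() { return certifiedText; }
        public void setCertifiedText(boolean certifiedText) { this.certifiedText = certifiedText; }
    }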
  2. Create a resolver implementation. This one is relatively simple: it assumes that every included/imported schema can be found from the root of the classpath, i.e. that each schemaLocation value can be passed as-is to ClassLoader.getResourceAsStream(). More complex implementations can provide for a variety of locations (filesystem, internet, etc.).
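    import java.io.InputStream;
    import org.w3c.dom.ls.LSInput;
    import org.w3c.dom.ls.LSResourceResolver;
    
    public class ClasspathResourceResolver implements LSResourceResolver {
    
        @Override
        public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
    
            // An xsd:import may omit schemaLocation: there is then nothing to look up
            if (systemId == null) {
                return null;
            }
            // The schemaLocation is looked up as-is, relative to the classpath root
            InputStream stream = getClass().getClassLoader().getResourceAsStream(systemId);
            if (stream == null) {
                return null; // not found: fall back to the parser's default resolution
            }
    
            LSInputImpl input = new LSInputImpl();
            input.setPublicId(publicId);
            input.setSystemId(systemId);
            input.setBaseURI(baseURI);
            // A byte stream lets the parser honour the schema's own encoding declaration
            input.setByteStream(stream);
    
            return input;
        }
    }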
  3. Finally, set the resolver on the schema factory, before creating the Schema:
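    import javax.xml.validation.SchemaFactory;
    import static javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI;
    
    SchemaFactory schemaFactory = SchemaFactory.newInstance(W3C_XML_SCHEMA_NS_URI);
    // Register the resolver before newSchema(...) is called: that is when includes and imports are loaded
    schemaFactory.setResourceResolver(new ClasspathResourceResolver());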
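To see the mechanism work end to end without preparing classpath resources, here is a self-contained sketch. It is a variation rather than the code above: the resolver serves schemas from an in-memory map instead of the classpath, and it obtains an empty LSInput from `DOMImplementationLS.createLSInput()` (part of the standard DOM Level 3 Load and Save API) as an alternative to a hand-written holder class. The schemas, the `CodeType` type and the `IncludeValidationDemo` class are hypothetical, and `Map.of` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

public class IncludeValidationDemo {

    // Hypothetical included schema, keyed by the schemaLocation used in MAIN_XSD
    static final Map<String, String> SCHEMAS = Map.of("types.xsd",
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:simpleType name='CodeType'><xs:restriction base='xs:string'>"
            + "<xs:pattern value='[A-Z]{3}'/></xs:restriction></xs:simpleType>"
            + "</xs:schema>");

    // Hypothetical main schema: CodeType is only defined in the included file
    static final String MAIN_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:include schemaLocation='types.xsd'/>"
            + "<xs:element name='code' type='CodeType'/>"
            + "</xs:schema>";

    static Schema loadSchema() {
        try {
            // Standard DOM Level 3 bootstrap; the returned implementation can create LSInput objects
            DOMImplementationLS ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            LSResourceResolver resolver = (type, namespaceURI, publicId, systemId, baseURI) -> {
                String content = (systemId == null) ? null : SCHEMAS.get(systemId);
                if (content == null) {
                    return null; // unknown location: let the parser attempt default resolution
                }
                LSInput input = ls.createLSInput();
                input.setPublicId(publicId);
                input.setSystemId(systemId);
                input.setBaseURI(baseURI);
                input.setStringData(content);
                return input;
            };
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver(resolver); // must be set before newSchema()
            // The main schema is read from a string with no system ID; types.xsd comes from the resolver
            return factory.newSchema(new StreamSource(new StringReader(MAIN_XSD)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the schema", e);
        }
    }

    // Returns true if xml conforms to the schema, false if validation reports an error
    public static boolean isValid(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this setup, `isValid("<code>ABC</code>")` should return true and `isValid("<code>abc</code>")` false. If the include could not be resolved, `newSchema()` would fail instead, since `CodeType` would be undefined.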

These 3 steps will go a long way toward cleanly splitting your XML schemas.
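The resolver from step 2 hands `systemId` to the class loader unchanged; with the JDK's built-in parser that is the schemaLocation value as written. So it only finds schemas whose locations are already paths from the classpath root. When locations are written relative to the including schema (e.g. `../common/types.xsd`), they first have to be resolved against that schema's own classpath path, and `java.net.URI.resolve()` does exactly this path arithmetic, as a commenter points out below. A minimal sketch of that single step follows; the `ClasspathLocations` class and `toClasspathPath` method are hypothetical, and how the resolver learns the including schema's path depends on how the schemas are loaded, so it is not shown.

```java
import java.net.URI;

public class ClasspathLocations {

    // Hypothetical helper: resolves a schemaLocation written relative to the including
    // schema into a resource name relative to the classpath root
    public static String toClasspathPath(String includingSchemaPath, String schemaLocation) {
        // URI.resolve merges the paths and normalizes "." and ".." segments
        String resolved = URI.create(includingSchemaPath).resolve(schemaLocation).getPath();
        // ClassLoader.getResourceAsStream() expects resource names without a leading slash
        return resolved.startsWith("/") ? resolved.substring(1) : resolved;
    }
}
```

For instance, resolving `../common/types.xsd` against `schemas/orders/main.xsd` gives `schemas/common/types.xsd`, which can then be passed to `getResourceAsStream()`.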

Categories: Java
  1. Ponchel
    September 3rd, 2012 at 13:23 | #1

    Hi Nicolas,

    “The root of the problem is that the Java XML validation API cannot resolve included XML schemas (and imported ones as well), period.”
    Well, well, well, not so fast! The Java XML Validation API does resolve at least imported XML Schemas from the file system, using the relative file path specified in the import’s schemaLocation attribute; I worked on such a project recently. I did not test includes, only imports.

    In the end I used your approach, as my XML Schemas were packaged in a jar on the classpath, and indeed, the validation API doesn’t resolve paths across libraries on the classpath.

    One thing you should mention is that your code can find an XML Schema on the classpath only if the designer of the XML Schema specified an absolute classpath location in the schemaLocation attribute of the import tags. My XML Schemas’ import schemaLocation attributes were expressed relative to a root XML Schema, so I had to convert relative classpath references to absolute ones (java.net.URI.resolve does that well).

  2. Chris L.
    July 30th, 2013 at 15:37 | #2

    Please consider expanding this example with an XML catalog, an essential way of coping with included schemas without pulling files over the network. Not all schemas are hosted, of course, and sometimes computers don’t have good network connections!

    I’ve had good success reusing this class from the JDK, although since it’s “internal” I’m sure I should not be going anywhere near it.

    com.sun.org.apache.xerces.internal.util.XMLCatalogResolver

    Thanks for listening.
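On Java 9 and later, the public `javax.xml.catalog` API is an alternative to that internal class: `CatalogManager.catalogResolver(...)` returns a `CatalogResolver`, which implements `LSResourceResolver` and can therefore be passed to `SchemaFactory.setResourceResolver()`. The sketch below assumes Java 9+; the URL `http://example.com/schemas/types.xsd`, the schemas and the `CatalogValidationDemo` class are hypothetical, and the files are written to a temporary directory only to keep the example self-contained. Setting the `RESOLVE` feature to `"continue"` means locations missing from the catalog fall back to normal resolution instead of raising an error (the default is `"strict"`).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class CatalogValidationDemo {

    // Hypothetical local copy of a schema normally published at a remote URL
    static final String TYPES_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:simpleType name='CodeType'><xs:restriction base='xs:string'>"
            + "<xs:pattern value='[A-Z]{3}'/></xs:restriction></xs:simpleType>"
            + "</xs:schema>";

    // The main schema includes the remote URL; the catalog redirects it to the local copy
    static final String MAIN_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:include schemaLocation='http://example.com/schemas/types.xsd'/>"
            + "<xs:element name='code' type='CodeType'/>"
            + "</xs:schema>";

    // OASIS XML catalog: the relative uri is resolved against the catalog file's location
    static final String CATALOG =
            "<catalog xmlns='urn:oasis:names:tc:entity:xmlns:xml:catalog'>"
            + "<system systemId='http://example.com/schemas/types.xsd' uri='types.xsd'/>"
            + "</catalog>";

    static Schema loadSchema() {
        try {
            Path dir = Files.createTempDirectory("schemas");
            Files.write(dir.resolve("types.xsd"), TYPES_XSD.getBytes(StandardCharsets.UTF_8));
            Path catalog = Files.write(dir.resolve("catalog.xml"), CATALOG.getBytes(StandardCharsets.UTF_8));
            CatalogFeatures features = CatalogFeatures.builder()
                    .with(CatalogFeatures.Feature.RESOLVE, "continue")
                    .build();
            CatalogResolver resolver = CatalogManager.catalogResolver(features, catalog.toUri());
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setResourceResolver(resolver); // CatalogResolver is also an LSResourceResolver
            return factory.newSchema(new StreamSource(new StringReader(MAIN_XSD)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("Could not build the schema", e);
        }
    }

    // Returns true if xml conforms to the schema, false if validation reports an error
    public static boolean isValid(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real project the catalog and the local schema copies would typically ship with the application rather than being written at runtime.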
