Archive for February, 2010

Clustering Tomcat

February 24th, 2010 Nicolas Frankel No comments

In this article, I will show you how to use Apache/Tomcat in order to set up a load balancer. I know this has been done a zillion times before, but I will use this setup in my next article (teaser, teaser), so at least I will have it documented somewhere.

Apache Tomcat has been the reference JSP/servlet container since its inception. Despite a lack of full JEE support, it certainly has its appeal. The reasons behind using a full-featured commercial JEE application server are not always technical ones. With lightweight frameworks such as Spring being mainstream, it is not unusual to consider using Tomcat in a production environment. Some companies did it even before that.

When thinking production, one usually thinks reliability and scalability. Luckily, both can be attained with Apache/Tomcat by setting up a load-balancing cluster. Requests are dispatched to each Tomcat according to a predefined strategy. Reliability is addressed because, if a Tomcat fails, subsequent requests can be directed to a working Tomcat. If need be, more Tomcat instances can be added at will in order to scale.

In the following example, I will set up the simplest clustering topology possible: an Apache front-end that balances 2 Tomcat instances on the same physical machine.

Set up Apache

The first step is to configure Apache to forward your requests to Tomcat. There are basically 2 options in order to do this (I ruled out the pre-shipped load-balancer webapp):

  • use mod_jk, the classic Apache/Tomcat module
  • use mod_proxy, another Apache module

I’m not a system engineer, so I can’t make a fact-based choice between the two: I will use mod_jk simply because I’ve already used it before.

  • Download the mod_jk that is adapted to your Apache and Tomcat versions
  • Put it in the ‘modules’ folder of your Apache installation
  • Update your httpd.conf configuration to load it with Apache
    LoadModule jk_module modules/mod_jk-1.2.28-httpd-2.2.3.so
  • Configure Apache. Put these directives in the httpd.conf:
    JkWorkersFile	conf/worker.properties
    JkShmFile	logs/mod_jk.shm
    JkLogLevel	info
    JkLogFile logs/mod_jk.log
    JkMount		/servlets-examples/* lb

This configuration example is minimal but needs some comments:

Parameter     | Description
JkWorkersFile | Where to look for the module configuration file (see below)
JkShmFile     | Where to put the shared memory file
JkLogLevel    | Module log level (debug/error/info)
JkLogFile     | Log file location. It is the default but declaring it avoids the Apache warning
JkMount       | Which URL pattern will be forwarded to which worker

Since mod_jk can be used in non-clustered setups, there could be any number of JkMount directives, each forwarding to its own worker (see below). In our case, it means any request beginning with /servlets-examples/ (the trailing slash is needed) will be forwarded to the ‘lb’ worker.

Configure the workers

Workers are destination routes as viewed by Apache. They are referenced by a unique label in the httpd.conf and parameterized under the same label in the worker.properties file.
My worker.properties is the following:

worker.list=lb

worker.worker1.port=8010
worker.worker1.host=localhost
worker.worker1.type=ajp13

worker.worker2.port=8011
worker.worker2.host=localhost
worker.worker2.type=ajp13

worker.lb.type=lb
worker.lb.balance_workers=worker1,worker2

I define 3 workers in this file: lb, worker1 and worker2. The ‘lb’ worker is the load-balancing worker: it is virtual and it balances the latter two. Both are configured to point to a real Tomcat instance.

Now, with the Apache configuration in mind, we see that requests beginning with /servlets-examples/ will be managed by the load-balancer worker, which will in turn forward each of them to one of the two real workers.

Note: one can also put weight on workers hosted by more powerful machines so that these are more heavily loaded than less powerful ones. In our case, both are hosted on the same machine so it has no importance whatsoever.
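To make the idea of weights more concrete, here is a minimal Java sketch of a weighted round-robin. It is only an illustration of how a weight can skew the distribution of requests: the class name and the weights are hypothetical, and this is not mod_jk’s actual balancing algorithm (in mod_jk, the weight itself is set with the lbfactor worker property).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy weighted round-robin; an illustration only, NOT mod_jk's algorithm. */
class WeightedBalancerSketch {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private final Map<String, Integer> current = new LinkedHashMap<>();
    private int totalWeight = 0;

    void addWorker(String name, int weight) {
        weights.put(name, weight);
        current.put(name, 0);
        totalWeight += weight;
    }

    /** Picks the next worker so that each one is chosen in proportion to its weight. */
    String next() {
        String best = null;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            // Every worker earns its weight at each round...
            int value = current.get(entry.getKey()) + entry.getValue();
            current.put(entry.getKey(), value);
            if (best == null || value > current.get(best)) {
                best = entry.getKey();
            }
        }
        // ...and the chosen one pays back the total weight.
        current.put(best, current.get(best) - totalWeight);
        return best;
    }

    public static void main(String[] args) {
        WeightedBalancerSketch lb = new WeightedBalancerSketch();
        lb.addWorker("worker1", 2); // hypothetical: hosted on a more powerful machine
        lb.addWorker("worker2", 1);
        for (int i = 0; i < 6; i++) {
            System.out.println(lb.next());
        }
    }
}
```

With these hypothetical weights, worker1 receives two requests out of three.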

Configure the Tomcat instances

The last step consists of configuring the Tomcat instances. In order to do so, I shamelessly copied entire Tomcat installations (I’m on Windows). While editing the server.xml of the Tomcat instances, three points are worth mentioning:

  • The Engine tag has a jvmRoute attribute. Its value should be the same as the name of the corresponding worker in worker.properties (worker1 or worker2 here). Otherwise, sessions will be recreated for each request
  • Look out for duplicated port numbers if all Tomcat instances are on the same machine. For example, use an incremental rule to configure every connector on a different port, and make sure the AJP ports match those declared in worker.properties (8010 and 8011 here)
  • Be sure that the tcpListenPort attribute of the Receiver is unique across all Tomcat instances

Use it!

With the previous setup, one can now start both Tomcat instances and Apache, then browse to the servlets-examples webapp, and more precisely to the Session page. Look there for Tomcat 5.5 and there for Tomcat 6. The servlets-examples page displays the associated session ID:

Session ID: 324DAD12976045D197435033A67C025D.worker2
Created: Tue Feb 23 23:15:13 CET 2010
Last Accessed: Tue Feb 23 23:31:47 CET 2010

Notice that on my Tomcat instance, the worker’s name is part of the session ID.
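The suffix is what makes sticky sessions possible: the balancer can read the route from the session ID and send the request back to the same worker. The following Java sketch illustrates that decision under simplifying assumptions. The class and method names are hypothetical, the list of alive workers is supplied by hand, and the fallback (first alive worker) is a placeholder, not mod_jk’s real behaviour.

```java
import java.util.Arrays;
import java.util.List;

/** Toy sticky-session routing; names and fallback policy are assumptions. */
class StickyRouteSketch {

    /** Returns the route found after the last '.', or null if there is none. */
    static String routeOf(String sessionId) {
        if (sessionId == null) {
            return null;
        }
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null;
        }
        return sessionId.substring(dot + 1);
    }

    /** Picks the worker named in the session ID if it is alive, else the first alive worker. */
    static String pickWorker(String sessionId, List<String> aliveWorkers) {
        String route = routeOf(sessionId);
        if (route != null && aliveWorkers.contains(route)) {
            return route;
        }
        return aliveWorkers.isEmpty() ? null : aliveWorkers.get(0);
    }

    public static void main(String[] args) {
        String id = "324DAD12976045D197435033A67C025D.worker2";
        System.out.println(routeOf(id));
        System.out.println(pickWorker(id, Arrays.asList("worker1", "worker2")));
        // worker2 is down: the request has to go somewhere else
        System.out.println(pickWorker(id, Arrays.asList("worker1")));
    }
}
```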

If everything went fine, two interesting things should take place. First, when refreshing the page, the session ID should not change because of the sticky session (enabled by default). Moreover, if I shut down the Tomcat instance associated with the worker (the second in my case) and try to refresh the page, I can still access my application, but under a new session.

Thus, I lose all the information I stored under my session! In my following article, I will study how one can try to remedy this.

Categories: JEE

Free online SVN repositories

February 22nd, 2010 Nicolas Frankel 3 comments

This week, I searched for free online SVN repositories for closed-source projects. Of course, there are plenty of sites offering services for OpenSource projects: Google Code, SourceForge, etc. It may seem amazing, but not everybody wants to expose their code to the world.

From what I’ve found, there aren’t so many free SVN repositories around. Since the service is free and provided as a courtesy (not to mention the hope that you will upgrade to paid services later), there are some limits. Criteria are mainly:

  • The amount of disk space available
  • The number of projects you can store
  • The number of user accounts you can create in order to access the SVN

This is my results table, and it is definitely not exhaustive:

Name          | MB  | Projects  | Users     | Notes
Unfuddle      | 200 | 1         | 2         |
ProjectLocker | 500 | Unlimited | 5         |
Origo         | ?   | ?         | ?         | Origo is an OpenSource project in itself and provides both the product, so you can host it yourself, and online services. I couldn’t find any information on online limits though
XP-Dev        | 200 | 2         | Unlimited | I used it once but found to my dismay they erased my account because of months of inactivity
Beanstalk     | 100 | ?         | 3         |

From a purely numerical point of view, and paid features aside, my pick is ProjectLocker. This is not to say the others are bad, but its free account also has some very interesting features:

  • Unlimited Bandwidth Transfer
  • SSL Encryption (very unusual for a free service)
  • Redundant RAID storage and Nightly Backups

I would definitely be interested in knowing which free Subversion provider you use for your closed-source projects.

Disclaimer: I haven’t had any contact with anyone working for ProjectLocker. I’m not an employee of ProjectLocker. The above only reflects my unbiased opinion.

Hibernate hard facts – Part 4

February 17th, 2010 Nicolas Frankel No comments

In the fourth article of this series, I will show the subtle differences between the get() and load() methods.

Hibernate, like life, can be full of surprises. Today, I will share one with you: have you ever noticed that Hibernate provides you with 2 methods to load a persistent entity from the database tier? These two methods are get(Class, Serializable) and load(Class, Serializable) of the Session class and their respective variations.

Strangely enough, they both have the same signature. Strangely enough, both of their API descriptions start the same way:

Return the persistent instance of the given entity class with the given identifier.

Most developers use them interchangeably. It is a mistake since, if the entity is not found, get() will return null whereas load() will throw a Hibernate exception. This is well described in the API documentation of load():

Return the persistent instance of the given entity class with the given identifier, assuming that the instance exists. You should not use this method to determine if an instance exists (use get() instead). Use this only to retrieve an instance that you assume exists, where non-existence would be an actual error.

Truth be told, the real difference lies elsewhere: the get() method returns an instance, whereas the load() method returns a proxy. Not convinced? Try the following code snippet:

Session session = factory.getCurrentSession();

Owner owner = (Owner) session.get(Owner.class, 1);

// Test the class of the object
assertSame(owner.getClass(), Owner.class);

The test passes, asserting that the owner’s class is in fact Owner. Now, in another session, try the following:

Session session = factory.getCurrentSession();

Owner owner = (Owner) session.load(Owner.class, 1);

// Test the class of the object
assertNotSame(owner.getClass(), Owner.class);

The test will pass too, asserting that the owner’s class is not Owner. If you inspect the object in the debugger, you’ll see a Javassist-proxied instance whose fields are not initialized! Notice that in both cases, you are able to safely cast the instance to Owner. Calling getters will also return the expected results.

Why call the load() method then? Because it returns a proxy, it won’t hit the DB until a getter method is called.
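The lazy behaviour of a proxy can be illustrated without Hibernate at all. The sketch below uses a JDK dynamic proxy over an interface, which is a deliberately different technique from the Javassist-generated subclass Hibernate uses; all names (LazyProxySketch, fetchFromDatabase, the hit counter) are hypothetical and only there to show that nothing is fetched until the first getter call.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy lazy proxy built on JDK dynamic proxies; not Hibernate's implementation. */
class LazyProxySketch {

    interface Owner {
        String getName();
    }

    static class SimpleOwner implements Owner {
        private final String name;

        SimpleOwner(String name) {
            this.name = name;
        }

        @Override
        public String getName() {
            return name;
        }
    }

    /** Counts the simulated database reads. */
    static final AtomicInteger databaseHits = new AtomicInteger();

    /** Stands in for a real database read (hypothetical). */
    static Owner fetchFromDatabase(int id) {
        databaseHits.incrementAndGet();
        return new SimpleOwner("owner-" + id);
    }

    /** Mimics load(): returns a proxy at once and fetches only on the first method call. */
    static Owner lazyLoad(int id) {
        InvocationHandler handler = new InvocationHandler() {
            private Owner target; // not fetched yet

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = fetchFromDatabase(id);
                }
                return method.invoke(target, args);
            }
        };
        return (Owner) Proxy.newProxyInstance(
                Owner.class.getClassLoader(), new Class<?>[] {Owner.class}, handler);
    }

    public static void main(String[] args) {
        Owner owner = lazyLoad(1);
        System.out.println("Hits after load: " + databaseHits.get());
        System.out.println("Name: " + owner.getName());
        System.out.println("Hits after getter: " + databaseHits.get());
    }
}
```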

Moreover, these features are also available in JPA from the EntityManager, respectively with the find() and getReference() methods.

Yet, both behaviours are modified by Hibernate’s caching mechanism. Try the following code snippet:

// Loads the reference
session.load(Owner.class, 1);

Owner owner = (Owner) session.get(Owner.class, 1);

According to what was said before, owner’s real class should be the real McCoy. Dead wrong! Since load() was called first, get() looks in the Session cache (the first-level one) and returns a proxy!

The behaviour is symmetrical with the following test, which will pass although it’s counter-intuitive:

// Gets the object
session.get(Owner.class, 1);

// Loads the reference, but looks for it in the cache and loads
// the real entity instead
Owner owner = (Owner) session.load(Owner.class, 1);

// Test the class of the object
assertSame(owner.getClass(), Owner.class);
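Both results can be summed up as “whichever call comes first decides what the session hands back afterwards”. Here is a toy Java model of that rule, assuming a session cache that is nothing more than a map keyed by identifier. The class names are hypothetical and the real Hibernate persistence context is of course far more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a first-level cache; NOT Hibernate's real persistence context. */
class SessionCacheSketch {

    static class Owner {
    }

    /** Stands in for the proxy subclass generated at runtime. */
    static class OwnerProxy extends Owner {
    }

    private final Map<Integer, Owner> cache = new HashMap<>();

    /** Like load(): creates a proxy unless something is already cached under this id. */
    Owner load(int id) {
        return cache.computeIfAbsent(id, key -> new OwnerProxy());
    }

    /** Like get(): creates the real entity unless something is already cached under this id. */
    Owner get(int id) {
        return cache.computeIfAbsent(id, key -> new Owner());
    }

    public static void main(String[] args) {
        SessionCacheSketch first = new SessionCacheSketch();
        first.load(1);
        System.out.println(first.get(1).getClass().getSimpleName());

        SessionCacheSketch second = new SessionCacheSketch();
        second.get(1);
        System.out.println(second.load(1).getClass().getSimpleName());
    }
}
```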

Conclusion: Hibernate does a wonderful job at making ORM easier. Yet, it’s not an easy framework: be very wary of subtle behaviour differences.

The sources for the entire hard facts series are available in Eclipse/Maven format.

Categories: Java

Seamless installation: convention over configuration

February 15th, 2010 Nicolas Frankel No comments

Today, I will not take the role of the architect that knows how to deliver applications but instead I will play the end-user part.

As told in a previous post, I was tasked with putting a whole development infrastructure in place. A continuous integration server was indeed in order. I took a look at some, but I was really dumbfounded when I tried Hudson. Features are not what struck me at that time (although Hudson’s features did serve me well) but only the ease of installation.

Let’s look at a traditional installation. The steps are the following:

  • Download the installer
  • Launch the installer
  • Accept the security warning (I’m on Windows; I guess *nix users would probably sudo first)
  • Follow the wizard’s numerous steps (which probably include accepting a license)

In contrast, launching the Hudson test drive is a two-click process, the only thing needed being a local JVM:

  • Click the Java Web Start link
  • Accept the security warning

Let’s not dive into the technical details on how it is done. I’m only interested in the results: with only two mouse clicks, Hudson launches its console and you can start working. From a user point of view, that’s real value! Now, I understand that such an installation is just for example purposes; yet, it is really nice to have a product ready to run in so few steps.

Maven embraced convention over configuration so that build managers would not have to write the same tasks over and over for each of their projects. Learning its lessons from EJB2, Sun took the same path for EJB3: developers now really have less code to write. Build managers and developers are end-users in these processes. As the product end-user, I would really like to install it from some common-sense default configuration. If needed, I should be able to override this convention (Hudson does not provide this overriding because the goal is to test the product quickly).
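In code, “sensible defaults that can be overridden” can be as simple as a java.util.Properties object backed by a set of defaults. The sketch below is only an illustration: the keys (http.port, data.dir) and their values are hypothetical and are not taken from Hudson or any other product.

```java
import java.util.Properties;

/** Toy convention-over-configuration: defaults apply unless explicitly overridden. */
class ConventionOverConfigurationSketch {

    /** The conventions: hypothetical keys and values. */
    static Properties conventions() {
        Properties defaults = new Properties();
        defaults.setProperty("http.port", "8080");
        defaults.setProperty("data.dir", "./data");
        return defaults;
    }

    /** Effective settings: the user's overrides first, the conventions as fallback. */
    static Properties effectiveSettings(Properties overrides) {
        Properties settings = new Properties(conventions());
        settings.putAll(overrides);
        return settings;
    }

    public static void main(String[] args) {
        Properties overrides = new Properties();
        overrides.setProperty("http.port", "9090"); // the only thing the user cares about

        Properties settings = effectiveSettings(overrides);
        System.out.println(settings.getProperty("http.port"));
        System.out.println(settings.getProperty("data.dir"));
    }
}
```

A user who is happy with the conventions passes an empty Properties object and configures nothing at all.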

As an architect, I think the installation domain is pretty uncharted. We are much focused on clean code, maintainability, design and such. Some of us even sometimes explore the interface and ergonomics of the product. All of these are fine and needed but not enough IMHO. Think of the installation process too, and of Hudson’s example, so that we, as end-users, can benefit from seamless installation.

Securing middleware products

February 9th, 2010 Nicolas Frankel No comments

My work is IT architecture, meaning I focus on the early steps of a project. Once the application is in production, I usually leave it to systems and production engineers. For example, for JVM fine tuning, most of the clients I worked for have people that have the right skills to do that.

Nevertheless, I sometimes need to get my hands dirty. This happens in two cases: when the client is too small to have such dedicated teams or when its production team is not experienced enough to handle the problem at hand. Believe it or not, I once had to show WebSphere administrators how to connect JAAC connectors to an LDAP server.

Anyway, I always value information on how to handle cases out of my usual scope. First, it never hurts to know more. Second, it is sometimes handy to sort what production teams tell you: some is real stuff, some is bull. Likewise, I invite production teams to learn about development so that they may sort what is told to them too. Learning the other’s craft lets you improve understanding between different teams.

Free checklist audits

This week, I learned about a site that offers free benchmarks to audit your infrastructure’s security. This site is the Center for Internet Security. The benchmarks are two-fold: part document about what is audited, part benchmarking tool. The former is freely downloadable; as for the latter, you must register. The rest of this article will focus on the document.

Though many subjects will always be beyond my reach (I will never accept to secure an Oracle Database), one document is of utmost interest to me: the benchmark on Apache Tomcat.

This file includes rules that, once you comply with them, will make your product more secure. Even if most of them are no-nonsense and you could think of them yourself, the document makes a nice check-list. Some rules are really interesting in that I am afraid they are seldom enforced, some because of neglect, some because of lack of knowledge of the product.

Enhancements

Checklists provided by the CIS do lack some things though:

  • risk correlated to statistics. Some security holes aren’t used by many hackers. How should I prioritize?
  • risk correlated to damage. What’s the potential damage of not undertaking this action? For example, session hijacking will compromise users’ interactions with my application, not my server
  • trade-off. Many security features are not always desirable, and most have a trade-off, often in terms of performance. When I browse a merchant site, encrypting my communications is overkill. Only during the payment phase is there a real need to keep information secret.

Rules examples

For Tomcat, here is a sample of the audited rules.

Separate Web content directory from Tomcat’s system files

Tomcat comes with its own file structure, including a webapps directory where webapps should reside. Yet, nothing prevents webapps from being outside this directory, even on another partition. From a security point of view, this will avoid directory traversal exploits: if a malicious user gains access to the webapps directory, he will not have access to the server.

Moreover, from a maintenance point-of-view, you are able to upgrade Tomcat without redeploying your applications.

Disable session façade recycling

Tomcat’s model is to use a façade over every entity of the HTTP model: request, response, session, etc. By default, Tomcat’s façades over sessions are reused when processing new requests in order to optimize memory use. This could lead a new request to have access to information on sessions that are not tied to it. This is a security risk, and the recycling should be turned off if one wants to secure the server.

Disable auto-deployment

Tomcat’s default behaviour is to have a running thread that watches the webapps directory. Once a new war is detected by this thread, it deploys it automatically. Such behaviour is very convenient in a development environment. In a production environment, users that have access to the directory could potentially put malicious webapps in it and have them deployed automatically. Thus, disabling auto-deployment increases the security of the Tomcat server.

Conclusion

Checklists provided by the CIS are very nice to have for production and security engineers. However, one should carefully evaluate the cost of enforcing a rule against the risk of not enforcing it. Those are either not detailed enough in the documentation, or not provided at all.

Categories: Technical

Maven: The complete reference

February 3rd, 2010 Nicolas Frankel 2 comments

This review is about Sonatype’s Maven: The complete reference by Tim O’Brien, John Casey, Brian Fox, Jason Van Zyl, Eric Redmond and Larry Shatzer.

Disclaimer: I learned Maven from Sonatype’s site 3 years ago. I found it was a great tool to learn Maven. Now that I have a little more experience in the tool, I tried to write this review in an objective manner.

Facts

  1. 13 chapters, 267 pages, free (see below)
  2. This book is intended both for readers who want to learn Maven from scratch and for readers who need quick help on an obscure feature
  3. A whole chapter is dedicated to the Maven assembly plugin
  4. Another chapter is dedicated to Flexmojos, a Sonatype plugin to manage Flex projects

Pros

  1. First of all, this book is 100% free to view and to download. This is rare enough to be stated!
  2. Complete reference books are sometimes a mere paraphrase of a product’s documentation. This one is not. I do not claim I’m a Maven expert but I did learn things in here
  3. This book is up-to-date with Maven 2.2. For example, it explains password encryption (available since Maven 2.1.0) or how to configure plugins called from the command line differently using default-cli (since Maven 2.2.0)
  4. A very interesting point is a list of some (all?) JEE APIs released by the Geronimo project and referenced by groupId and artifactId. If you frown because the point is lost on you, just try using classes from activation.jar (javax.activation:activation): you’ll never be able to let Maven download it for you since it is not available in the first place for licensing reasons. Having an alternative from Geronimo is good; knowing what is available thanks to the book is better

Cons

To be frank, I found only one problem with Maven: The complete reference. Although a whole chapter is written on the Maven Assembly plugin, I understood nothing from it… The rest of the book is crystal clear; this chapter only obfuscated the few things I thought I knew about the plugin.

Conclusion

This book is top quality and free: what can I say? If you’re a beginner in Maven, you’ll find a real stable base to learn from. If you need to update your knowledge, you will find a wealth of information. If you’re a Maven guru, please contribute to the Assembly plugin’s chapter. I can only give a warm thank you to Sonatype for giving this quality book to the community.

Categories: Book review

Context root tweaking

February 1st, 2010 Nicolas Frankel No comments

JEE never ceases to amaze me. Even when I think I’m on top and I know all there is to know about webapps, I’m in for a surprise. Good news is, whatever you think you know about a subject, there’s still room for one more fact. Bad news is, I’m deeply disturbed by what I learned.

The fact is that a web application’s context root can contain the / character, meaning a URL such as http://localhost/multipart/context can refer to the root of the multipart/context webapp as well as to the context servlet mapping of the http://localhost/multipart webapp. When I was told that, my first reaction was disbelief. I immediately hurried to run a few tests on the JOnAS and JBoss application servers, and they confirmed that it was entirely possible.

In fact, if you look at the application.xml XML Schema, you see that the context-root is of a type that extends string (with an id):

This means the context root can effectively contain anything, including slashes, backslashes and what have you.

The J2EE 1.4 specification does not enforce additional constraints (p 125):

Each web module must be given a distinct and non-overlapping name for its context root. [...] See the servlet specification for detailed requirements of context root naming.

I did not see additional requirements for the context root in the Servlet 2.4 specification (the version that goes with J2EE 1.4).

Tomcat even takes this tweak into account when creating a Context with individual XML files. It calls it “multi-level context paths”:

In individual files (with a “.xml” extension) in the $CATALINA_HOME/conf/[enginename]/[hostname]/ directory. The name of the file (less the .xml) extension will be used as the context path. Multi-level context paths may be defined using #, e.g. foo#bar.xml for a context path of /foo/bar.

So, in order to deploy the context.war under the multipart/context root, you’ll have to name the context XML file multipart#context.xml.

The content of the file is the following:

<?xml version='1.0' encoding='utf-8'?>
<Context docBase="${catalina.home}/context.war" />
<!-- this means the context.war should be available at the root of Tomcat -->
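The naming rule quoted from the Tomcat documentation (strip the .xml extension, read # as /) is easy to express in code. The following Java sketch is a hypothetical helper written for illustration, not part of Tomcat; it ignores special cases such as ROOT.xml.

```java
/** Toy conversion from a context XML file name to a context path; not Tomcat code. */
class ContextPathSketch {

    /** Turns e.g. "foo#bar.xml" into "/foo/bar". Special names such as ROOT.xml are not handled. */
    static String contextPathOf(String fileName) {
        String extension = ".xml";
        if (!fileName.endsWith(extension)) {
            throw new IllegalArgumentException("Not a context XML file: " + fileName);
        }
        String baseName = fileName.substring(0, fileName.length() - extension.length());
        return "/" + baseName.replace('#', '/');
    }

    public static void main(String[] args) {
        System.out.println(contextPathOf("multipart#context.xml"));
    }
}
```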

In the tests I’ve made, if the multi-level context path shadows a classic context root with a servlet mapping, the former takes precedence. I do not advise relying on this: since the case is not covered by the specification, it could be handled differently from product to product.
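One way to picture this precedence is a longest-prefix match: among all deployed context paths that match the request, the longest one wins. The Java sketch below implements that rule as an assumption; it is consistent with the behaviour observed in these tests, but it is not mandated by the specification and the class is purely hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Toy longest-prefix context matching; an assumption, not a specified behaviour. */
class ContextMatchSketch {

    /** Returns the longest context path matching the request path, or null if none matches. */
    static String matchContext(String requestPath, List<String> contextPaths) {
        String best = null;
        for (String context : contextPaths) {
            boolean matches = requestPath.equals(context)
                    || requestPath.startsWith(context + "/");
            if (matches && (best == null || context.length() > best.length())) {
                best = context;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> deployed = Arrays.asList("/multipart", "/multipart/context");
        System.out.println(matchContext("/multipart/context", deployed));
        System.out.println(matchContext("/multipart/other", deployed));
    }
}
```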

OK, now we’ve established these facts, what is the point in knowing this, aside from setting you as the übergeek in a JEE geek convention? Since http://localhost/multipart/context and http://localhost/multipart are clearly separated web applications, they do not even share context.

In my case, that was the solution for a simple use-case. Imagine monitoring web applications: you know the URL you have to monitor. Now a product, developed in-house for diagnostics purposes, is deployed side-by-side with each webapp in webapp form. It would be very nice if you could reach the diagnostics webapp without looking at the documentation to know its URL. Let’s say it is the business webapp’s URL appended with /diag.

So, if the main webapp’s URL is http://myserver.com/mywebapp and the person in charge of monitoring knows about it, he knows he has to access http://myserver.com/mywebapp/diag and he gets what he wants! On the development side, it means both webapps are different products, are developed by different teams and have different lifecycles.

Categories: JEE