Archive

Author Archive

Managing web sessions

March 8th, 2010 Nicolas Frankel No comments

In the previous article, I set up a cluster of 2 Tomcat instances in order to achieve load-balacing. It also offered failover capability. However, when using this feature, the user session was lost when changing node. In this article, I will show you how this side-effect can be avoided.

Reminder: the HTTP protocol is inherently disconnected (as opposed to FTP which is connected). In HTTP, the client sends a request to a server, it gets its response and that’s the end. The server cannot natively associate a request to a previous request made by the same client. In order to use HTTP for applications purpose, we needed to group such requests. Session is a label for this grouping feature.

This is done through a token, passed to the client on the first request. The token is passed with a cookie if possible, or appended to the URL if not. Interestingly enough, this is the same for PHP (token named PHPSESSID) as for JEE (token named JSESSIONID). In both case, we use a stateless protocol and tweak it so that it appears stateful. This pseudo-statefulness make possible application-level features, such as authentication/authorization or shopping cart.

Now, let’s take a real use case. I’m browsing through an online shop. I have already put some article in my cart. When I decide to finally go to the payment, I find myself with an empty basket! What happened? Unbeknownst to me, the node which hosted my session crashed and I was transparently rerouted to a working node, without my session ID, and thus, without the content of my cart.

Such thing can not happen in real life, since such a shop will probably be put out of business if this happens too often. There are basically 3 strategies to adopt in order to avoid such loss.

Sessions are evil

From time to time, I stumble upon articles blaming sessions and labeling them as evil. While not an universal truth, using sessions in a bad way can have negative side-effects.

The most representative bad usage of session if putting everything in them. I’ve seen lazy developers put collections in session in order to manage paging. You pass a statement once, put the result in the session, and manage paging on the sessionized result. Since collection’s size is not constrained, such use does not scale well with the number of users increasing. In most cases, everything goes fine in development but you’re soon overwhelmed by unusual response time or even OutOfMemoryError in production.

If you think sessions are evil, some solutions are:

  • Store data in the database: now your data has to go from the front-end to the back-end to be saved, and then back again to be used. Classic relational database may not be the solution you’re looking for. Most high-traffic low-response time take the NoSQL route, although I don’t know if they use it for session storage purpose
  • Store data on the client side through cookies: your clients needs to have cookie-enabled browser. Furthermore, data will be sent with every request/response so don’t overuse too much

The second option has the advantage of freeing you of session ID management. However, for both solution, your application code needs to implement the storage part.

Besides, nothing stops you from using sessions and using the extension points of your application server to use cookie storage instead of the default behaviour (mostly in memory). I wouldn’t recommend that though.

Server session replication

Another solution is to embrace session – this is a JEE feature after all – but to use session replication in order to avoid session data loss. Session replication is not a JEE feature. It is a proprietary feature, offered by many (if not all) application servers that is entirely independent from your code: your code uses session, and it is magically replicated by the server across cluster nodes.

There are two constraints common to session replication amongst all servers:

  • Use the <distributable /> tag in the web.xml
  • Only put in session instances of classes that are java.lang.Serializable

IMHO, these rules should be enforced on all web applications, whether currently deployed on a cluster or not, since they are not very restrictive. This way, deploying an application on a cluster will tends toward a no-operation.

Strategies available for session replication are application server dependent. However, they are usually based on the following implementations:

  • In memory replication: each server stores all servers session datat. When updating, it broadcasts to all nodes the modified session (or the delta, based on the strategy available/used). This implementation heavily uses the network and the memory
  • Database persistence
  • File persistence: the file system used should be available to all cluster nodes

For our simple example, I will just show you how to use in-memory session replication in Tomcat. The following are the steps one should take in order to do so. Please notice that one should first undertake what is described in the Tomcat clustering article.

Note: Tomcat 5.5 has the cluster configuration commented in server.xml. Tomcat 6 does not. Here is the default clustering configuration:

<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
  managerClassName="org.apache.catalina.cluster.session.DeltaManager"
  expireSessionsOnShutdown="false"
  useDirtyFlag="true"
  notifyListenersOnReplication="true">

  <Membership className="org.apache.catalina.cluster.mcast.McastService"
    mcastAddr="228.0.0.4"
    mcastPort="45564"
    mcastFrequency="500"
    mcastDropTime="3000"/>

  <Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"
    tcpListenAddress="auto"
    tcpListenPort="4001"
    tcpSelectorTimeout="100"
    tcpThreadCount="6"/>

  <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
    replicationMode="pooled"
    ackTimeout="15000"
    waitForAck="true"/>

  <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
    filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>

  <Deployer className="org.apache.catalina.cluster.deploy.FarmWarDeployer"
    tempDir="/tmp/war-temp/"
    deployDir="/tmp/war-deploy/"
    watchDir="/tmp/war-listen/"
    watchEnabled="false"/>

  <ClusterListener className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
</Cluster>

First, make sure that the mcastAddrand mcastPort of the <Membership> tag are the same. This is validated when starting a second node with the following log in the first:

INFO: Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp
://10.138.97.43:4001,catalina,10.138.97.43,4001, alive=0]

This insures that all nodes of a cluster are able to communicate with each other. From this point on, considering all other configuration is left by default, sessions are replicated in memory in all nodes of the cluster. Thus, you don’t need sticky session anymore. This is not enough for failover though, since removing a node from the cluster will still lead a new request to be assigned a new session ID, thus preventing access to your previous session data.

In order to also route session IDs, you need to specify two additional tags in <Cluster>:

<Valve className="org.apache.catalina.cluster.session.JvmRouteBinderValve" enabled="true" sessionIdAttribute="takeoverSessionid"/>
<ClusterListener className="org.apache.catalina.cluster.session.JvmRouteSessionIDBinderListener" />

The valve redirects requests to another node with the previous session ID. The cluster listener receives session ID cluster change event. Now, removing a cluster node is seamless (apart from latency for redirected) for clients whose session ID was redirected to this node.

Last minute note: the previous setup uses the standard session manager. I was recently made aware of a third-party manager that also handles session cookies when a node fails, thus reducing the configuration hassle. Such product is Memcached Session Manager and is based on Memcached. Any feedback on the use of this product is welcome.

Third party session replication

The previous solution has the disadvantage of specific server configuration. Though it does not impact development, it needs to be done for every server type in a different manner. This could be a burden if you happen to have different server types in your enterprise.

Using third party products is a remedy to this. Terracotta is such a product: morevoer, by providing a set number of Terracotta nodes, you avoid broadcasting your session changes to all server nodes, like in the Tomcat replication previous example.

In the following, the server is the Terracotta replication server and the clients are the Tomcat instances. In order to set up Terracotta, two steps are mandatory:

  • create the configuration for the server. In order to be used, name it tc-config.xml and put it in the bin directory. In our case, this is it:
    <tc:tc-config xmlns:tc="http://www.terracotta.org/config"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-5.xsd">
    ...
        <modules>
          <module name="tim-tomcat-5.5"/>
        </modules>
    ...
          <web-applications>
    	    <!-- The webapp(s) to watch -->
            <web-application>servlets-examples</web-application>
    
          </web-applications>
    ...
    </tc:tc-config>

    Note: the default installed module is for Tomcat 6. In case you need the Tomcat 5.5 module, you have to launch the Terracotta Integration module Management (TIM) and use it to download the correct module. For users behind an Internet proxy, this is made possible by updating tim-get.properties with the following lines:
    org.terracotta.modules.tool.proxyUrl = …
    org.terracotta.modules.tool.proxyAuth = …

  • add to all client launch scripts the following lines:
    TC_INSTALL_DIR=
    
    TC_CONFIG_PATH=:
    
    . ${TC_INSTALL_DIR}/bin/dso-env.sh -q
    
    export JAVA_OPTS="$JAVA_OPTS $TC_JAVA_OPTS"

Now we’re back to session replication and failover but the configuration is usable across different application servers.

In order to see what is stored, you can also launch the Terracotta Developer Console.

Conclusion

There are 3 basic strategies to manage session failover over cluster nodes. The first one is not to use session at all: it has major consequences on your development time, since your application has to do it directly. The second one is to look at the server documentation to look how it is done with a specific server. This ties your session management to a single product. Last but not least, you can use a third-party product. This has the advantage to move the configuration outside the scope of your specific server, thus letting you move with less hassle from one server to the next and still enjoy the benefits of session failover. Were I a system engineer, this is the solution I would recommend since it is the most flexible.

To go further:

Categories: Development Tags: , ,

Spring Persistence with Hibernate

March 1st, 2010 Nicolas Frankel No comments

This review is about Spring Persistence with Hibernate by Ahmad Reza Seddighi from Packt Publishing.

Facts

  1. 15 chapters, 441 pages, 38€99
  2. This book is intended for beginners but more experienced developers can learn a thing or two
  3. This book covers Hibernate and Spring in relation to persistence

Pros

  1. The scope of this book is what makes it very interesting. Many books talk about Hibernate and many talk about Spring. Yet, I do not know of many which talk about the use of both in relation to persistence. Explaining Hibernate without describing the transactional side is pointless
  2. The book is well detailed, taking you by the hand from the bottom to reach a good level of knowledge on the subject
  3. It explains plain AOP, then Spring proxies before heading to the transactional stuff

Cons

  1. The book is about Hibernate but I would have liked to see a more tight integration with JPA. It is only described as an another way to configure the mappings
  2. Nowadays, I think Hibernate XML configuration is becoming obsolete. The book views XML as the main way of configuration, annotations being secondary
  3. Some subjects are not documented: for some, that’s not too important (like Hibernate custom SQL operations), for others, that’s a real loss (like the @Transactional Spring annotation)

Conclusion

Despite some minor flaws, Spring Persistence with Hibernate let you go head first into the very complex sujbect of Hibernate. I think that Hibernate has a very low entry ticket, and you can be more productive with it very quickly. On the downside, mistakes will cost you much more than with old plain JDBC. This book serves you Hibernate and Spring concepts on a platter, so you will make less mistakes.

Categories: Book review Tags: , ,

Hibernate hard facts – Part 5

March 1st, 2010 Nicolas Frankel No comments

In the fifth article of this serie, I will show you how to manage logical DELETE in Hibernate.

Most of the time, requirements are not concerned about deletion management. In those cases, common sense and disk space plead for physical deletion of database records. This is done through the DELETE keyword in SQL. In turn, Hibernate uses it when calling the Session.delete() method on entities.

Sometimes, though, for audit or legal purposes, requirements enforce logical deletion. Let’s take a products catalog as an example. Products regularly go in and out of the catalog. New orders shouldn’t be placed on outdated products. Yet, you can’t physically remove product records from the database since they could have been used on previous orders.

Some strategies are available in order to implement this. Since I’m not a DBA, I know of only two. From the database side, both add a column, which represents the deletion status of the record:

  • either a boolean column which represent either active or deleted status
  • or, for more detailed information, a timestamp column which states when the record was deleted; a NULL meaning the record is not deleted and thus active

Managing logical deletion is a two steps process: you have to manage both selection so that only active records are returned and deletion so that the status marker column is updated the right way.

Selection

A naive use of Hibernate would map this column to a class attribute. Then, selecting active records would mean a WHERE clause on the column value and deleting a record would mean setting the attribute and calling the update() method. This approach has the merit of working. Yet, it fundamentally couples your code to your implementation. In the case you migrate your status column from boolean to timestamp, you’ll have to update your code everywhere it is used.

The first thing you have to do to mitigate the effects of such a migration is to use a filter.

Hibernate3 provides an innovative new approach to handling data with “visibility” rules. A Hibernate filter is a global, named, parameterized filter that can be enabled or disabled for a particular Hibernate session.

Such filters can the be used throughout your code. Since the filtering criteria is thus coded in a single place, updating the database schema has only little incidence on your code. Back to the product example, this is done like this:

@Entity
@FilterDef(name = "activeProducts")
@Filter(name = "activeProducts", condition = "DELETION_DATE IS NULL")
public class Product {

  @Id
  @Column(nullable = false)
  @GeneratedValue(strategy = AUTO)
  private Integer id;

...
}

Note: in the attached source, I also map the DELETION_DATE on an attribute. This is not needed in most cases. In mine, however, it permits me to auto-create the schema with Hibernate.

Now, the following code will filter out logically deleted records:

session.enableFilter("activeProducts");

In order to remove the filter, either use Session.disableFilter() or use a new Session object (remember that factory.getCurrentSession() will probably use the same, so factory.openSession() is in order).

Deletion

The previous step made us factorize the “select active-only records” feature. Logically deleting a product is still coupled to your implementation. Hibernate let us decouple further: you can overload any CRUD operations on entities! Thus, deletion can be overloaded to use an update of the right column. Just add the following snippet to your entity:

@SQLDelete(sql = "UPDATE PRODUCT SET DELETION_DATE=CURRENT_DATE WHERE ID=?")

Now, calling session.delete() on an Product entity will produce the updating of the record and the following output in the log:

UPDATE PRODUCT SET DELETION_DATE=CURRENT_DATE WHERE ID=?

With CRUD overloading, you can even suppress the ability to select inactive records altogether. I wouldn’t recommend this approach however, since you wouldn’t be able to select inactive records then. IMHO, it’s better to stick to filters since they can be enabled/disabled when needed.

Conclusion

Hibernate let you loosen the coupling between your code and the database so that you can migrate from physical deletion to logical deletion with very localized changes. In order to do this, Hibernate offers two complementary features: filters and CRUD overloading. These features should be part of any architect’s bag of tricks since they can be lifesavers, like in previous cases.

You can find the sources of this article here in Maven/Eclipse format.

To go further:

Categories: Java Tags:

Clustering Tomcat

February 24th, 2010 Nicolas Frankel No comments

In this article, I will show you how to use Apache/Tomcat in order to set up a load balancer. I know this has been done a zillion time before, but I will use this setup in my next article (teaser, teaser) so at least I will have it documented somewhere.

Apache Tomcat is the reference JSP/container since its inception. Despite a lack of full JEE support, it certainly has its appeal. The reasons behind using a full-featured commercial JEE application server are not always technical ones. With lightweight frameworks such as Spring being mainstream, it is not unusual to think using Tomcat in a production environment. Some companies did it even before that.

When thinking production, one usually think reliability and scalability. Luckily, both can be attained with Apache/Tomcat through the set up of a load-balancing cluster. Reliability is thus addressed so that if a Tomcat fails, following requests can be directed to a working Tomcat. Requests are dispatched to each Tomcat according to a predefined strategy. If the need be, more Tomcat can be added at will in order to scale.

In the following example, I will set up the simplest clustering topology possible: an Apache front-end that balances 2 Tomcat instance on the same physical machine.

Set up Apache

The first step is to configure Apache to forward your requests to Tomcat. There are basically 2 options in order to do this (I ruled out the pre-shipped load-balancer webapp):

  • use mod_jk, the classic Apache/Tomcat module
  • use mod_proxy, another Apache module

I’m not a system engineer, so I can’t decide on facts whether to use one or the other: I will use mod_jk since I’ve already used it before.

  • Download the mod_jk that is adapted to your Apache and Tomcat versions
  • Put it in the ‘modules’ folder of your Apache installation
  • Update your httpd.conf configuration to load it with Apache
    LoadModule jk_module modules/mod_jk-1.2.28-httpd-2.2.3.so
  • Configure Apache. Put these directive in the httpd.conf:
    JkWorkersFile	conf/worker.properties
    JkShmFile	logs/mod_jk.shm
    JkLogLevel	info
    JkLogFile logs/mod_jk.log
    JkMount		/servlets-examples/* lb

This configuration example is minimal but needs some comments:

Parameter Description
JkWorkersFile Where to look for the module configuration file (see below)
JkShmFile Where to put the shared memory file
JkLogLevel Module log level (debug/error/info)
JkLogFile Log file location. It is the default but declaring it avoid the Apache warning
JkMount Which url pattern will be forwarded to which worker

Since mod_jk can be used in non-clustered setups, there could be any JkMount, each forwarding to its own worker (see below). In our case, it means any request beginning with /servlets-examples/ (the trailing slash is needed) will be forwarded to the ‘lb’ worker .

Configure the workers

Workers are destination routes as viewed by Apache. They’re are referenced by an unique label in the httpd.conf and parameteirzed under the same label in the worker.properties file.
My workers.properties is the following:

worker.list=lb

worker.worker1.port=8010
worker.worker1.host=localhost
worker.worker1.type=ajp13

worker.worker2.port=8011
worker.worker2.host=localhost
worker.worker2.type=ajp13

worker.lb.type=lb
worker.lb.balance_workers=worker1,worker2

I define 3 workers in this file: lb, worker1 and worker2. The ‘lb’ worker is the load-balancing worker: it is virtual and it balances the latter two. Both are configured to point to a real Tomcat instance.

Now, with the Apache configuration in mind, we see that requests beginning with /servlets-examples/ will be managed by the load balancer worker which will in turn forward to a random worker.

Note: one can also put weight on workers hosted by more powerful machines so that these are more heavily loaded than less powerful ones. In our case, both are hosted on the same machine so it has no importance whatsoever.

Configure the Tomcat instances

The last step consist of the configuration of Tomcat instances. In order to do so, I shamelessly copied entire Tomcat installations (I’m on Windows). While editing the server.xml of the Tomcat instances, three points are worth mentioning:

  • The Engine tag has a jvmRoute attribute. It’s value should be the same as the worker’s name used in both httpd.conf and worker.properties. Otherwise, sessions will be recreated for each request
  • Look out for duplicated port numbers if all Tomcat instances are on the same machine. For example, use an incremental rule to configure every stream on a different port
  • Be sure that the tcpListenPort attribute of the Receiver is unique across all Tomcat instances

Use it!

With the previous set up, one can now start both Tomcat and Apache, then browse to the servlet-examples webapp, and more precisely to the Session page. Look there for Tomcat 5.5 and there for Tomcat 6. The servlet-example page page displays the associated session ID:

ID de Session: 324DAD12976045D197435033A67C025D.worker2
Crée le: Tue Feb 23 23:15:13 CET 2010
Dernier accès: Tue Feb 23 23:31:47 CET 2010

Notice that on my Tomcat instance, the worker’s name is part of the session ID.

If everything went fine, two interesting things should take place: first, when refreshing the page, the session ID should not change because of the sticky session (enabled by default). Morevoer, if I shutdown the Tomcat instance associated with the worker (the second in my case), and if I try to refresh the page, I still can access my application, but under a new session.

Thus, I lose all the information I stored under my session! In my following article, I will study how on can try to remedy to this.

To go further:

Categories: JEE Tags: ,

Free online SVN repositories

February 22nd, 2010 Nicolas Frankel 3 comments

This week, I searched for free online SVN repositories for closed-source projects. Of course, there are plenty of sites offering services for OpenSource projects: Google Code, SourceForge, etc. It may seem amazing, but everybody does not necessarily want to expose its code to the world.

From what I’ve found, there aren’t so many free SVN repositories around. Since it’s free and provided as a courtesy (not mentioning a thought toward upgrading to priced services later), there are some limits. Criteria are mainly:

  • The amount of disk space available
  • The number of projects you can store
  • The number of user accounts you can create in order to access the SVN

This is my results table and it definitely not exhaustive:

Name Mb Projects Users Notes
Unfuddle 200 1 2
ProjectLocker 500 Unlimited 5
Origo ? ? ? Origo is an OpenSource project in itself and provides both the product so you can host it yourself and online services. I couldn’t find any information on online limits though
XP-Dev 200 2 Unilimited I used it once but found to my dismay they erased my account because of months of inactivity
Beanstalk 100 ? 3

From a purely numerical point-of-view, and paying features aside, my pick will surely be ProjectLocker. This is not to say the others are bad but the free account has also some very interesting features:

  • Unlimited Bandwidth Transfer
  • SSL Encryption (very unusual for a free service)
  • Redundant RAID storage and Nightly Backups

I would definitely be interested in knowing what free Subversion provider do you use for your closed-source projects.

Disclaimer: I haven’t had any contact from any person working from ProjectLocker. I’m not an employee of ProjectLocker. The following only reflects my unbiased opinion.

Hibernate hard facts – Part 4

February 17th, 2010 Nicolas Frankel No comments

In the fourth article of this serie, I will show the subtle differences between get() and load() methods.

Hibernate, like life, can be full of suprises. Today, I will share one with you: have you ever noticed that Hibernate provides you with 2 methods to load a persistent entity from the database tier? These two methods are get(Class, Serializable) and load(Class, Serializable) of the Session class and their respective variations.

Strangely enough, they both have the same signature. Strangely enough, both of their API description starts the same:

Return the persistent instance of the given entity class with the given identifier.

Most developers use them indifferently. It is a mistake since, if the entity is not found, get() will return null when load() will throw an Hibernate exception. This is well described in the API:

Return the persistent instance of the given entity class with the given identifier, assuming that the instance exists. You should not use this method to determine if an instance exists (use get() instead). Use this only to retrieve an instance that you assume exists, where non-existence would be an actual error.

Truth be told, the real difference lies elsewhere: the get() method returns an instance, whereas the load() method returns a proxy. Not convinced? Try the following code snippet:

Session session = factory.getCurrentSession();

Owner owner = (Owner) session.get(Owner.class, 1);

// Test the class of the object
assertSame(owner.getClass(), Owner.class);

The test pass, asserting that the owner’s class is in fact Owner. Now, in another session, try the following:

Session session = factory.getCurrentSession();

Owner owner = (Owner) session.load(Owner.class, 1);

// Test the class of the object
assertNotSame(owner.getClass(), Owner.class);

The test will pass too, asserting that the owner’s class is not Owner. If you spy the object in the debugger, you’ll see a Javassist proxyed instance and that fields are not initialized! Notice that in both cases, you are able to safely cast the instance to Owner. Calling getters will also return expected results.

Why call the load() method then? Because since it is a proxy, it won’t hit the DB until a getter method is called.

Moreover, these features are also available in JPA from the EntityManager, respectively with the find() and getReference() methods.

Yet, both behaviours are modified by Hibernate’s caching mechanism. Try the following code snippet:

// Loads the reference
session.load(Owner.class, 1);

Owner owner = (Owner) session.get(Owner.class, 1);

According to what was said before, owner’s real class should be the real McCoy. Dead wrong! Since Hibernate previously called load(), the get() looks in the Session cache (the 1st level one) and returns a proxy!

The behaviour is symmetrical with the following test, which will pass although it’s counter-intuitive:

// Gets the object
session.get(Owner.class, 1);

// Loads the reference, but looks for it in the cache and loads
// the real entity instead
Owner owner = (Owner) session.load(Owner.class, 1);

// Test the class of the object
assertSame(owner.getClass(), Owner.class);

Conclusion: Hibernate does a wonderful job at making ORM easier. Yet, it’s not an easy framework: be very wary for subtle behaviour differences.

The sources for the entire hard facts serie is available here in Eclipse/Maven format.

Categories: Java Tags: , , , ,

Seamless installation: convention over configuration

February 15th, 2010 Nicolas Frankel No comments

Today, I will not take the role of the architect that knows how to deliver applications but instead I will play the end-user part.

In a previous post, I was tasked to put a whole development infrastructure in place. A continuous integration server was indeed in order. I took a look at some, but I was really dumbfounded when I tried Hudson. Features are not what stroke me at that time (although Hudson’s features did serve me well) but only the ease of installation.

Let’s look at a traditional installation. The steps are the following:

  • Download the installer
  • Launch the installer
  • Accept the security warning (I’m on Windows, guess Nix users would probably sudo before)
  • Follow the wizard numerous steps (which probably includes accepting a license)

In turn, launching the Hudson test drive is a two-click process, the only thing needed being a local JVM:

  • Click the Java Web Start link
  • Accept the security warning

Let’s not dive into the technical details on how it is done. I’m only interested with the results: with only two mouse clicks, Hudson launches its console and you can start working. From a user point of view, that’s real value! Now, I understand that such an installation is just for example purposes; yet, this is really nice to have a product ready to run in such a few steps.

Maven invented the convention over configuration so that build managers would not have to write the same tasks over and over for each of their projects. Learning its lessons from EJB2, Sun took the same path for EJB3: developers now really have less code to write. Build managers and developers are end-users in these processes. As the product end-user, I would really like to install it from some common sense default configuration. If needed, I should be able to overload this convention (Hudson does not provide this overloading because the goal is to test the product quickly).

As an architect, I think the installation domain area is pretty uncharted. We are much focused on clean code, maintenability, design and such. Some of us even sometimes explore the interface and ergonomy of the product. All of these are fine and needed but  not enough IMHO. Think of the installation process too and of Hudson’s example so that we, as end-users, can benefit from seamless installation.

Securing middleware products

February 9th, 2010 Nicolas Frankel No comments

My work is IT architecture, meaning I focus on the early steps of a project. Once the application is in production, I usually leave it to systems and production engineers. For example, for JVM fine tuning, most of the clients I worked for have people that have the right skills to do that.

Nevertheless, I need sometimes to sully my nails. This happens in two cases: when the client is too small to have such dedicated teams or when its production team are not experienced enough to handle the problem at hand. Believe it or not, it happened to me that I had to show WebSphere administrators how to connect JAAC connectors to a LDAP server.

Anyway, I always value information on how to handle cases out of my usual scope: first, it never hurts to know more. Second, it is sometimes handy to sort what production teams tell you: some is real stuff, some is bulls. Likewise, I invite production teams to learn about development so that they may sort what is told to them too. Learning the other’s craft let you increase comprehension between different teams.

Free checklist audits

This week, I learned about a site that propose free benchmarks to audit your infrastructure’s security. This site is the Center for Internet Security. Proposed benchmarks are two-fold: part document about what is audited, part benchmarking tool. The former is freely downloadable; as for the second part, you must register. The rest of this article will focus on the document.

Though many subjects will always be beyond my reach (I will never accept to secure an Oracle Database), one document is of utmost interest to me: the benchmark on Apache  Tomcat.

This file include rules that, once you comply with them, will make your product more secure. Even if most of them are no-nonsense and you could think about it yourself, the document make a nice check-list. Some rules are really interesting in that I am afraid they are seldom enforced, some because of neglect, some because of lack of knowledge of the product.

Enhancements

Checklists provided by the CIS do lack some things though:

  • risk correlated to statistics. Some security holes aren’t used by many hackers. How should I prioritize?
  • risk correlated to damage. What’s the potential damage of not underdoing this action? For example, session hijacking will compromize users interactions with my application, not my server
  • trade-off. Many security features are not always desirable, and most have a trade-off, often in terms of performance. When I browse a merchant site, crypting my communications is overkill. Only during the payment phase is a real need to keep information secret.

Rules examples

For Tomcat, here’s is a sample of the audited rules.

Separate Web content directory from Tomcat’s system files

Tomcat comes with its own file structure, including a webapp directory where webapps should reside. Yet, nothing prevents webapps to be outside this directory, even on another partition. From a security point-of-view, this will avoid directory traversal exploits: if a malicious user gains access to the webapps directory, he will not have access to the server.

Moreover, from a maintenance point-of-view, you are able to upgrade Tomcat without redeploying your applications.

Disable session façade recycling

Tomcat’s model is to use façade on every entity of the HTTP model: request, response, session, etc. By default, Tomcat’s façades over sessions are reused when processing new requests in order to optimize memory use. Thus, this could lead a new request to have access to informations on sessions that are not tied to it. This is a security risk and should be turned off if one’s want to secure the server.

Disable auto-deployment

Tomcat’s default behaviour is to have a running thread that watches the webapps directory. Once a new war is detected by this thread, it deploys it automatically. Such action is very enjoyable in a development environment. In a production environment, users that have access to the directory could potentially put malicious webapps in it and have it deployed automatically. Thus, disabling auto-deployment increases the security of the Tomcat’ server.

Conclusion

Checklists provided by the CIS are very nice to have for production and security engineers. However, one should carefully evaluate the cost of enforcing the rule agains the risk of not enforcing it. Those are either not detailed enough in the documentation, or not provided at all.

To go further:

Categories: Technical Tags: ,

Maven The complete reference

February 3rd, 2010 Nicolas Frankel 2 comments

This review is about Sonatype’s Maven: The complete reference by Tim O’Brien, John Casey, Brian Fox, Jason Van Zyl, Eric Redmond and Larry Shatzer.

Disclaimer: I learned Maven from Sonatype’s site 3 years ago. I found it was a great tool to learn Maven. Now that I have a little more experience in the tool, I tried to write this review in an objective manner.

Facts

  1. 13 chapters, 267 pages, free (see below)
  2. This book is intended for both readers who wants to learn Maven from scratch and for readers who need to look for a quick help on an obscure feature
  3. A whole chapter is dedicated to the Maven assembly plugin
  4. Another chapter is dedicated to Flexmojos, a Sonatype plugin to manage Flex projects

Pros

  1. First of all, this book is 100% free to view and to download. This is rare enough to be state!
  2. Complete reference books are sometimes a mere paraphrase of a product’s documentation. This one is not. I do not claim I’m a Maven expert but I did learn things in here
  3. This book is up-to-date with Maven 2.2. For example, it explains password encryption (available since Maven 2.1.0) or how to configure plugins called from the command line differently using default-cli (since Maven 2.2.0)
  4. A very interesting point is a list of some (all?) JEE API released by the Geronimo project and referenced by groupId and artifactId. If you frown because the point is lost on you, just try using classes from activation.jar (javax.activation:activation): you’ll never be able to let Maven download it for you since it is not available in the first place for licensing reasons. Having an alternative from Geronimo is good, knowing what is available thanks to the book is better

Cons

To be frank, I only found a problem with Maven: The complete reference. Although a whole chapter is written on the Maven Assembly plugin, I understood nothing from it… The rest of the book is crystal clear, this chapter only obfuscated the few things I thought I knew about the plugin.

Conclusion

This book is top quality and free: what can I say? If you’re a beginner in Maven, you’ll find a real stable base to learn from. If you need to update your knowledge, you will find a wealth of information. If you’re a Maven guru, please contribute to the Assembly plugin’s chapter. I can only give a warm thank you for Sonatype’s effort for giving this quality book to the community.

Categories: Book review Tags: ,

Context root tweaking

February 1st, 2010 Nicolas Frankel No comments

JEE never ceases to amaze me. Even when I think I’m on top and I know all there’s to know about webapps, I’m in for a surprise. Good news is, whatever you think you know about a subject, there’s still room for one more fact. Bad news is, I’m deeply disturbed by what I learned.

Fact is: web applications context root can contain the / characte, meaning an URL such as http://localhost/multipart/context can refer to the root of the multipart/context webapp as well as to the context servlet mapping of the http://localhost/multipart webapp. When I was told that, my first reaction was disbelief. I immediately hurried to run a few tests on the JOnAS and JBoss application servers and it confirmed that was entirely possible.

In fact, if you look at the application.xml XML Schema, you see that the context-root is of a type that extends string (with an id) :

This means the context root can effectively contains anything, including slashes, backslashes and what have you.

The J2EE 1.4 specifications does not enforce additional constraints (p 125) :

Each web module must be given a distinct and non-overlapping name for its context root. [...] See the servlet specification for detailed requirements of context root naming.

I did not see additional requirements in the Servlet 1.4 specifications for the context root.

Tomcat even takes this tweak into account when creating Context with individual XML files. It calls it “multi-level context paths” :

In individual files (with a “.xml” extension) in the $CATALINA_HOME/conf/[enginename]/[hostname]/ directory. The name of the file (less the .xml) extension will be used as the context path. Multi-level context paths may be defined using #, e.g. foo#bar.xml for a context path of /foo/bar.

So, in order to deploy the context.war under the multipart/context root, you’ll have to name the context XML file multipart#context.xml.

The content of the file is the following:

<?xml version='1.0' encoding='utf-8'?>
<Context docBase="${catalina.home}/context.war" />
<!-- this means the context.war should be available at the root of Tomcat -->

In the tests I’ve made, if the multi-level context path shadows a classic context-root with a servlet mapping, the former takes precedence. I do not advise using this since the case is not specified in the specification, it could be handled differently from product to product.

OK, now we’ve established these facts, what is the point in knowing this, aside from setting you as the übergeek in a JEE geek convention? Since http://localhost/multipart/context and http://localhost/multipart are clearly separated web applications, they do not even share context.

In my case, that was the solution for a simple use-case. Imagine monitoring web applications: you know the URL you have to monitor. Now a product, developped in-house for diagnostics purpose, is deployed side-by-side with each webapp in webapp form. It would be very nice if you could reach the diagnostics webapp without looking at the documentation to know its URL. Let’s say it is the business webapp’s URL appended with /diag.

So, if the main webapp’s URL is http://myserver.com/mywebapp and the person in charge of monitoring knows about it, he knows he has to access http://myserver.com/mywebapp/diag and he gets what he wants! On the development side, it means both webapps are different products, are developed by different teams and have different lifecycles.

Categories: JEE Tags: , , ,