Sunday, December 11, 2011

Fixing libssl and libcrypto Errors in Datastax OpsCenter Startup

Update 2011-12-19: For the 64-bit Amazon Linux AMI, install openssl 0.9.8 with "sudo yum install openssl098e-0.9.8e-17.7.amzn1.x86_64". Thanks to thobbs from the DataStax forum for this tip.

The AWS Linux AMI I use ships with openssl 1.0.0, but DataStax OpsCenter 1.3.1 requires version 0.9.8 of libssl and libcrypto. Why didn't they say so in the docs? The worst customer experience you can give your user base is to let your software blow up at startup like this:

Failed to load application: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory

This problem was apparently reported a month ago:
http://www.datastax.com/support-forums/topic/issue-starting-opscenterd-service

But no action has been taken to correct it...sigh...

Here is how we can fix it temporarily on our own until the DataStax devs get their act together:

1) Install openssl 0.9.8
sudo yum install openssl098e-0.9.8e-17.7.amzn1.i686

2) Change to /usr/lib and manually create the following two symbolic links:
sudo ln -s libssl.so.0.9.8e libssl.so.0.9.8
sudo ln -s libcrypto.so.0.9.8e libcrypto.so.0.9.8

Now OpsCenter will start without the dreaded SSL error.

Monday, November 21, 2011

Cassandra Range Query Using CompositeType

CompositeType is a powerful technique for creating indices with regular column families instead of super column families. But there is a dearth of information on how to use CompositeType in Cassandra. Introduced in 0.8.1 in May 2011, it is a relative newcomer to Cassandra. It doesn't help that it is not even in the "official" datatype documentation for Cassandra 1.0 and 0.8! This article pieces together various tidbits to bring you a complete how-to guide on programming CompositeType. The code examples use Hector.

Let's say we want to define a column family as the following:
row key: string
column key: composite of an integer and a string
column value: string

We can define the following schema on the cli:

create column family MyCF
    with comparator = 'CompositeType(IntegerType,UTF8Type)'
    and key_validation_class = 'UTF8Type'
    and default_validation_class = 'UTF8Type';

We can also define the same schema programmatically in Hector:

// Step 1: Create a cluster
CassandraHostConfigurator chc 
      = new CassandraHostConfigurator("localhost");
Cluster cluster = HFactory.getOrCreateCluster(
                        "Test Cluster", chc);

// Step 2: Create the schema
ColumnFamilyDefinition myCfd 
      = HFactory.createColumnFamilyDefinition(
            "MyKS", "MyCF", ComparatorType.COMPOSITETYPE);
// Thanks to Shane Perry for this tip.
// http://groups.google.com/group/hector-users/
//       browse_thread/thread/ffd0895a17c7b43e)
myCfd.setComparatorTypeAlias("(IntegerType, UTF8Type)");
myCfd.setKeyValidationClass(UTF8Type.class.getName());
myCfd.setDefaultValidationClass(UTF8Type.class.getName());
KeyspaceDefinition myKs = HFactory.createKeyspaceDefinition(
      "MyKS", ThriftKsDef.DEF_STRATEGY_CLASS, 1, 
      Arrays.asList(myCfd));

// Step 3: Add schema to the cluster
cluster.addKeyspace(myKs, true);
Keyspace ks = HFactory.createKeyspace("MyKS", cluster);

Now let's insert a single row with 2 columns:

String rowKey = "row1";

// First column key
Composite colKey1 = new Composite();
colKey1.addComponent(1, IntegerSerializer.get());
colKey1.addComponent("c1", StringSerializer.get());

// Second column key
Composite colKey2 = new Composite();
colKey2.addComponent(2, IntegerSerializer.get());
colKey2.addComponent("c2", StringSerializer.get());

// Insert both columns into row1 at once
Mutator<String> m 
      = HFactory.createMutator(ks, StringSerializer.get());
m.addInsertion(rowKey, "MyCF", 
      HFactory.createColumn(colKey1, "foo", 
                            new CompositeSerializer(), 
                            StringSerializer.get()));
m.addInsertion(rowKey, "MyCF", 
      HFactory.createColumn(colKey2, "bar", 
                            new CompositeSerializer(), 
                            StringSerializer.get()));
m.execute();

After the insertion, the column family should look like this table:

row key | column {1, "c1"} | column {2, "c2"}
--------+------------------+------------------
row1    | foo              | bar

Now let's retrieve the first column using a slice query on only the integer component of the composite column key. Since Cassandra orders composite keys component by component, we can construct a search range from {0, "a"} to {1, "\uFFFF"}, which will include {1, "c1"} but not {2, "c2"}.

SliceQuery<String, Composite, String> sq 
      =  HFactory.createSliceQuery(ks, StringSerializer.get(), 
                                   new CompositeSerializer(), 
                                   StringSerializer.get());
sq.setColumnFamily("MyCF");
sq.setKey("row1");

// Create a composite search range
Composite start = new Composite();
start.addComponent(0, IntegerSerializer.get());
start.addComponent("a", StringSerliazer.get());
Composite finish = new Composite();
finish.addComponent(1, IntegerSerializer.get());
finish.addComponent(Character.toString(Character.MAX_VALUE), 
                    StringSerializer.get());
sq.setRange(start, finish, false, 100);

// Now search and keep the result for parsing.
QueryResult<ColumnSlice<Composite, String>> result = sq.execute();
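
We can then walk the returned slice to pull out each column. This is a minimal sketch that assumes the typed component getter get(int, Serializer) on Hector's Composite:

for (HColumn<Composite, String> col : result.get().getColumns()) {
   // Each composite column name carries an integer and a string component.
   int intPart = col.getName().get(0, IntegerSerializer.get());
   String strPart = col.getName().get(1, StringSerializer.get());
   System.out.println("{" + intPart + ", " + strPart + "} = " + col.getValue());
}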

It is unfortunate that a JavaDoc typo in the Cassandra source code prevents tools like Eclipse from displaying documentation for CompositeType. But you can always view the source online to get the precise definition and encoding scheme of CompositeType. Reading the source code has been, and still is, the best way to learn new features in Cassandra.

Wednesday, November 16, 2011

GuiceFilter and Static Resources

GuiceServletContextListener and GuiceFilter can eliminate servlet mappings completely from the web.xml file. They also preserve the default servlet handling logic by re-routing a URL request back to the default servlet when no Guice-configured servlet matches the requested URL. This technique can be used to serve static resources from a Guice-configured web app. For example, assume this simple web.xml that routes everything through Guice:
<web-app>
  <listener>
    <listener-class>com.myapp.MyGuiceContextListener</listener-class>
  </listener>
  <filter>
    <filter-name>guiceFilter</filter-name>
    <filter-class>com.google.inject.servlet.GuiceFilter</filter-class>
  </filter>
  <filter-mapping>
    <filter-name>guiceFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
</web-app>
If we have a file named myIndex.html at the same directory level as WEB-INF in the web app layout, we can then request this file with a URL like this:
http://mydomain/myIndex.html
In this case, GuiceFilter is smart enough to reroute the request to the default servlet, which serves up the myIndex.html file.
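
For reference, here is a minimal sketch of what com.myapp.MyGuiceContextListener could look like (the /api/* mapping and MyApiServlet are illustrative assumptions, not part of the original setup):

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Singleton;
import com.google.inject.servlet.GuiceServletContextListener;
import com.google.inject.servlet.ServletModule;

public class MyGuiceContextListener extends GuiceServletContextListener {

   // Illustrative servlet; servlets managed by Guice must be singletons.
   @Singleton
   public static class MyApiServlet extends HttpServlet {
      @Override
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
         resp.getWriter().print("hello from Guice");
      }
   }

   @Override
   protected Injector getInjector() {
      return Guice.createInjector(new ServletModule() {
         @Override
         protected void configureServlets() {
            // Only /api/* is handled by a Guice-configured servlet.
            // Any other URL, e.g. /myIndex.html, falls through to the
            // container's default servlet, which serves the static file.
            serve("/api/*").with(MyApiServlet.class);
         }
      });
   }
}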

Thursday, November 3, 2011

Rules for Configuring ThreadPoolExecutor Pool Size

I came across the blog post Rules of a ThreadPoolExecutor Pool Size. There, the author explains reasonably well how a ThreadPoolExecutor (TPE) creates new threads in relation to the pool size. But the author incorrectly states that a TPE will always fill the task queue before creating a new thread. In reality, thread creation is determined by the queuing strategy. Choosing a direct-handoff strategy will achieve the author's "user anticipated way", but that strategy carries its own risk. Here is a summary of the general pros and cons of each queuing strategy, followed by a sketch of how each maps onto the ThreadPoolExecutor constructor:


Direct handoff (e.g., SynchronousQueue)
   Trade-off: zero task queuing, but can create an unlimited number of threads.
   Worst case: OutOfMemoryError from unbounded thread creation.
   Use for: tasks that may have interdependencies. For example, task i changes a global state that affects the execution of task j.

Unbounded queue (e.g., LinkedBlockingQueue)
   Trade-off: caps the number of threads at the core pool size, but allows submission of an unlimited number of tasks.
   Worst case: OutOfMemoryError from unbounded queue growth.
   Use for: tasks that are completely independent of each other.

Bounded queue (e.g., ArrayBlockingQueue)
   Trade-off: limits both the number of threads and the queue size.
   Worst case: low throughput from imbalanced pool and queue sizes.
   Use for: burning midnight oil.

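To make the three strategies concrete, here is a minimal sketch of the corresponding ThreadPoolExecutor constructions (the pool and queue sizes are arbitrary):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueuingStrategies {
   public static void main(String[] args) {
      // Direct handoff: no queuing at all. Every task that cannot be
      // handed to an idle thread triggers a new thread, up to the
      // maximum pool size (effectively unbounded here).
      ThreadPoolExecutor directHandoff = new ThreadPoolExecutor(
            2, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>());

      // Unbounded queue: the pool never grows past the core size (4);
      // every extra task waits in a queue that can grow without limit.
      ThreadPoolExecutor unboundedQueue = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());

      // Bounded queue: threads beyond the core size (4) are created only
      // after the queue (100) fills up, and never more than the max (16).
      ThreadPoolExecutor boundedQueue = new ThreadPoolExecutor(
            4, 16, 60L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(100));

      directHandoff.shutdown();
      unboundedQueue.shutdown();
      boundedQueue.shutdown();
   }
}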


Wednesday, October 26, 2011

The “initial_token” in Cassandra Means the “Very First Time”

Cassandra uses tokens to split key ranges across nodes. When a Cassandra node starts for the very first time, it checks whether an "initial_token" is specified in cassandra.yaml; otherwise, the node generates a token from the cluster it is joining. But how does a node know that it is being started for the "very first time"? It is simple: the token is stored on the local disk and persists across process starts and stops. Therefore, once a token is stored, changing the "initial_token" parameter in cassandra.yaml has no effect.

When multiple nodes have the same token, Cassandra elects a new owner of the token, prints a warning, and then continues on. The nodetool utility, however, will under-report the number of nodes in a ring because it only probes nodes that have unique tokens. This is such a common problem when making Cassandra VM images that it even gets its own FAQ entry on the Cassandra wiki. The only safe way to assign a new token cleanly is to wipe out the data and commit logs and then restart the node.

Friday, October 21, 2011

Configuring Maven to Use a Local Library Folder

The official "Maven Way" of dependency management is to use Maven Central and local repository specified in the settings.xml file (which usually points to $HOME/.m2/repository. While it works great for projects that rely on a large number of open source libraries and satisfies 95% of dependency management needs in those projects, there is that 5% of the time when a jar is not sourced from a Maven project. One example is a jar using JNI so is only available for certain OS platforms. How do we integrate this jar into Maven dependency management? If you search the web for hints, you may be led to believe that you either have to bend to the Maven Way or to use the systemPath. But the Maven Way will force you to maintain a local repository for a trival library. The systemPath on the other hand does not work naturally with packaging. Developers will most likely ask "Can I check in this library to my (your_favorite_VCS) with my project and still have Maven use it in a way just like any other dependency?" The answer is YES. Just follow the steps below:

1. Create a directory under your project, say "lib".

2. Use Maven to install your jar to the lib directory.
mvn install:install-file -Dfile=path_to_mylib.jar -DgroupId=com.mylib -DartifactId=mylib -Dversion=1.0 -Dpackaging=jar -DlocalRepositoryPath=path_to_my_project/lib

3. Setup your POM like this.
  <repositories>
     <repository>
         <!-- DO NOT set id to "local" because it is reserved by Maven -->
         <id>lib</id>
         <url>file://${project.basedir}/lib</url>
     </repository>
  </repositories>
  <dependencies>
    <dependency>
        <groupId>com.mylib</groupId>
        <artifactId>mylib</artifactId>
        <version>1.0</version>
    </dependency>
  </dependencies>


Now you can check in/out mylib.jar just like any other file in your project and Maven will manage the dependency on mylib.jar just like any other dependency artifact. Perfect harmony. :-)

Thursday, October 20, 2011

Log4J Appender Additivity in Plain English

Let's start with the root logger in a log4j.properties file:

log4j.rootLogger=INFO,stdout

This root logger is configured at the INFO logging level with an appender named stdout. Now we want to turn on debug logging in our own package but keep the rest at the INFO level, so we add this to our log4j.properties file:

log4j.category.com.mypackage.name=DEBUG
log4j.rootLogger=INFO,stdout

Everything looks good. But then we want to pipe our debug log to a different appender so we change the configuration to:

log4j.category.com.mypackage.name=DEBUG, myappender
log4j.rootLogger=INFO,stdout

When we start our app, we suddenly notice that our debug logs still show up in stdout in addition to myappender! This is caused by appender additivity: with additivity on (the default), a log event is handed not only to the logger's own appenders but also to every appender of its ancestor loggers, all the way up to the root. To turn it off for our package, set the additivity flag to false:

log4j.category.com.mypackage.name=DEBUG, myappender
log4j.additivity.com.mypackage.name=false
log4j.rootLogger=INFO,stdout

Monday, October 17, 2011

Counting All Rows in Cassandra

Update Oct. 25, 2011: Fixed missing key type in the code fragment.

The SQL language makes counting rows deceptively simple:
SELECT count(*) from MYTABLE;
The count function in the select clause iterates through all rows retrieved from MYTABLE to arrive at a total count. But iterating through all rows of a column family is an anti-pattern in Cassandra because Cassandra is a distributed datastore. By the very nature of Big Data, all rows of a column family may not even fit in memory on a single 32-bit machine! Still, sometimes when you load a large static lookup table into a column family, you may want to verify that all rows are indeed stored in the cluster. Before you start writing code to count rows, you should remember that:
  • Counting by retrieving all rows is slow.
  • The first scan may not return the total count due to delay in replication.
Now that we know why we shouldn't iterate through all rows in Cassandra in the first place, we can proceed to write a little function to do exactly that for those rare occasions. Below is an example using Hector and the iterative method. The keyspace in this example uses the Random Partitioner. The function uses the range slice query technique to iterate through all rows in the order of the MD5 hashes of their keys. (Cassandra uses MD5 hashing internally for the Random Partitioner.) Note that ksp below is a Hector Keyspace object created elsewhere.
   public int totalRowCount() {
      String start = null;
      String lastEnd = null;
      int count = 0;
      while (true) {
         RangeSlicesQuery<String, String, String> rsq = 
            HFactory.createRangeSlicesQuery(ksp, StringSerializer.get(),
                  StringSerializer.get(), StringSerializer.get());
         rsq.setColumnFamily("MY_CF");
         rsq.setColumnNames("MY_CNAME");
         // Nulls are the same as get_range_slices with empty strs.
         rsq.setKeys(start, null); 
         rsq.setReturnKeysOnly(); // Return row keys only, no column data
         rsq.setRowCount(1000); // Arbitrary default
         OrderedRows<String, String, String> rows = rsq.execute().get();
         int rowCount = rows.getCount();
         if (rowCount == 0) {
            break;
         } else {
            start = rows.peekLast().getKey();
            if (lastEnd != null && start.compareTo(lastEnd) == 0) {
               break;
            }
            count += rowCount - 1; // Key range is inclusive
            lastEnd = start;
         }
      }
      if (count > 0) {
         count += 1;
      }
      return count;
   }
Recursion would be a more elegant solution but be aware of the stack limitation in Java.

Wednesday, August 17, 2011

Bundling Local File System with ec2-user

Remote root login is disabled on all Amazon Linux AMIs to prevent exploits. By default, you can only log in as "ec2-user" to access instances launched from those AMIs. The ec2-user can use sudo, so it can perform tasks that require root privileges. One example is ec2-bundle-vol, which bundles up the local file system for creating a custom AMI. But unlike ec2-user's shell, sudo is not configured with the default AWS environment variables such as EC2_HOME. An easy workaround is the "-E" option, which lets sudo inherit environment variables from ec2-user. From the sudo man page:

The -E (preserve environment) option will override the env_reset option in sudoers(5). It is only available when either the matching command has the SETENV tag or the setenv option is set in sudoers(5).
Now, we can execute the ec2-bundle-vol command like this:

sudo -E /opt/aws/bin/ec2-bundle-vol -k /media/ephemeral0/private-key.pem -c /media/ephemeral0/cert.pem -u XXXXXXXXXXXX

I am surprised that this is not documented in the official doc.

Tuesday, July 26, 2011

Taking Control of Maven Multi-Module Project

Maven has come a long way in supporting an assembly line style of software development. It is common for a non-trivial project to have multiple inter-dependent modules, external system dependencies and geographically distributed teams. This blog provides a few tips on some standard Maven tools to help developers tackle this complexity.

Building Single Module in a Multi-Module Project

When multiple teams work on a large multi-module project, it is often desirable to build just one module instead of the world. But Maven doesn't make this obvious. For example, if sub-module-A depends on sub-module-B, building sub-module-A was not even possible in 2009 without first running "mvn install" on sub-module-B! Fortunately, Maven has evolved, and this is where the advanced reactor options come in. To build a single module, run this command at the parent level:

mvn --projects sub-module-A --also-make clean test
The "--also-make" option tells Maven to automatically build all modules that sub-module-A depends on before building A itself. No "mvn install" of dependent modules is needed, which ensures a clean build when building a single module.

Build Profile
Developers often want to set up repository locations and test environments in physical proximity because of cost and regulatory differences between geographical regions. Build profiles allow developers to tailor their builds to diverse environments without interfering with each other, as sketched below.
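
For example, here is a minimal sketch of two region-specific profiles (the profile IDs and the repo.url property are placeholders, not from any real project):

<profiles>
  <profile>
    <id>us-east</id>
    <properties>
      <repo.url>http://repo-us.example.com/maven2</repo.url>
    </properties>
  </profile>
  <profile>
    <id>eu-west</id>
    <properties>
      <repo.url>http://repo-eu.example.com/maven2</repo.url>
    </properties>
  </profile>
</profiles>

A developer in Europe can then select the matching profile on the command line with "mvn -P eu-west clean test".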

Command-Line Settings.xml
Secure environments often require access credentials, and Maven requires these credentials to be stored in a settings.xml file. This often causes more problems than it solves. One workaround is to store different versions of settings.xml along with the project files in the source repository. Then, when a project is checked out, the "-s" command line option can be used to tell Maven which settings.xml file to use:

mvn -s ${CHECKOUT_DIR}/setting/settings.xml clean test

Friday, July 15, 2011

Automating Remote File Copy

When you need to move files around inside a secure environment, rcp and ftp scripts can be a real time saver.

rcp

The easiest way to copy a file from any machine to a destination host without typing a password is to use the .rhosts file. This file should reside in a user's home directory. For example, for user "foo" on host "dest", this file would contain the line:


+ foo


Now you can remote copy files from anywhere to host "dest" without typing a password:

rcp file foo@dest:/home/foo/.

Scripting ftp

The trick to scripting the ftp command is to turn off the interactive mode with the "-n" option. This allows you to put the password in the script. Here is an example that ftps a file from foo's home directory:


#!/bin/sh
ftp -n dest << whatever
user foo foopass
bin
get file
bye
whatever


These two little commands can do wonders in a heterogeneous but secure environment with mixed Linux/Unix/Windows boxes. Use them to your heart's content.

Wednesday, July 13, 2011

Maven Failsafe Plugin Gotcha

I should have known this, but it didn't occur to me when the integration tests in a Maven project suddenly stopped running any test at all. It turns out that the Maven Failsafe Plugin has no notion of "compiling test classes" on its own. This is normally not a problem when the plugin is bound to the build lifecycle, but it falls apart when the failsafe:integration-test goal is executed independently, like this:

mvn clean failsafe:integration-test

In this case, the plugin will not run any integration test because not a single class has been compiled after clean. The plugin depends on class file naming conventions (such as IT*.class) to discover tests. No class file, no test.
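
One way to avoid the surprise is to compile the test classes before invoking the goal; running the standard test-compile phase first should do it:

mvn clean test-compile failsafe:integration-test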

Tuesday, June 7, 2011

Configuring and Customizing Elastic Bamboo

The Elastic Bamboo feature in JIRA Studio can utilize Amazon EC2 for builds. But integrating EC2 into Bamboo can be an overwhelming task for a beginner. To complement Atlassian's documentation, this article gives a few tips on how to set up an EC2 build account and configure Elastic Bamboo (EB). Even though this is not a Bamboo or an AWS tutorial, you can follow the links in the article to learn more about the subjects applicable to the task at hand.

Setting up a Special Build User in AWS

Since EB needs AWS security credentials to launch and manage EC2 instances, you can use AWS Identity and Access Management (IAM) to set up a special build user with only EC2 permissions. Let's call this special user "eb-builder". Give this user its own access keys and X.509 certificates. Follow the IAM User Guide if you don't already know how to do this.

Configuring Elastic Bamboo to Use AWS

SSH Key Pair
These are the keys you will need later to log onto the EC2 instances launched by EB. When you configure EB for the first time, it automatically generates a key pair for you, called elasticbamboo. If you use hosted JIRA Studio, the private key is stored on the Atlassian host and can be downloaded later.

AWS Access Keys
EB needs access keys to call AWS APIs in order to perform AWS actions such as launching EC2 instances. Instead of supplying EB with the access keys from your master AWS account, use the access keys you generated earlier for eb-builder.

AWS X.509 Certificate and Private Key
EB needs both the certificate and the private key to use EBS. Again, supply EB with the certificate and the private key you generated earlier for eb-builder, not the ones you created for your master AWS account. This is a good security practice to keep in mind.

Customizing Elastic Bamboo Default Image

There are two reasons for customizing the default EB image:

(1) To persist build dependencies like 3rd-party libraries. The default image uses the instance store, which does not persist after the EC2 instance is gone. One easy way to persist build dependencies is to use an EBS volume.
(2) To add your own builder. For example, you may want Maven 3 instead of the Maven 2 that comes with the default image.

Use EBS
EBS comes in handy when you have relatively static build dependencies like 3rd-party libraries. It also allows you to install your own build tools instead of using those from the default image. You can follow the steps here to create an EBS volume to store your custom bits. But note Step 2 in "Creating your first EBS snapshot" on that page: that step is not applicable to the hosted version of JIRA Studio. In that case, you must file a support ticket and ask Atlassian to load your X.509 certificate and private key for you. If you install a new Maven release like Maven 3 or a new JDK, do not put them in the bamboo:bamboo user path. Instead, configure their paths when you add a new custom builder as described below.

Add Custom Builder
Let's say you have installed Maven 3 under /mnt/bamboo-ebs/maven3 and made a snapshot of the EBS volume. To configure EB to use the new Maven bits, follow the Add Capabilities steps here. When adding a new builder for Maven 3, make sure you select the "Maven 2.x" capability type; hopefully Atlassian will add an official Maven 3 capability type in the near future. Once you complete those steps, the new Maven 3 builder will show up as a builder candidate when you set up a build plan for your project.

That's the gist of configuring and customizing Elastic Bamboo. Happy building!

Sunday, May 29, 2011

Using Guice-ified Jersey in Embedded Jetty

While IoC is a terrific way to promote software modularity, someone still has to write the bootstrap code that connects all the dots. Even though there is abundant information on how to use Guice, Jersey, and Jetty individually in a servlet environment, little has been written about using all three together to programmatically configure servlets in Java. Here I will use an example to demonstrate how to write a RESTful servlet using all three:
  1. Assemble framework libraries using Maven.
  2. Create a POJO interface and an implementation.
  3. Write a JAX-RS resource to use Guice constructor injection and JAX-RS annotations.
  4. Write the Guice bootstrap code.
  5. Write the embedded Jetty start up code.
  6. Run and test.
First, assemble all necessary parts in a Maven pom.xml. The relevant fragments look like this:

<properties>
    <jetty.version>7.4.1.v20110513</jetty.version>
    <jersey.version>1.7</jersey.version>
    <guice.version>3.0</guice.version>
    <!-- also define junit.version to match your JUnit release -->
</properties>

<dependencies>
    <dependency>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>jetty-servlet</artifactId>
        <version>${jetty.version}</version>
    </dependency>
    <dependency>
        <groupId>com.google.inject</groupId>
        <artifactId>guice</artifactId>
        <version>${guice.version}</version>
    </dependency>
    <dependency>
        <groupId>com.sun.jersey</groupId>
        <artifactId>jersey-server</artifactId>
        <version>${jersey.version}</version>
    </dependency>
    <dependency>
        <groupId>com.sun.jersey.contribs</groupId>
        <artifactId>jersey-guice</artifactId>
        <version>${jersey.version}</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>${junit.version}</version>
        <scope>test</scope>
    </dependency>
</dependencies>

<repositories>
    <repository>
        <id>maven2-repository.java.net</id>
        <name>Java.net Repository for Maven</name>
        <url>http://download.java.net/maven/2/</url>
        <layout>default</layout>
    </repository>
</repositories>

We start with a simple POJO interface:
public interface GuicyInterface {
   String get();
}
And a simple implementation:
public class GuicyInterfaceImpl implements GuicyInterface {

   public String get() {
      return GuicyInterfaceImpl.class.getName();
   }
}
Now, write a JAX-RS resource to use both Guice and JAX-RS annotated injections:
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;

import com.google.inject.Inject;

@Path("/helloguice")
public class HelloGuice {
   private final GuicyInterface gi;
   
   @Inject
   public HelloGuice(final GuicyInterface gi) {
      this.gi = gi;
   }
   @GET
   @Produces("text/plain")
   public String get(@QueryParam("x") String x) {
      return "Howdy Guice. " + "Injected impl " + gi.toString() + ". Injected query parameter "+ (x != null ? "x = " + x : "x is not injected");
   }
}

Next, compose the POJO bindings in a JerseyServletModule. This module sets up the Jersey-based JAX-RS framework for use with Guice injection. The GuiceServletContextListener bootstraps Guice when the servlet context is initialized.
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceServletContextListener;
import com.sun.jersey.guice.JerseyServletModule;
import com.sun.jersey.guice.spi.container.servlet.GuiceContainer;

public class HelloGuiceServletConfig extends GuiceServletContextListener {
   @Override
   protected Injector getInjector() {
      return Guice.createInjector(new JerseyServletModule() {
         @Override
         protected void configureServlets() {
            // Must configure at least one JAX-RS resource or the 
            // server will fail to start.
            bind(HelloGuice.class);
            bind(GuicyInterface.class).to(GuicyInterfaceImpl.class);
            
            // Route all requests through GuiceContainer
            serve("/*").with(GuiceContainer.class);
         }
      });
   }
}

Finally, write the main method that uses embedded Jetty to start Guice and Jersey together.
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.DefaultServlet;
import org.eclipse.jetty.servlet.ServletContextHandler;

import com.google.inject.servlet.GuiceFilter;

public class GuiceLauncher {
   public static void main(String[] args) throws Exception {
      // Create the server.
      Server server = new Server(8080);
      
      // Create a servlet context and add the jersey servlet.
      ServletContextHandler sch = new ServletContextHandler(server, "/");
      
      // Add our Guice listener that includes our bindings
      sch.addEventListener(new HelloGuiceServletConfig());
      
      // Then add GuiceFilter and configure the server to 
      // reroute all requests through this filter. 
      sch.addFilter(GuiceFilter.class, "/*", null);
      
      // Must add DefaultServlet for embedded Jetty. 
      // Failing to do this will cause 404 errors.
      // This is not needed if web.xml is used instead.
      sch.addServlet(DefaultServlet.class, "/");
      
      // Start the server
      server.start();
      server.join();
   }
}

If you run the main program on your local host, you can test the servlet using this URL:
http://localhost:8080/helloguice?x=q
Then, you should see a response like this:
Howdy Guice. Injected impl GuicyInterfaceImpl@3aaa3518. Injected query parameter x = q
Congratulations! You now have Guice, Jersey, and Jetty working together to serve RESTful servlets!

Thursday, May 5, 2011

Singleton Injection in JAX-RS

JAX-RS defines a Java API for implementing RESTful server-side functions in Java. Being a post-Spring framework, it uses annotations extensively to denote "resource" injection points. One of these DI annotations is @Provider. A provider is an application-supplied class used to extend a JAX-RS run-time such as Jersey. A JAX-RS implementation is required to load only one instance of a provider class per JAX-RS run-time instance. An application developer can leverage this feature to inject singletons, e.g. a database manager class, into JAX-RS resources. Here is an example of how to do this.

First, define the singleton class.
public class MyDBManager {
   public void store(final String key, final String value) {
      // Store the key-value pair in a DB.
   }
}
As you can see, this class is just a POJO. No special singleton marking anywhere in the class definition.

Next, we will use ContextResolver and the @Provider annotation to turn MyDBManager into a JAX-RS provider.
import javax.ws.rs.ext.ContextResolver;
import javax.ws.rs.ext.Provider;

@Provider
public class DBResolver implements ContextResolver<MyDBManager> {
   private MyDBManager db;
   
   // A provider must have at least a zero-arg constructor.
   public DBResolver() {
      db = new MyDBManager();
   }
   public MyDBManager getContext(Class type) {
      return db;
   }
}

Now we can inject MyDBManager into a resource class through @Context like this:
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.ext.ContextResolver;

@Path("/myresrc")
public class MyResource {
   @POST
   @Path("/{id}")
   @Consumes("text/plain")
   @Produces("text/plain")
   public String post(final @Context ContextResolver<MyDBManager> myDbManager,
                      final @PathParam("id") String id) {
      myDbManager.getContext(MyResource.class).store(id, "someValue");
      return "OK";
   }
}

Two observations from this method of injecting a singleton:
  • The singleton instance of MyDBManager is not instantiated until MyResource is invoked for the first time.
  • The type T in ContextResolver<T> can be an interface so an implementation of the ContextResolver<T> can return an implementation of the T. But, the implementation of T can only be determined by a Class object.
Therefore, this @Provider-based injection method should not be taken as a general-purpose DI mechanism, because JAX-RS is not a general-purpose DI framework. It is not only ugly but also incurs reflection overhead every time the resource is invoked. Use Guice or Spring instead for general-purpose DI.

Wednesday, March 23, 2011

Porting iptables 1.4.10 to Android

(Update 25 April 2011: This port is now hosted on Google Code as iptables4n1.)

Introduction 

The Android source comes with iptables 1.3.7, but that distribution is not compatible with Linux kernel 2.6.32 or newer. For example, even though iptables 1.3.7 can be built into the Android Emulator and the Google Dev Phone 2 (a.k.a. HTC Magic), which are based on Linux kernel 2.6.29, it won't compile against kernel 2.6.32, used in the Nexus One and many newer Android devices. The main difference between iptables 1.3.7 and 1.4.x is the emergence of "xtables" in the latter. Fortunately, Google has done a commendable job of keeping Android in sync with Linux kernel releases, which makes porting a new version of iptables to Android largely an exercise in makefile changes. I will outline the steps here as a service to the open source community.

Steps

Below are steps for porting iptables 1.4.10 to Android/Linux kernel 2.6.32.
  1. Check out the Android source.
  2. Download iptables 1.4.10 source from the Netfilter project.
  3. Go to the checked out Android source and change to $SRC/external/iptables. Delete the content underneath that directory (or make a backup if you want) and then copy the downloaded iptables 1.4.10 content to that directory.
  4. Create a new Android.mk file under $SRC/external/iptables. This is the Android makefile. You can model yours after the one from the original Android source, but you need to accommodate naming changes to the iptables extensions, which went from libiptXXX in 1.3.7 to libxtXXX in 1.4.10. (See the sample Android.mk below.)
  5. Change to $SRC/external/iptables/extensions and create a new create_initext4 file there (see the sample below).
  6. Change back to $SRC/external/iptables and run make. Fix header inclusion issues as needed.
Sample Android.mk

ifneq ($(TARGET_SIMULATOR),true)
  BUILD_IPTABLES := 1
endif
ifeq ($(BUILD_IPTABLES),1)
LOCAL_PATH:= $(call my-dir)
#
# Build libraries
#
# libxtables
include $(CLEAR_VARS)
LOCAL_C_INCLUDES:= \
    $(LOCAL_PATH)/include/ \
    $(KERNEL_HEADERS)
LOCAL_CFLAGS:=-DNO_SHARED_LIBS
LOCAL_CFLAGS+=-DXTABLES_INTERNAL
LOCAL_CFLAGS+=-DIPTABLES_VERSION=\"1.4.10\"
LOCAL_CFLAGS+=-DXTABLES_VERSION=\"1.4.10\" # -DIPT_LIB_DIR=\"$(IPT_LIBDIR)\"
LOCAL_CFLAGS+=-DXTABLES_LIBDIR
LOCAL_SRC_FILES:= \
    xtables.c
LOCAL_MODULE_TAGS:=
LOCAL_MODULE:=libxtables
include $(BUILD_STATIC_LIBRARY)
# libip4tc
include $(CLEAR_VARS)
LOCAL_C_INCLUDES:= \
    $(KERNEL_HEADERS) \
    $(LOCAL_PATH)/include/
LOCAL_CFLAGS:=-DNO_SHARED_LIBS
LOCAL_CFLAGS+=-DXTABLES_INTERNAL
LOCAL_SRC_FILES:= \
    libiptc/libip4tc.c
LOCAL_MODULE_TAGS:=
LOCAL_MODULE:=libip4tc
include $(BUILD_STATIC_LIBRARY)
# libext4
include $(CLEAR_VARS)
LOCAL_MODULE_TAGS:=
LOCAL_MODULE:=libext4
# LOCAL_MODULE_CLASS must be defined before calling $(local-intermediates-dir)
#
LOCAL_MODULE_CLASS := STATIC_LIBRARIES
intermediates := $(call local-intermediates-dir)
LOCAL_C_INCLUDES:= \
    $(LOCAL_PATH)/include/ \
    $(KERNEL_HEADERS) \
    $(intermediates)/extensions/
LOCAL_CFLAGS:=-DNO_SHARED_LIBS
LOCAL_CFLAGS+=-DXTABLES_INTERNAL
LOCAL_CFLAGS+=-D_INIT=$*_init
LOCAL_CFLAGS+=-DIPTABLES_VERSION=\"1.4.10\"
LOCAL_CFLAGS+=-DXTABLES_VERSION=\"1.4.10\"
PF_EXT_SLIB:=ah addrtype ecn 
PF_EXT_SLIB+=icmp #2mark
PF_EXT_SLIB+=realm
PF_EXT_SLIB+=ttl unclean DNAT LOG #DSCP ECN
PF_EXT_SLIB+=MASQUERADE MIRROR NETMAP REDIRECT REJECT #MARK
PF_EXT_SLIB+=SAME SNAT ULOG # TOS TCPMSS TTL
PF_EXT_SLIB+=TAG
EXT_FUNC+=$(foreach T,$(PF_EXT_SLIB),ipt_$(T))
# xtable stuff
NEW_PF_EXT_SLIB:=comment conntrack connmark dscp tcpmss esp
NEW_PF_EXT_SLIB+=hashlimit helper iprange length limit mac multiport
NEW_PF_EXT_SLIB+=owner physdev pkttype policy sctp standard state tcp
NEW_PF_EXT_SLIB+=tos udp CLASSIFY CONNMARK
NEW_PF_EXT_SLIB+=NFQUEUE NOTRACK
EXT_FUNC+=$(foreach N,$(NEW_PF_EXT_SLIB),xt_$(N))
# generated headers
GEN_INITEXT:= $(intermediates)/extensions/gen_initext4.c
$(GEN_INITEXT): PRIVATE_PATH := $(LOCAL_PATH)
$(GEN_INITEXT): PRIVATE_CUSTOM_TOOL = $(PRIVATE_PATH)/extensions/create_initext4 "$(EXT_FUNC)" > $@
$(GEN_INITEXT): PRIVATE_MODULE := $(LOCAL_MODULE)
$(GEN_INITEXT):
    $(transform-generated-source)
$(intermediates)/extensions/initext4.o : $(GEN_INITEXT)
LOCAL_GENERATED_SOURCES:= $(GEN_INITEXT)
LOCAL_SRC_FILES:= \
    $(foreach T,$(PF_EXT_SLIB),extensions/libipt_$(T).c) \
    $(foreach N,$(NEW_PF_EXT_SLIB),extensions/libxt_$(N).c) \
    extensions/initext4.c
LOCAL_STATIC_LIBRARIES := \
    libc
include $(BUILD_STATIC_LIBRARY)
#
# Build iptables
#
include $(CLEAR_VARS)
LOCAL_C_INCLUDES:= \
    $(LOCAL_PATH)/include/ \
    $(KERNEL_HEADERS)
LOCAL_CFLAGS:=-DNO_SHARED_LIBS
LOCAL_CFLAGS+=-DXTABLES_INTERNAL
LOCAL_CFLAGS+=-DIPTABLES_VERSION=\"1.4.10\"
LOCAL_CFLAGS+=-DXTABLES_VERSION=\"1.4.10\" # -DIPT_LIB_DIR=\"$(IPT_LIBDIR)\"
#LOCAL_CFLAGS+=-DIPT_LIB_DIR=\"$(IPT_LIBDIR)\"
LOCAL_SRC_FILES:= \
    iptables.c \
    iptables-standalone.c \
        xshared.c
LOCAL_MODULE_TAGS:=
LOCAL_MODULE:=iptables
LOCAL_STATIC_LIBRARIES := \
    libip4tc \
    libext4  \
        libxtables
include $(BUILD_EXECUTABLE)
endif


Sample create_initext4

#!/bin/sh
echo ""
for i in $1; do
    echo "extern void lib${i}_init(void);";
done;
echo "void init_extensions(void);"
echo "void init_extensions(void) {"
for i in $1; do
    echo "    lib${i}_init();";
done
echo "}"



That is all. Happy hacking!


Sunday, March 20, 2011

Socket and O_NONBLOCK

It is often stated that a socket is created in blocking mode by default. However, not all implementations of the socket calls adhere to that behavior, and specialized operating systems can be particularly tricky when it comes to blocking and non-blocking modes. One should always test the O_NONBLOCK flag on a socket to be sure whether it is in blocking or non-blocking mode before proceeding to send/recv. Here is some example C code that demonstrates how to check and flip the O_NONBLOCK flag on a streaming socket.

/* !!! Error checking is omitted for brevity !!! */
#include <fcntl.h>
#include <sys/socket.h>

/* Create a streaming socket */
int sock = socket(AF_INET, SOCK_STREAM, 0);

/* Get the file descriptor flags */
int flags = fcntl(sock, F_GETFL, 0);

/* Toggle the flag: unset O_NONBLOCK if it is set,
 * or set it if it is not
 */
if ((flags & O_NONBLOCK) == O_NONBLOCK) {
   fcntl(sock, F_SETFL, flags & ~O_NONBLOCK);  /* Unset: blocking mode */
} else {
   fcntl(sock, F_SETFL, flags | O_NONBLOCK);   /* Set: non-blocking mode */
}

Tuesday, February 22, 2011

How to Check the Version of a Linux Release

The uname tool is usually used to print the OS version on a Unix or Unix-like machine. But it does not always reveal the actual Linux distribution. For example:

myhost:/home/user$ uname -a
Linux myhost 2.6.32-24-generic-pae #43-Ubuntu SMP Thu Sep 16 15:30:27 UTC 2010 i686 GNU/Linux


If this Linux distribution is LSB compliant, we can use lsb_release to get what we need:

myhost: /home/user$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 10.04.1 LTS
Release:        10.04
Codename:       lucid


Tips in Using Gcore Dump with Java Tools

When analyzing a gcore dump of a running JVM process, it is very important to know which Java binary the dumped process was running, because it may not be the Java binary on the user's path. If the two differ, the user will not be able to attach the Java tool in the path to the core file. Knowing this will save you hours of frantic head scratching. More information can be found in this discussion about using gcore with jmap.
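
For example, with jmap, point the tool at the same Java executable the dumped process was running (both paths below are illustrative):

jmap -heap /usr/java/jdk1.6.0_24/bin/java core.12345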

Another issue is the core file size: a 32-bit Java tool will have problems attaching to a core dump greater than 2GB. A workaround is to generate the gcore from a 64-bit JVM and then use the 64-bit Java tools to load the core file.

Wednesday, February 2, 2011

Socket and DataInputStream

While Java provides great I/O abstraction facilities like InputStream and OutputStream, subtle behavioral differences show up in higher-level constructs when the underlying stream is a socket. The case in point is DataInputStream, which enables an application to parse Java primitive types right out of an input stream. Its stream-reading methods exhibit two distinct behaviors when the underlying stream is a socket. The following methods are "non-blocking" in the sense that they may return fewer bytes than requested:

read(byte[] b)
read(byte[] b, int off, int len)

A non-blocking read means that the first call to read(byte[] b, int off, int len) may not return the expected len number of bytes. Therefore, the reading code is usually put into a loop that reads repeatedly until len bytes have been read or EOF is reached.

The primitive-type reading methods in DataInputStream, by contrast, are blocking. Take readInt() as an example: that read will block until all the bytes it needs (4 in this case) have arrived in the stream.
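
Here is a minimal sketch of both behaviors side by side (the host, port, and buffer size are illustrative):

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.Socket;

public class SocketReadDemo {
   public static void main(String[] args) throws IOException {
      // The endpoint is illustrative; connect to your own server.
      Socket socket = new Socket("localhost", 9000);
      DataInputStream in = new DataInputStream(socket.getInputStream());

      // Short-read style: read(byte[], int, int) may return fewer than
      // 16 bytes, so loop until the buffer is full or EOF is hit.
      byte[] buf = new byte[16];
      int off = 0;
      while (off < buf.length) {
         int n = in.read(buf, off, buf.length - off);
         if (n == -1) {
            throw new EOFException("stream ended after " + off + " bytes");
         }
         off += n;
      }

      // Blocking style: readInt() does not return until all 4 bytes
      // of the int have arrived (or throws EOFException at EOF).
      int value = in.readInt();
      System.out.println("Read int: " + value);

      socket.close();
   }
}

Note that DataInputStream.readFully(byte[]) packages the same read loop, so it behaves like the primitive-type methods.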

Consistency is a good software engineering principle to practice. When using DataInputStream to parse a socket input stream, it is best to always use the primitive-type reading methods to avoid confusing blocking with non-blocking socket I/O.