2013-08-22

Apache HBase supports three primary client APIs that developers can use to build applications against HBase: the Java API, the REST API, and the Thrift API. As developers build against these APIs, it's very important for them to be aware of the compatibility guidelines with respect to CDH.

This blog post will describe the efforts that go into protecting the experience of a developer using the Java API. Through its testing work, Cloudera allows developers to write code and sleep well at night, knowing that their code will remain compatible through supported upgrade paths.

First, we’ll explore the compatibility guidelines themselves. From there, we will discuss some of the testing that ensures compatibility across CDH versions, as well as some of the interesting incompatibilities we’ve detected and fixed along the way.

Note that API compatibility testing work goes on both upstream and internally at Cloudera. In this post, we will concentrate mainly on the internal testing work, primarily for the Java API.

Compatibility Policy and Versions

Applications written against CDH should continue working without rewrite or recompile for any minor version of a given CDH major version.

Between major versions, binary and RPC compatibility is best-effort, but not guaranteed. In order to allow product flexibility, deprecate APIs, and make major changes, it is often necessary to introduce breaking changes. We choose to do so only between major CDH versions and try to minimize the number and scope of those changes. The goal is to allow developers to make relatively minor modifications and recompile their code.

What is a Major and Minor Version of CDH?

If the CDH version is expressed as CDH X.Y.Z, then X is the major version and Y is the minor version. We also have point releases for each minor release, denoted by Z. For example, in CDH4.2.1, the major version is 4, the minor version is 2, and the point release is 1. A user who begins with CDH4.1.2, for example, should expect applications written against CDH4.1.2 to continue working when the cluster is upgraded to CDH4.3.0. We call this "backward compatibility". Also, if a user writes an application against CDH4.3.0, using only the features available in CDH4.1.2, that code should work with a cluster running CDH4.1.2. This is called "forward compatibility".

How Do CDH Versions Map to HBase Versions?

For CDH4:

CDH4.0.x and CDH4.1.y are based on HBase 0.92.1.

CDH4.2.x is based on HBase 0.94.2.

CDH4.3.x is based on HBase 0.94.6.

For CDH5:

The beta will be based on HBase 0.95.2.

Note that CDH releases include bug fixes and sometimes even minor features back-ported from a later version of HBase.

Forms of Incompatibility

Using the hbase jar and its dependent libraries, it's possible to write an HBase Java client. So what can go wrong when a cluster upgrade occurs?

RPC Incompatibilities

A Remote Procedure Call (RPC) incompatibility is a breakdown in understanding between the time a client serializes a request and the time it deserializes the response. For example, if the return type of a method changes, the client cannot deserialize the response meaningfully and that call will fail, requiring a rewrite of that part of the application. Likewise, if a client request carries a specific set of arguments and the server expects a different set of arguments after an upgrade, that will cause an RPC incompatibility.

For more information about Hadoop RPC communication between client and server, take a look at Hadoop's InterProcess Communication (IPC) mechanism, documented here: http://wiki.apache.org/hadoop/ipc

Binary Incompatibilities

Client code has dependencies. If the dependencies change in an incompatible way, then when the client attempts to load the class and use the offending method, the result will be a runtime exception. For example, a client written using CDH4.2.0 bits will have hbase-0.94.2-cdh4.2.0-security.jar as a dependency. From that jar, a client application may use org.apache.hadoop.hbase.client.HBaseAdmin. If the constructor for HBaseAdmin changes in CDH4.3.0 to include an additional argument, and the client swaps in hbase-0.94.6-cdh4.3.0-security.jar because the cluster is upgraded to CDH4.3.0, then the client code cannot instantiate the HBaseAdmin object. From there, the client ceases to work until it is fixed.
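To make the failure concrete, here is a minimal sketch of such a client (the class name is ours, and the constructor change described above is hypothetical; HBaseAdmin(Configuration) is the real 0.94-era constructor):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Compiled against hbase-0.94.2-cdh4.2.0-security.jar.
public class AdminClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // The compiled bytecode references the exact constructor
    // HBaseAdmin(Configuration). If a later jar required an additional
    // argument (the hypothetical change above), running this unmodified
    // bytecode against that jar would fail at link time.
    HBaseAdmin admin = new HBaseAdmin(conf);
    System.out.println("Master running: " + admin.isMasterRunning());
    admin.close();
  }
}
```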

Sometimes breaking changes are more subtle. Suppose the return type of a method changes from void to something else. Although that change will not affect RPC compatibility, it will affect binary compatibility. This is because the method's signature has changed, and the application's compiled code cannot find the version of the method it was built against in the new dependency jar.

Example I: A specific example of this is HBASE-8273. This issue brought our attention to binary compatibility. In this example, the HColumnDescriptor setter methods originally returned void. However, they were later changed to use a builder pattern, in which the return type was a reference to the object itself. This change broke binary compatibility, but not RPC compatibility.
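Here is a minimal sketch of why a return-type change like this breaks binary compatibility while source compatibility survives (using setMaxVersions as a representative setter; the class name is ours):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

public class BuilderPatternExample {
  public static void main(String[] args) {
    HColumnDescriptor family = new HColumnDescriptor("cf");
    // Older jars declared this as "void setMaxVersions(int)"; newer jars
    // return HColumnDescriptor so that calls can be chained.
    family.setMaxVersions(3);
    // The source above compiles against either jar. But bytecode compiled
    // against the void-returning jar references the exact descriptor
    // setMaxVersions(I)V, which no longer exists once the return type
    // changes, so the JVM throws NoSuchMethodError at link time even
    // though the source never changed.
  }
}
```

Recompiling against the new jar fixes the problem, which is exactly why this kind of change breaks binary compatibility but not source compatibility.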

Example II: During our testing of CDH4.2, our automation detected that two constructors had been removed from HTable.

We made sure to add them back into CDH4.2, so developers who wrote code using those constructors against the CDH4.1 hbase jar would not have to rewrite their code when upgrading to CDH4.2. If we had not, there would have been a binary incompatibility that would likely have manifested itself as a NoSuchMethodError.

Other Compatibility Modes

There are other implicit expectations baked into some requests. A good example relates to an incompatibility in a call to HBaseAdmin.getClusterStatus().

The problem was a version incompatibility in the serialized return type (version 1 vs. version 2): an older client could not deserialize the newer form of the ClusterStatus object. HBASE-7072 has more information about that issue. This was fixed in CDH4.2 and later.
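For illustration, here is a minimal sketch of the affected call (the class name is ours); note that the failure happens inside the RPC layer during deserialization, not in the application code itself:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ClusterStatusCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // The server serializes a ClusterStatus object back to the client.
    // If the server writes a newer serialization version than the client
    // understands, this call fails even though the method signature is
    // unchanged: an RPC-level incompatibility.
    ClusterStatus status = admin.getClusterStatus();
    System.out.println("Live servers: " + status.getServersSize());
    admin.close();
  }
}
```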

RPC-Compatible vs. Binary-Compatible Clients

Use cases are a fundamental consideration: where and how are the client applications going to be run?

This is an important question when it comes to dependencies. There are two places a client can run: on a node of the cluster, or on a machine outside the cluster. This matters because when an upgrade occurs, compatibility is stressed in at least two different ways: RPC compatibility and binary compatibility.

The two images below illustrate the distinction between the two configurations, as well as what form of compatibility is tested. Suppose initially that a cluster is running CDH4.0.1 and a developer creates two applications. The first, denoted on the left, runs as a standalone client on a remote machine. That means it bundles all its dependencies with it and uses those to make RPC calls to the cluster. The second application runs on the cluster itself. That means that it uses the same jars as would be specified in `hbase classpath` to run. These are the same dependencies that the services (e.g. regionserver) use to run.

What happens after an upgrade of the cluster to CDH4.3.0? In the second image, note that for the RPC-compatible client, it is still the same 4.0.1 client and dependencies, but is now communicating with a server that is running CDH4.3.0. On the right, the binary-compatible client must now use a different set of dependencies to communicate with the CDH4.3.0 service.

How does this relate to RPC or binary compatibility? For the RPC-compatible client running on a remote machine, the RPC mechanism between CDH4.0.1 and CDH4.3.0 must hold in order for the client to continue working. Hence, when we create such a scenario, we are testing RPC compatibility (shown in red). For the binary-compatible client, we know that the CDH4.3.0 dependencies will work with the CDH4.3.0 services. What we don't yet know is whether the compiled client code can work with the CDH4.3.0 dependencies. Hence, this form of testing removes uncertainty about the binary compatibility of the client and its upgraded dependencies. This is shown in blue.

Before Cluster Upgrade to CDH4.3.0:

[Figure: the standalone RPC-compatible client on a remote machine and the binary-compatible client on the cluster both run with CDH4.0.1 dependencies against a CDH4.0.1 cluster.]

After Cluster Upgrade to CDH4.3.0:

[Figure: the remote client keeps its bundled CDH4.0.1 dependencies, exercising RPC compatibility (red); the on-cluster client now runs with the CDH4.3.0 dependencies, exercising binary compatibility (blue).]

Automation and Tools

Cloudera Internal Test Framework for Java Compatibility Testing

For all major versions, Cloudera releases a handful of minor versions. Since we guarantee compatibility for all versions of a major release, it is necessary to enumerate what those combinations are. The necessary combinations that should be tested for the Java API look like this: { server version } x { client version } x { use case }. In this case { use case } is either an RPC-compatible or binary-compatible client (see RPC-Compatible vs. Binary-Compatible Clients). A screenshot is shown below:



The Testing Matrix: there are numerous combinations, but continuous integration only needs to run for cdh4Nightly.
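As a rough illustration of how such a matrix is enumerated (the version lists below are hypothetical; the real matrix is driven by our release data):

```java
import java.util.Arrays;
import java.util.List;

public class CompatMatrix {
  enum UseCase { RPC_COMPATIBLE, BINARY_COMPATIBLE }

  public static void main(String[] args) {
    // Hypothetical version lists for a single major release.
    List<String> serverVersions = Arrays.asList("4.0.1", "4.1.2", "4.2.0", "4.3.0");
    List<String> clientVersions = Arrays.asList("4.0.1", "4.1.2", "4.2.0", "4.3.0");

    // { server version } x { client version } x { use case }
    for (String server : serverVersions) {
      for (String client : clientVersions) {
        for (UseCase mode : UseCase.values()) {
          System.out.printf("server=%s client=%s mode=%s%n", server, client, mode);
        }
      }
    }
  }
}
```

The nested loops make the growth rate plain: doubling the number of releases roughly quadruples the number of combinations.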

Constructing clusters quickly. Cloudera built a tool in-house that will stand up a single-node cluster from bits in a matter of minutes. Speed is important, as there are many combinations that need to be tested. The total number of combinations grows as O(n^2) with the number of supported releases, and each new release adds O(n) additional combinations.

Constructing client code at runtime. After constructing the server, we then construct a client running the client version bits. We have a standard set of API calls that are run as part of the client. The goal is to cover a large percentage of the most common API calls. This includes calls for data mutation (e.g. Put) and retrieval (e.g. Get). It also includes a wide sampling of admin operations (e.g. split). As we add features, we can add additional calls and limit the server versions that they run against. An example is snapshots, which were added in CDH4.2.0. Our snapshot testing generates compatibility coverage, but does not run against older server versions. A sketch of such a client appears below.
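For a flavor of what such a client exercises, here is a minimal sketch in the 0.94-era Java API. It is not Cloudera's actual harness; the table and column family names are hypothetical, and the table is assumed to already exist:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CompatSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "compat_test");

    // Mutation: write a cell.
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(put);

    // Retrieval: read the cell back and verify it round-tripped.
    Result result = table.get(new Get(Bytes.toBytes("row1")));
    String value = Bytes.toString(
        result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q")));
    if (!"v".equals(value)) {
      throw new IllegalStateException("Unexpected value: " + value);
    }

    // Admin operation: ask the master to split the table's regions.
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.split("compat_test");
    admin.close();
    table.close();
  }
}
```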

Testing different use modes. To simulate the difference between the RPC-compatible and binary-compatible use cases, it is necessary to adjust the jars that are used to issue the requests. First, the client is compiled using the client version bits.

Then it becomes a question of how the client is run. The remote client's request sets the classpath to use the client version jars. The binary-compatible client's request runs with the jars that the server is currently using.

The result is that this testing reveals both kinds of compatibility faults, RPC and binary. Both use cases are equally valid, and both are tested.

JDiff Public API Diff Tool

We created a tool for diffing the public APIs that developers use to write clients. It is designed to diff the public APIs of any two git branches, assuming each branch is based on HBase 0.92.x or 0.94.x. A modification for 0.95.x is in the works. The script is called jdiffHBaseFromGitRepos.sh and is located in the dev-support folder. It is committed to trunk: https://github.com/apache/hbase/blob/trunk/dev-support/jdiffHBaseFromGitRepos.sh

Why is this useful? Anyone can run it, scan the report, and see if their change introduced any incompatibilities. Ahead of releases, we examine this report to sanity-check our compatibility automation. Here are a couple of screenshots:

Report home: Diff from cdh4.0.1 to the latest cdh4. The left-hand side shows all additions, removals, and changes. This is organized by package, class, method, and constructor levels. Note that this will work for both CDH bits and Apache bits.

Changes to a particular class: In this case, HBaseAdmin.java. Note that HBaseAdmin is a class that developers would use to write applications. Like most of the classes we are concerned about for compatibility, it is located in the org.apache.hadoop.hbase.client package.

Impact of Secure HBase

Security doubles our testing matrix, so Cloudera tests all three APIs on both secure and insecure clusters.

Adding security to client code doesn't involve much beyond configuration changes. Setting up a secure server, however, is a much more involved process.
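For example, here is a minimal sketch of the client-side changes for a Kerberos-secured cluster. The principal and keytab path are placeholders, and we assume the cluster's hbase-site.xml supplies the server principals and is on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureClientSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Point both Hadoop and HBase authentication at Kerberos.
    conf.set("hadoop.security.authentication", "kerberos");
    conf.set("hbase.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    // Log in from a keytab before making any HBase calls.
    // The principal and keytab path below are placeholders.
    UserGroupInformation.loginUserFromKeytab(
        "appuser@EXAMPLE.COM", "/etc/security/keytabs/appuser.keytab");
    // From here, the client code is identical to the insecure case.
  }
}
```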

Thankfully, the problem of automating secure setup has been solved nicely by Cloudera Manager. We leverage it heavily as part of our secure automation. In this case, we have long-running clusters against which we test different secure clients.

Conclusion

Compatibility is a very important guarantee for our customers and for the HBase community. With the current testing infrastructure in place, we like to think that we have a solid understanding of the problem. However, there is more work to be done as we look for other forms of compatibility that developers value.

Examples of areas where additional API compatibility testing can be added:

Additional types of filters

Bulk load

Users and permissions

We also examine the cdh-user@ mailing list and community forums carefully for potential disruption to the developer experience, and to learn from past incompatibilities.

Aleksandr Shulman is a Software Engineer on the Platform team, working on HBase.
