and boy were they scaryBe aware: This is a post about checked exceptions. (And of questionable coherency, as well.)

In my experience – which I eventually, painstakingly, finally gathered – checked exceptions aren’t pulling their weight. In fact, they are downright harmful. But explaining why is really hard, or so it seems. It must be, since so many have tried and still it hasn’t sunk in.

Everyone has been through all the basics (declare or re-throw, and so on) so I will assume intimate familiarity with Java exceptions and go right to the code. I want to draw attention to Java’s InputStream, which defines a set of methods with well-documented contracts that subclasses and clients should adhere to. All in all, this is a decent object-oriented design.

Here is some code I found myself writing the other day:

private InputStream stream(String... args) {
  Properties props = prop(args);
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  try {
    props.store(baos, "Foobar");
  } catch (IOException e) {
    throw new RuntimeException("You're kidding me", e);
  }
  try {
    baos.flush();
  } catch (IOException e) {
    throw new RuntimeException
      ("OK, the computers are rebelling. Break out the canned " +
       "food, run for the hills", e);
  }
  return new ByteArrayInputStream(baos.toByteArray());
}

This method takes a sequence of strings and returns a corresponding stream, which will behave as if it were a properties file in the classpath.

Assuming I usually provide serious exception messages: Why the less-than-serious exception messages here? First of all, this is test code, so I’ve taken liberties. Second: As everyone can see, these exceptions won’t happen! In-memory arrays, which are in play here, aren’t subject to network conditions, slow I/O, packet loss and and other famous fallacies. If they do give you problems, you’re probably running on a seriously broken machine (or the AIs are turning on us). A silly test failing is probably not your priority.

Obvious conclusion: Catching exceptions here is silly, but we have to do it because they’re checked. ByteArrayOutputStream inherits all the methods from InputStream and they inherit their throwses as well. One remedy is that Java allows you to override methods and trim the throws list for things that don’t make sense to you. This is done by write in ByteArrayOutputStream, for instance, which omits theIOException.

However, clients that want to use the throws-free subclass signature will need a reference typed to the subclass, since the write method defined in InputStream still has the throws declaration. This breaks with the idea of polymorphism, and it bites us when we pass the stream to the store method, which quite rightly takes the obligatory InputStream. Only the caller knows the exception is bogus in this case.

Back in the method, next we call flush. ByteArrayOutputStream inherits this method implementation from its superclass, with throws lists intact. In this case, the default implementation is empty. It is thus redundant to call it, but it’s also common practice because it’s part of the general stream contract. All in all, it’s responsible client behavior to follow the contract, as the implementation might change but the contract will not. The end result is silly on in many ways: A dutiful client invoking an inert object, handling the fictitious exceptions because the subclass didn’t put in this silly override:

@Override
public void flush() {
// Override just to get rid of IOException - bonkers
}

I like to think that someone in Sun once wrote this method, and then erased it in deep denial of the silliness of it all. Anyway, as we’ve already been over: This wouldn’t have helped much once the object got passed around to other methods, typed to InputStream.

The last line is, so the speak, the exception that proves the silliness of the rules: For getting the byte array out of the stream, you call a method defined in the subclass. Naturally, it doesn’t throw any exceptions. It. Will. Just. Work. Just like all the other interaction with this instance of InputStream. Will. Just. Work.

This instance. It works, and I know it. And I think that is the real issue.

Programming is a conversation between the creative, human programmer and a rigid, formal system. The question for the programmer is: What rigid, formal system am I talking to? Am I working with the runtime instance of InputStream, or am I working with the statically defined interface InputStream? Am I in a conversation with the runtime, or the compiler?

Having worked with interactive systems like Smalltalk, my mindset is that talking to the runtime is just as important as talking to the compiler. Programming, by its nature, is to dictate ahead of time what should happen in any given instance of your program. However, you should remember that in the future, there will be more information available. In particular, there is all the information that you have gone out of your way to hide from yourself. At the programming stage, you should hide it. It’s good, sensible, object-oriented design to hide information, encapsulate knowledge and say “this class should know this but not that” and “only this class should know this”. It’s not a good idea to hide runtime knowledge that the objects will have as a result.

In order to evolve and refine your code, you need that feedback from the runtime. At programming time, there are references of type InputStream. But at runtime, there are no instances of InputStream as such. They are all objects of a more specific type, and that is what you need to talk about, to get your program in shape.

In this case, checked exceptions get in my way so I can’t say “this method knows that this is a safe, in-memory operation”. I can’t tell that to the Properties instance, so it has to assume the worst and project that back on me: OK, I’ll take your little input stream – but I know those things aren’t generally trustworthy, so you have to handle this exception.

Checked exceptions, to me, are firmly in talk-to-the-compiler country. I think it’s hurting people’s ability to talk to the runtime, and thus it is harmful. People seem to spend so much of their time typing those throws and catches. Here is some code that can result:

try {
  Query query = readFromStream(stream, encoding)
  return process(query, settings);
} catch (FooException e) {
  throw new MyException("There was a foo problem", e)
} catch(BarException e) {
  throw new MyException("Bar failure detected!", e)
} catch(ZotException e) {
  throw new MyException("Zot exception", e)
}

If you talk mostly to the compiler, this can be quite sensible code. (It even preserves the cause, which is more than I can say for certain open source frameworks.)  You catch what you have to catch, then you do the rethrows to comply with your own throws. My suggestion would be:

Query query;
try {
  Query query = readFromStream(stream, encoding);
} catch (Exception e) {
  throw new MyException(this + " failed to read query from " +
     stream + " with encoding " + encoding, e);
} // the above could be a method, really
try {
  return process(query, settings);
} catch (Exception e) {
  throw new MyException(this + " failed to process " + query +
    " with settings " + settings + ", query source: " +
    stream, e);
}

The main difference is that information from the execution context is preserved, instead of the code just repeating information that is available from looking at the code. All exceptions are handled the same, avoiding duplicaton of the code that builds the message and encouraging us to refine this code further. We don’t need to mention any of the Foo, Bar and Zot trio of error cases explicitly, because that will be apparent from the wrapped exception. When I look at the stack trace later, the fact that the compiler and the programmer knew about the exception type beforehand is of limited utility. What I need to know is what the code was actually trying to do, and what went wrong – whatever it was.

It could be that the code was trying to do something completely out of whack. Is this what programmers are afraid to reveal? Is it more comforting to say “oh, I definitely know that this exception might occur – it’s declared!” and handle the error in a way that lets the world know they really understand how this checked exception thing works? Giving the future and the runtime their say in it may be scary – like losing control. Yet it is when things are completely out of whack that we need to know what those things are.

By the way, I assume sensible toString()s that get invoked when this and query are included in the message. The stream isn’t likely to have a meaningful string representation, unfortunately, but we should act as if it someday will.

Some might balk at the catch-all of Exception. I consider in an improvement here, since it will provide more information with less maintenance. It also tells us something slightly different about the code. Specific catches makes me suspect that the programmer has been forced to type out the different cases by the compiler, and really has no idea what they mean, outside of what their Javadocs say. Having a single catch-all clause, however, can be interpreted as a sign of life. It tells me that the programmer is aware that this spot is an important cog wheel in the program flow. It is important to preserve information about the failure if any exception travels through this spot, especially if it was an exception we didn’t expect – an actual exception.

Of course, it’s OK to add specific catches if we know their significance. We might know that this exception definitely indicates that error condition, and we want to act on that. However, this knowledge has to be maintained. A generic catch-all with sufficient information capture could serve you equally well, at least until your runtime world has stabilized. Until then, error handling is a moving target and the priority should be to get a feedback loop working, to help you gradually understand your code base and runtime fully.

This applies to log messages as well – there is a world of difference between:

public void doStuff(String stuff) {
  log.info("Now doing stuff")

and

public void doStuff(String stuff) {
  log.info(this + " now doing " + stuff)

Again, remember to put in a good toString() to reveal essential instance state.

If you talk only to the compiler, you will tend to type in what you knew when you typed it in, ie.: “Yeah, we expect Foo, and Bar, and Zot to be the unexpected outcomes here – the compiler told me that”. If you talk to the the runtime also, it is natural to just ask it to provide any and all information relevant to future error analysis. I think much code would be improved if more programmers were on talking terms with both the compiler and the runtime. And if they talked to the runtime more, they would realize that the runtime is where all exceptions come from – from the future! All exceptions are really runtime exceptions.

I recently started thinking again, this time about low-level reuse – yes, the utils library. Trivial to reuse, this is the layer you build on top of the standard library to make life a little easier in general. The StringUtils.isNullOrEmpty() method, and other things that are just missing from the standard library.

Sounds simple! Well, that method, at least, should be straight-forward to reuse. But it occurred to me that I’ve coded these libraries a few times now. For the love of removing code – why? What, if anything, makes such a basic piece of code hard to reuse? Apart from IPR issues, I’ve identified some personality traits, if you will, that I find to be hurdles to reuse. While obviously not exhaustive, these three are:

  • Dependency Addiction
  • Lack of Inner Motivation
  • Bad Language

Undesirable in any person, how do these traits appear in code? I’ll make the case right here that the general problem is connectedness in your code. Connections that go downwards, upwards and sideways. Read on:

Dependency Addiction

This is the trivially identifiable hurdle: the downward connections. Say you’re on a project that could really use a good utils library – maybe with lots of low-level WET boilerplate. (Agile wiseguy insert: “Write Every Time”, so not DRY.) You have some utils that you want to reuse, but it turns out they depend on around half of the Jakarta Commons! Reuse is inhibited by various factors:

  • Conflicting versions of those libraries are already used in the project, or
  • the project has a stricter policy on third-party libraries, or
  • the codebase is smugly designed to be lean and mean, meaning that your library represents a rather considerable addition to the project’s footprint.

Seeing the uneasy frowns descending on your new co-workers, how do you deal with it? Ruthless purges!

One purge tactic is to identify parts that consume dependencies for trivial purposes. Are you really only using a few of the classes from Commons Collections? Isn’t it sometimes worth re-inventing a wheel, if the custom-designed wheel is a lot slicker? Implement the required functionality yourself, and admire your newly-invented wheel. (Which is usually eminently testable, since you know what you need it for.) If it makes reuse work better, chances are that you can be absolved of the sin of wheel-reinventing. (And like many sins, wheel-inventing is fun, too.)

Second, some of the highly-dependent code might be persuaded to move elsewhere. Maybe it pulls in lots of dependencies because it provides a major, maybe un-utils-like, capability. Ask yourself if it is really a good candidate for this utils library. Find out how many usages the culprit has, and try to get a view of its actual utility. Consider being downright unfair on the hapless piece of code, for the greater good of reuse. It might be better off as its own little module.

In short, find the undesirable downward connections. Some come from the library, some go to the library. Identify and eliminate!

Lack of Inner Motivation

This one is slightly more interesting. I also think of it as origin artifacts, and again, undesirable connections. These are undesirable upwards connections.

Sometimes, we see an elegant piece of code that we want to reuse. So we weed out any references to the application code – the host application, i.e. the origin – and parametrize here and there. We get a general piece of functionality that we can move to the utils library. The original application code now simply calls the utility, with its specific parameters. It ends up leaner, more focused, and generally higher-level. You get the opportunity to clean up the logic chunk under consideration. Even when the utils library isn’t the right place for it, moving code out can be a worthwhile exercise in terms of quality. (It can also expose opportunities and trigger yet more radical code cleaning.)

But the utils may not be the right place. The problem arises when the utility isn’t really as general as it looks; maybe it embodies tacit assumptions, or handles special needs of the origin. They are the origin artifacts, the hidden upwards connections. This reveals its reduced utility in other applications, and it springs surprises on innocent re-users at awkward times. If it hangs around, it pollutes the utilness of your utils. Worst case: Similar utilities, with other quirks, make their own way into the same utils library. (Bad language – more on that below.)

The  motivation for a utility must be clearly stated and obviously useful, in and of itself. The motivation must be intrinsic, not extrinsic. Can you describe the function without referring to the origin, or explaining a series of not-too-abstract-sounding preconditions that just happen to apply to the origin? Maybe it is not actually to be a general, reusable utility.

Or maybe it just needs more parameters. When investigating a possible util-impostor, you can fight to keep it, by documenting the quirks. One tactic is to identify the potentially surprising twists and turns, and expose them as options – i.e. more parameters! Parameters are at least good documentation points, and helps you expose the connections.

In any case, the default behavior should be left as nicely unsurprising as possible. It may still not be all that generally valuable over time, so you should keep eviction notices handy.

Bad Language

So far, this has been mostly a trite rehash of common wisdom. The most remotely interesting item is this one, which deals with the sideways connections, or lack thereof. Again, these are hidden and unspoken, but in contrast to the above, there are both undesirable and desirable connections. Lispers might recognize this part as the language-building philosophy of Lisp programming. Warning: this gets vague, we are not in HOWTOs anymore.

So here’s how we look at it: The utils library you’ve been hammering into shape actually represents, if not a new language, then at least an extension to the language, in the same way that the Java standard library also defines Java. Of course, technically, Java 5 is the mostly same language as Java 1.0.2 (and backwards compatible!), but for most practical purposes they’re very different, and the evolution of the standard library is the real difference. (If I didn’t make that point with you, consider Java 1.0.2 and 1.4.2 instead.)

Library design, at least for low-level reusable libraries, is to some extent also language design.

So what are the undesirable sideways connections? Inconsistency, plain and simple. Language design is hard, and challenges include internal consistency and uniformity: Keeping the implicit connections in mind and keeping them logical, keeping the conceptual disconnects out. For instance, if you have overlapping functionality (similar utilities with different quirks, for instance), that’s a disconnect. If there are related (or even overloaded) methods, and their argument ordering varies, that’s a disconnect too. What you get is a confusing mess of a language. What you really want is a practical (maybe even elegant), incremental improvement to your existing language.

And those are only the syntactic connections. The semantic connections are the ways that the various parts can be combined. A sizable utils library has many parts that can be combined in infinitely many ways; it is combinatorial. This is really powerful, actually too powerful for humans to handle, so it must restrict itself from providing connections that are undesirable. Why doesn’t java.lang.String have an openFile() method? Because it’s insane, that’s why. A String can obviously represent the name of a file, but it doesn’t go ahead and provide this connection. String isn’t the obvious place to look for file handling, because strings are more general than files, therefore we allow files to talk about strings (with e.g. file.getName()), but not the other way around.

The good connections, on the other hand, know their place. Notice how the I/O libraries deal with e.g. InputStream, and not the multitude of things that can provide InputStreams. This level of indirection makes for an extra step when you wire things up (INSERT HERE: gripes from people touting the ‘concise’ syntax for that particular case in their favorite – though possibly leaky-runtimey – scripting language). But it also adds a degree of freedom, and more ways to combine the basic parts. To combine with the I/O libraries, you don’t have to be a File, you can just provide an InputStream.

The Java standard library is rather conservative with frivolous connections, which has probably been good for its longevity.

Good connections must obey some ordering of things by generality, usually in some tree-like structure. The connections should enable combinatorial composition, and avoid flooding the API with maybe-possibly-helpful methods. The ordering is subtle, tree-like and never quite explicit – but it is there – and it will become painfully clear (or at least painful) to users when things are out of order. When your connections aren’t good, things don’t combine well and boilerplate starts to gather like moss in unexpected corners. Or they don’t get used at all, because they’re hiding in the wrong place.

There’s no hard and fast way to fix this – over time, it is the hardest part of growing a good utils library. It simply takes a lot of single-minded whacking of things into shape, just like the Java standard library probably did. But on the whole, I find it useful to imagine myself as a language designer.

Back to Basics

If you read this far and find all this to be basic stuff – it is. It boils down to some basic lessions of design: cohesion and coupling. You want high cohesion (inner motivation, no overlaps, consistent library design) and low coupling (no origin artifacts, minimal dependencies, abstractions ordered by generality). And obviously, you want consistency too.

The best test of a good utils library is taking a break and returning to it. Does it feel natural? Do you know roughly where things are? Or do you find yourself adding more stuff to it, only to discover later that the functionality really was there – just not where you looked? That means it needs more work. But of course, something like this is never completed anyway.