I recently started thinking again, this time about low-level reuse – yes, the utils library. Trivial to reuse, this is the layer you build on top of the standard library to make life a little easier in general. The StringUtils.isNullOrEmpty() method, and other things that are just missing from the standard library.
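For concreteness, here is a minimal sketch of the kind of helper I mean. Only the method name comes from the text above; the class layout and the body are my assumptions about what such a utility usually looks like.

```java
// Hypothetical utils class: the method name is from the text, the body is an assumed implementation.
final class StringUtils {
    private StringUtils() {} // static helpers only, never instantiated

    /** Returns true if s is null or has length zero. Whitespace counts as content. */
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }
}
```

Note that it depends on nothing but java.lang, which is exactly what makes it easy to carry from project to project.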
Sounds simple! Well, that method, at least, should be straightforward to reuse. But it occurred to me that I’ve coded these libraries a few times now. For the love of removing code – why? What, if anything, makes such a basic piece of code hard to reuse? Apart from IPR issues, I’ve identified some personality traits, if you will, that I find to be hurdles to reuse. While obviously not exhaustive, these three are:
- Dependency Addiction
- Lack of Inner Motivation
- Bad Language
Undesirable in any person, how do these traits appear in code? I’ll make the case right here that the general problem is connectedness in your code. Connections that go downwards, upwards and sideways. Read on:
Dependency Addiction
This is the trivially identifiable hurdle: the downward connections. Say you’re on a project that could really use a good utils library – maybe with lots of low-level WET boilerplate. (Agile wiseguy insert: “Write Every Time”, so not DRY.) You have some utils that you want to reuse, but it turns out they depend on around half of the Jakarta Commons! Reuse is inhibited by various factors:
- Conflicting versions of those libraries are already used in the project, or
- the project has a stricter policy on third-party libraries, or
- the codebase is smugly designed to be lean and mean, meaning that your library represents a rather considerable addition to the project’s footprint.
Seeing the uneasy frowns descending on your new co-workers, how do you deal with it? Ruthless purges!
One purge tactic is to identify parts that consume dependencies for trivial purposes. Are you really only using a few of the classes from Commons Collections? Isn’t it sometimes worth re-inventing a wheel, if the custom-designed wheel is a lot slicker? Implement the required functionality yourself, and admire your newly-invented wheel. (Which is usually eminently testable, since you know what you need it for.) If it makes reuse work better, chances are that you can be absolved of the sin of wheel-reinventing. (And like many sins, wheel-reinventing is fun, too.)
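As an illustration of such a purge, suppose the only things you ever used from a collections library were a null-safe emptiness check and a "first element or fallback" helper. That scenario is an assumption for the sake of the example, and the class and method names below are made up. A sketch of the home-grown wheel:

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical replacement for two trivial helpers that used to pull in a third-party jar.
final class CollectionHelpers {
    private CollectionHelpers() {}

    /** Null-safe emptiness check. */
    static boolean isNullOrEmpty(Collection<?> c) {
        return c == null || c.isEmpty();
    }

    /** Returns the first element, or fallback if items is null or has no elements. */
    static <T> T firstOrDefault(Iterable<? extends T> items, T fallback) {
        if (items == null) {
            return fallback;
        }
        Iterator<? extends T> it = items.iterator();
        return it.hasNext() ? it.next() : fallback;
    }
}
```

A dozen lines with no dependencies beyond java.util, and small enough that the tests practically write themselves.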
Second, some of the highly-dependent code might be persuaded to move elsewhere. Maybe it pulls in lots of dependencies because it provides a major, maybe un-utils-like, capability. Ask yourself if it is really a good candidate for this utils library. Find out how many usages the culprit has, and try to get a view of its actual utility. Consider being downright unfair on the hapless piece of code, for the greater good of reuse. It might be better off as its own little module.
In short, find the undesirable downward connections. Some come from the library, some go to the library. Identify and eliminate!
Lack of Inner Motivation
This one is slightly more interesting. I also think of it as origin artifacts, and again, undesirable connections. These are undesirable upwards connections.
Sometimes, we see an elegant piece of code that we want to reuse. So we weed out any references to the application code – the host application, i.e. the origin – and parametrize here and there. We get a general piece of functionality that we can move to the utils library. The original application code now simply calls the utility, with its specific parameters. It ends up leaner, more focused, and generally higher-level. You get the opportunity to clean up the logic chunk under consideration. Even when the utils library isn’t the right place for it, moving code out can be a worthwhile exercise in terms of quality. (It can also expose opportunities and trigger yet more radical code cleaning.)
But the utils may not be the right place. The problem arises when the utility isn’t really as general as it looks; maybe it embodies tacit assumptions, or handles special needs of the origin. They are the origin artifacts, the hidden upwards connections. This reveals its reduced utility in other applications, and it springs surprises on innocent re-users at awkward times. If it hangs around, it pollutes the utilness of your utils. Worst case: Similar utilities, with other quirks, make their own way into the same utils library. (Bad language – more on that below.)
The motivation for a utility must be clearly stated and obviously useful, in and of itself. The motivation must be intrinsic, not extrinsic. Can you describe the function without referring to the origin, or explaining a series of not-too-abstract-sounding preconditions that just happen to apply to the origin? Maybe it is not actually to be a general, reusable utility.
Or maybe it just needs more parameters. When investigating a possible util-impostor, you can fight to keep it, by documenting the quirks. One tactic is to identify the potentially surprising twists and turns, and expose them as options – i.e. more parameters! Parameters are at least good documentation points, and help you expose the connections.
In any case, the default behavior should be left as nicely unsurprising as possible. It may still not be all that generally valuable over time, so you should keep eviction notices handy.
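To make the parameter tactic concrete, imagine a join utility whose origin application silently dropped null elements. The utility, its names and the quirk itself are all hypothetical. A sketch that turns the quirk into an explicit option, with a default that mirrors what String.join does with nulls:

```java
import java.util.List;

// Hypothetical utility: the former origin quirk (dropping nulls) is now a visible parameter.
final class Joiner {
    private Joiner() {}

    /** Default behavior: null elements are rendered as "null", like String.join. */
    static String join(List<String> parts, String separator) {
        return join(parts, separator, false);
    }

    /** When skipNulls is true, null elements are left out entirely. */
    static String join(List<String> parts, String separator, boolean skipNulls) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String part : parts) {
            if (part == null && skipNulls) {
                continue; // the old origin behavior, now opt-in
            }
            if (!first) {
                sb.append(separator);
            }
            sb.append(part); // StringBuilder appends the text "null" for a null reference
            first = false;
        }
        return sb.toString();
    }
}
```

The origin application calls the three-argument version with skipNulls set to true; everybody else gets the unsurprising two-argument one.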
Bad Language
So far, this has been mostly a trite rehash of common wisdom. The most remotely interesting item is this one, which deals with the sideways connections, or lack thereof. Again, these are hidden and unspoken, but in contrast to the above, there are both undesirable and desirable connections. Lispers might recognize this part as the language-building philosophy of Lisp programming. Warning: this gets vague, we are not in HOWTOs anymore.
So here’s how we look at it: The utils library you’ve been hammering into shape actually represents, if not a new language, then at least an extension to the language, in the same way that the Java standard library also defines Java. Of course, technically, Java 5 is mostly the same language as Java 1.0.2 (and backwards compatible!), but for most practical purposes they’re very different, and the evolution of the standard library is the real difference. (If I didn’t make that point with you, consider Java 1.0.2 and 1.4.2 instead.)
Library design, at least for low-level reusable libraries, is to some extent also language design.
So what are the undesirable sideways connections? Inconsistency, plain and simple. Language design is hard, and challenges include internal consistency and uniformity: Keeping the implicit connections in mind and keeping them logical, keeping the conceptual disconnects out. For instance, if you have overlapping functionality (similar utilities with different quirks, for instance), that’s a disconnect. If there are related (or even overloaded) methods, and their argument ordering varies, that’s a disconnect too. What you get is a confusing mess of a language. What you really want is a practical (maybe even elegant), incremental improvement to your existing language.
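A small sketch of what the syntactic consistency can look like in practice. The padding helpers are hypothetical, chosen only because they are related and overloaded. Every variant takes its arguments in the same order, and the shorter overloads merely fill in a default:

```java
// Hypothetical padding helpers: all four methods share the argument order (s, width[, pad]).
final class Pad {
    private Pad() {}

    static String padLeft(String s, int width) {
        return padLeft(s, width, ' ');
    }

    /** Prepends pad until the result is at least width characters long. */
    static String padLeft(String s, int width, char pad) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append(pad);
        }
        return sb.append(s).toString();
    }

    static String padRight(String s, int width) {
        return padRight(s, width, ' ');
    }

    /** Appends pad until the result is at least width characters long. */
    static String padRight(String s, int width, char pad) {
        StringBuilder sb = new StringBuilder(s);
        for (int i = s.length(); i < width; i++) {
            sb.append(pad);
        }
        return sb.toString();
    }
}
```

Had padRight taken (width, s) instead, every caller would have to look it up each time, and that is the disconnect.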
And those are only the syntactic connections. The semantic connections are the ways that the various parts can be combined. A sizable utils library has many parts that can be combined in infinitely many ways; it is combinatorial. This is really powerful, actually too powerful for humans to handle, so it must restrict itself from providing connections that are undesirable. Why doesn’t java.lang.String have an openFile() method? Because it’s insane, that’s why. A String can obviously represent the name of a file, but it doesn’t go ahead and provide this connection. String isn’t the obvious place to look for file handling, because strings are more general than files, therefore we allow files to talk about strings (with e.g. file.getName()), but not the other way around.
The good connections, on the other hand, know their place. Notice how the I/O libraries deal with e.g. InputStream, and not the multitude of things that can provide InputStreams. This level of indirection makes for an extra step when you wire things up (INSERT HERE: gripes from people touting the ‘concise’ syntax for that particular case in their favorite – though possibly leaky-runtimey – scripting language). But it also adds a degree of freedom, and more ways to combine the basic parts. To combine with the I/O libraries, you don’t have to be a File, you can just provide an InputStream.
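A utils method can follow the same pattern. The sketch below is a hypothetical helper, not something from the standard library: it accepts the general thing (an InputStream) rather than one of the many things that can produce it, so it combines with files, sockets and in-memory buffers alike.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper: depends on InputStream only, not on File, URL or Socket.
final class IoUtils {
    private IoUtils() {}

    /** Reads the stream to its end and decodes the bytes with the given charset. Does not close the stream. */
    static String readFully(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }
}
```

The caller pays the extra wiring step, for instance by wrapping a file in a FileInputStream, and in return the helper never needs to learn about a new kind of source.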
The Java standard library is rather conservative with frivolous connections, which has probably been good for its longevity.
Good connections must obey some ordering of things by generality, usually in some tree-like structure. The connections should enable combinatorial composition, and avoid flooding the API with maybe-possibly-helpful methods. The ordering is subtle, tree-like and never quite explicit – but it is there – and it will become painfully clear (or at least painful) to users when things are out of order. When your connections aren’t good, things don’t combine well and boilerplate starts to gather like moss in unexpected corners. Or they don’t get used at all, because they’re hiding in the wrong place.
There’s no hard and fast way to fix this – over time, it is the hardest part of growing a good utils library. It simply takes a lot of single-minded whacking of things into shape, just like the Java standard library probably did. But on the whole, I find it useful to imagine myself as a language designer.
Back to Basics
If you read this far and find all this to be basic stuff – it is. It boils down to some basic lessons of design: cohesion and coupling. You want high cohesion (inner motivation, no overlaps, consistent library design) and low coupling (no origin artifacts, minimal dependencies, abstractions ordered by generality). And obviously, you want consistency too.
The best test of a good utils library is taking a break and returning to it. Does it feel natural? Do you know roughly where things are? Or do you find yourself adding more stuff to it, only to discover later that the functionality really was there – just not where you looked? That means it needs more work. But of course, something like this is never completed anyway.
Practitioners of Test Driven Development (TDD) have received some flak lately. You may have seen Jim Coplien talk about this – at JavaZone last year he used the opportunity to deploy somewhat of an all-out attack. I’m a TDD fan myself, but I found his criticism targeted at the practice of TDD – or rather the practice of TDD and nothing else – not at TDD itself. Or something.
The main argument against TDD seems to be that it doesn’t help you grow a coherent architecture. Test-driven code could very well turn out a messy collection of disparate, scattered code snippets, with no hint of an over-arching common idea, other than the fact that they are all “use cases” somehow. Therefore, even if you practice TDD, you need to grow the design as well. If that is the argument, then well, duh. The design has never appeared by itself, not before TDD and not with it.
Oh, and before we proceed: For anyone with “architect” in their title and conscious about it, just substitute “design” with “architecture” in the above and below. OK? Proceeding…
But what can you do with TDD if the design is already there?
Some context: My project recently started the work of porting a huge Java code base to C#. I was deep into the Java infrastructure layer, built on OSGi. I could go into it, but I’ll save that for later posts. The point is simple and it is this: The design was there, and it was in fact documented to some extent, but many details were in my head.
It was also a fact that I was the one with the most insight into this design, and that I was to leave the project only a few months into the porting process. Everyone agreed that we should mostly reproduce the infrastructure, to facilitate the porting of the actual business components – the POJOs, now to be PONOs – right? In short: picking my brain was high on everybody’s list.
After having everybody reading up on .NET for a while – it was new to everyone – I got impatient and jotted down a quick, OSGi-like core in C#. Again, I’ll leave out the details, but it turns out to be not so much work to write a simple, bootstrap OSGi-ish thing, to get us started.
Phrasing this as a methodologist would if there were such a thing as Vertical TDD: Then I put on my Test Driver hat, and launched the Vertical TDD. As you might guess, being Test Driver means you only do the first step of TDD: Write tests. After writing a test, instead of moving horizontally into the “make it pass” step, you drop down, vertically, and write another test. Point your co-workers to the failing tests and have them make them pass. This leaves the next two vertical columns in their hands: Make pass, refactor to remove duplication.
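To give an idea of what the Test Driver hands over, here is a minimal sketch. It is in Java rather than the C# of the actual port, it uses no test framework, and the ServiceRegistry interface with its two methods is a made-up stand-in, not the project's real API. The tests are written against a contract only; whoever picks them up supplies the implementation through the factory.

```java
import java.util.function.Supplier;

// Hypothetical contract, invented for illustration: not the real infrastructure API.
interface ServiceRegistry {
    void register(String name, Object service);

    /** Returns the service registered under name, or null if there is none. */
    Object lookup(String name);
}

// Written by the Test Driver before any implementation exists.
// Each method traces one feature of the design; colleagues make them pass.
final class ServiceRegistryContract {
    private ServiceRegistryContract() {}

    static void registeredServiceCanBeLookedUp(Supplier<ServiceRegistry> factory) {
        ServiceRegistry registry = factory.get();
        Object service = new Object();
        registry.register("logger", service);
        if (registry.lookup("logger") != service) {
            throw new AssertionError("lookup should return the registered service");
        }
    }

    static void unknownNameYieldsNull(Supplier<ServiceRegistry> factory) {
        ServiceRegistry registry = factory.get();
        if (registry.lookup("no-such-service") != null) {
            throw new AssertionError("lookup of an unknown name should return null");
        }
    }
}
```

The Test Driver then drops down and writes the next contract method, instead of moving sideways to write the implementation.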
This seems to have worked well because:
- The tests had a clear function. They were a way to trace the contours of the design as I knew it, concisely listing the most essential features and behaviors that I knew we would need.
- They compelled my colleagues to re-create my design to some extent, and thus create a much deeper understanding of it. This is a lot more effective than sitting through presentations or reading design documents.
- The tests were an excellent checklist for tracking progress – each test would describe a certain feature, or a failure mode that we were able to handle. (Like non-compliant components.) Everyone could see green checkmarks flourish as progress was made.
And so on. All the usual arguments for having tests apply, but the main benefit here was the knowledge transfer. Not by reading about it or being monologued to about stuff, but by actually reconstructing it. Once the framework was starting to manifest in everybody’s heads, it became easier to outline remaining features on a higher level. People were developing the mental “hooks” required to see where everything fit in. In contrast, reading high-level documentation at an early stage, before any hands-on experience, will tend to produce whooshing sounds over your head.
You may find that making tests pass is considered more fun than refactoring out duplication. To counter this, I read up on the code and wrote more tests that exposed bugs of two kinds: One where one duplicate has it but not the other, and of course the one where the bug has to be fixed twice. Both drive the point home: Duplication is bad.
So what’s the lesson? It could be this, for instance: If you have a clear idea about a design in your mind, you can drive it through by providing the groundwork for it, and using TDD to help your co-workers fill it out. Of course, this is not a case for Big Upfront Design; you need a working, well-understood, accepted design. I may be using a slightly too strong definition of “clear idea” here, since it means “having worked with it for a couple of years”.
Anyway, I think this was a good experience, and it may even have been encouraging to anyone who’s been put down over their dangerously-veering-on-unfashionable TDD recently.