"I'd advise you to run your process with a regular rhythm of
releases,
each mapped to a set of use cases and each representing a successive
refinement of the system's architecture." - Grady Booch
"Refactoring is the process of taking a running program and adding
to its value, not by changing its behavior but by giving it more of those
qualities that enable us to continue developing at speed." - Kent Beck
The subject of this tip is refactoring. The overall theme can be
expressed as follows: Programmers need to constantly refactor their
code in order to make their logic easier to understand and maintain.
In order to do this safely, they need to cover their classes with unit
tests.
If you prefer seeing these ideas laid out as bullet points, then here
is an alternative presentation. Programmers refactor their code to make
it:
- Easier to understand.
- Easier to maintain.
- Easier to test.
What is Refactoring?
For the purposes of this article, I will define refactoring as follows:
It is the process of renaming or restructuring classes, methods and
variables.
If you draw an analogy between programming and writing, then
refactoring your code can be thought of as a process similar to editing or
rewriting a piece of text. If you go through an existing written work and
focus on clarity of presentation, on finding exactly the right way of
expressing an idea, and on adding clarity by breaking ideas out into new
paragraphs, new sub-headings, and new chapters, then you are engaged in a
process similar to refactoring.
Another interesting definition of refactoring is offered by Martin
Fowler in his book "Refactoring," from Addison-Wesley. Fowler writes:
"Refactor (verb): to restructure software by applying a series of
refactorings without changing its observable behavior." This
definition is less than ideal in that it is self-referential: it uses the
word refactorings to define the verb refactor. Nevertheless, it brings
out a key point: refactoring is not about adding new features, it is about
refining existing features. This theme will appear several times
in this article.
A few examples might help. If you have a variable called MyList,
and you clarified its purpose by renaming it MyListOfPlanets or
MyPlanetList, then you have refactored your code.
A more interesting example of refactoring might involve taking a single
class and breaking it out into two classes. Such an action might enhance
the clarity of your code or its re-usability. For instance, you might
take a class called MyFilterList and break it out into two classes
called MyPlanetTextFilter and MyPlanetList.
Here are three specific advantages you gain by breaking
MyFilterList out into two classes called MyPlanetTextFilter
and MyPlanetList:
- Names such as MyPlanetTextFilter and MyPlanetList
are easier to understand than MyFilterList. Take a moment to think
about this issue and you can see why this is true. If you hear the term
MyFilterList you might ask what is being filtered and what is being
listed. To discover the answer to these questions, you would need to read
the code in the class. In short, you would have to become a human
compiler and start parsing code in order to understand its purpose. A
name like MyPlanetList, on the other hand, explains up front that
the purpose of the class is to maintain a list of planets. It
makes your code easier to understand.
- It is easier to maintain two small classes called
MyPlanetList and MyPlanetTextFilter than it is to maintain a
single big class called MyFilterList. Once again, it is easy to
understand why this is the case. If you encounter a bug in
MyFilterList, the first question you would have to ask yourself is
whether the bug is in the list of planets, or in the filtering of the
list of planets. You have to be sure that your fix to one part of the
class does not break the other part. If, on the other hand, you are
working with a class called MyPlanetList, and find a bug in it,
then you need concern yourself only with a single problem domain: the act
of maintaining a list of planets. In short, refactoring your code
into two smaller classes helps make your code easier to debug and
maintain.
- Finally, having broken MyFilterList out into two classes
makes your code easier to test. If you are writing tests for
MyFilterList, then you have to compose two types of test, one to
test the status of the list of planets, and one to test the filtering of
the planets. It is obviously easier to write a test for a class like
MyPlanetList, since the purpose of the class is so easy to define
and understand. Furthermore, you can be sure that you are testing only
bugs related to the planet list itself, and not accidentally using your
test to uncover bugs in the filtering process.
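
The split described above might look something like the following sketch. It is written in Python purely for illustration and is not taken from any real project: the method names (add, names, matches, apply) and the sample planet names are assumptions made for this example.

```python
class MyPlanetList:
    """Maintains a list of planet names and nothing else."""

    def __init__(self):
        self._planets = []

    def add(self, name):
        self._planets.append(name)

    def names(self):
        # Return a copy so callers cannot modify the internal list.
        return list(self._planets)


class MyPlanetTextFilter:
    """Decides whether a planet name contains a piece of text."""

    def __init__(self, text):
        self._text = text.lower()

    def matches(self, name):
        # Case-insensitive substring match.
        return self._text in name.lower()

    def apply(self, planet_list):
        # Keep only the names in the list that match the filter text.
        return [n for n in planet_list.names() if self.matches(n)]


planets = MyPlanetList()
for name in ["Mercury", "Venus", "Earth", "Mars"]:
    planets.add(name)

print(MyPlanetTextFilter("ar").apply(planets))  # prints ['Earth', 'Mars']
```

Each class can now be tested on its own: a test for MyPlanetList never touches the filtering logic, and a test for MyPlanetTextFilter.matches needs no list at all.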
To keep my presentation simple, I have come up with a simple example.
There is, however, a second reason why this example may seem
exceedingly obvious to you. We refactor our code in order to make it
easy to understand. As a result, well refactored code should have an
obvious, intuitive, perhaps even trivial feeling to it.
Who is Interested in Refactoring?
Not all programmers will be interested in refactoring. To better
understand why this is the case, you need to consider three possible
schools of thought about designing a program:
- One school suggests completely planning out your program ahead
of time, defining all your program's functionality and classes. This
design document then becomes an immutable guide which must be followed to
the letter, regardless of consequences.
- Another school of thought advocates developing your code incrementally
via an iterative process. Start out with a few basic design goals, then
sit down and implement them. Now test your code, ask for feedback, and
redesign your code to include any improvements that emerged from the
testing and feedback sessions. Continue this process until you have
developed an application that passes all your tests and fulfills the
practical suggestions you got during feedback sessions.
- A third school combines the two methods by starting out with a
moderately specific plan, and then enhancing that plan by an iterative
process of incremental improvements as outlined in the previous bullet point.
If you are an advocate of the first school of programming, then
refactoring is not going to be important to you. After all, if everything
was set in stone from the beginning, then why would you ever need to
restructure your classes or rename your variables? However, if you share
my advocacy of the latter schools of thought, then you need to think about
refactoring your code.
Another group of programmers who will not be interested in refactoring
are those who continually want to add yet one more feature to a program.
If you are this type of programmer, then the whole concept of refactoring
your code, and of writing unit tests, will sound boring, or worse, like a
waste of time.
I would add that from my point of view, refactoring and unit testing are
both practical and intellectually satisfying. When I can work at my own
pace, I find programming to be more interesting than playing chess,
reading a novel, watching a movie or playing a computer game. What
specifically is it about programming that I find so interesting? To me,
the most exciting part of writing code is finding the right structure for
my program. Not quite as enthralling but still interesting, is the act of
writing tests to prove that my architecture is sound.
More obliquely, but perhaps more tellingly, Kent Beck presents us with
the following jewel of wisdom: "If you can get today's work done today,
but you do it in such a way that you can't possibly get tomorrow's work
done tomorrow, then you lose." Kent Beck is one of the founders of
the school of programming discussed in this tip. His goal is to make
sure that programmers start winning, and stop losing.
The Primary Benefit of Refactoring
Let's think about refactoring your code from a slightly different
perspective. From this new point of view, the main purpose of
refactoring is to encapsulate code inside increasing levels of
abstraction. That can sound like a complex process on first hearing.
However, it is meant to promote not complexity, but simplicity.
When we work at increased levels of abstraction, we can think
about complex ideas in simpler terms. Consider the following two ways
of describing an object:
- This object is made from trees that were cut down, ground up
into pulp, and then mashed together into thin white sheets. On these
sheets of wood pulp, an ink prepared from refined petroleum and
plant products is stamped on pages in patterns which are meaningful
to trained carbon-based entities who have a sophisticated cerebral
cortex and a refined visual ability.
- This object is a book.
The first description is more specific and detailed, the second is
more abstract. However, they are both ways of talking about the same
object.
When it comes to reusing the same idea, most people would
prefer the second explication, rather than the first. The same is
true in programming. We always have the option of writing out a
series of complicated steps over and over again. We can, however,
simplify the process by working at a higher level of abstraction. This
usually means we encapsulate a series of steps inside an object.
We are able to use the word book in conversation because we can be
sure that most listeners have a good and valid understanding of the
term. In programming, we can reuse an object easily if it is well
structured and well defined. One of the primary goals of refactoring
is finding that well structured and well defined presentation for an
object.
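
As a small illustration of wrapping a series of steps inside an object, consider the sketch below. It is only an example under stated assumptions: the TitleFormatter class, its format method, and the sample title are hypothetical names invented here, not part of any library.

```python
# Before: every caller repeats the same series of steps.
raw_title = "  the  old man and THE sea "
words = raw_title.split()  # split on whitespace, dropping the extra spaces
spelled_out = " ".join(w.capitalize() for w in words)


# After: the steps live inside one well-named object.
class TitleFormatter:
    """Hypothetical helper that tidies up a book title."""

    def format(self, raw):
        words = raw.split()
        return " ".join(w.capitalize() for w in words)


formatted = TitleFormatter().format(raw_title)

# The refactoring changed the structure, not the observable behavior.
assert formatted == spelled_out
print(formatted)  # prints The Old Man And The Sea
```

Callers no longer need to know how a title is cleaned up. They only need to know that a TitleFormatter does it, much as a listener only needs the word book.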
What is Wrong with Refactoring and Unit Testing?
There are no magic bullets in programming. The subject is hard
no matter what tools you use. Give me a moment to set up my argument,
and I will try to explain what can go wrong with this technology.
When refactoring code, we want to move increasingly toward simplified
levels of abstraction. We want to take complex operations and encapsulate
them inside a set of objects that are easy to understand and test.
Consider the following analogy. If we are trying to create a computer
program that simulates a library, we might first start with a single
object called Library. Then we might see that our Library
consisted of several shelves of books. Rather than incorporate shelves as
part of the Library object, we might instead create a Shelf
object and a ShelfList object. Then we might notice that a
Shelf contained many books. Again, we see the need to break out
the concept of a book into a separate Book object. And so on, as
you discover the chapters inside the book, and the paragraphs inside the
chapters, etc.
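
The decomposition in this analogy could be sketched roughly as follows. The class names Library, Shelf, ShelfList and Book come from the paragraph above, but every field and method name (label, books, add, all_books, titles) is an assumption made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str


@dataclass
class Shelf:
    label: str
    books: list = field(default_factory=list)

    def add(self, book):
        self.books.append(book)


@dataclass
class ShelfList:
    shelves: list = field(default_factory=list)

    def add(self, shelf):
        self.shelves.append(shelf)

    def all_books(self):
        # Walk every shelf in order and collect its books.
        return [book for shelf in self.shelves for book in shelf.books]


@dataclass
class Library:
    shelves: ShelfList = field(default_factory=ShelfList)

    def titles(self):
        return [book.title for book in self.shelves.all_books()]


fiction = Shelf("Fiction")
fiction.add(Book("Moby-Dick"))
science = Shelf("Science")
science.add(Book("Cosmos"))

library = Library()
library.shelves.add(fiction)
library.shelves.add(science)
print(library.titles())  # prints ['Moby-Dick', 'Cosmos']
```

Each class stays small enough to understand and test in isolation, at the cost of having more classes to keep track of.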
All object oriented programmers do some of this. Advocates of Agile or
XP programming in general, and of refactoring and unit testing in
particular, take this idea to an extreme. In the end, they have a lot of
small classes. They might even seem to be adding to the level of
complexity in their program. Instead of one or two objects, they now have
many small objects.
People who don't like unit testing and refactoring always eventually
come around to this point as the core of their criticism of the whole
methodology. They will say, "Look, ultimately you are left with all these
small objects and seeing how they fit together is not easy." No one is
trying to deny this fact. Programming is difficult, and refactoring and
unit testing are not a magic bullet that will suddenly make it simple. It
still takes time to come to understand a well refactored program. The
point, however, is that refactoring is an effective way to discover a very
good structure for your program. It is, we advocates of the technique
believe, a better way to find the right structure than you can achieve by
planning everything out in detail ahead of time. Yes, you still need a
document that describes the structure of your program so that newcomers
can see how it is put together. And you still need to do some planning
ahead of time. However, if you restructure properly and carefully, then
your architecture should fit together neatly in a logical and cleanly
thought out manner. It will be easy to test and easy to maintain. The
point is that restructuring helps you find and refine your architecture,
and unit testing helps you prove that your architecture is valid.
Extreme Refactoring
The great advantage of frequently unit testing and refactoring your
code is that it helps you structure a process that is otherwise
amorphous. When I think that I can deliver a project in one month
when it really takes me two, I am actually quite correct in my
original assumption. I will in fact spend about one month of those two
months actually planning and writing my code. The other month will be
spent trying to make sure it works right, and making sure I know enough
about its structure to be able to add features and debug the code.
The great thing about unit testing and refactoring is that it takes
that second, missing month of development, and gives it a definable
structure and purpose. It doesn't make it go away, but it gives that
period structure.
If you properly unit test and refactor your code, then you always
know that your code works, and you always know its structure and how
to test and amend that structure. Unit testing and refactoring give
definition to the amorphous, unstructured, portion of code
development.
Open Source Magic
If you are used to working in shops that think only in terms of major
releases, it can be very confusing to watch the development of a certain
type of modern open source project. Sometimes I will hear a lot about a
famous project, and go to SourceForge to download it. To my
consternation, I discover that this project is at version 0.214. Seeing
that version number, I might think the project is completely useless, and
not worth downloading. But if I do download it, I might be surprised to
find that nearly all the features found in the product are functioning
properly. What is going on with this project? What does it mean to say
that such a well developed project is only at version 0.214?
The answer here is simple. The developers are following the basic
principle of releasing early and often. They have covered their project
with unit tests, and know that at any one stage in its development, the
whole program is working correctly and in a fairly robust manner. They
won't reach version 1.0 until they have added many more features, but that
fact has nothing to do with whether or not the program is robust. By
using the principles outlined in this tip, and in previous tips, and in
other programming tips yet unwritten, these people have discovered a means
of creating robust software that is useful from a very early stage.
There are two great benefits derived from this technique:
- The developers can get a following who will give them good bug reports
and good product ideas even during earlier stages of development. The
product works right away, so testers begin using it even in early stages
of development. This more or less precludes the possibility of the
development team ever reaching 1.0 with some huge unfound bug lurking in
their code. Big commercial development teams who think in terms of major
releases, on the other hand, risk this problem every time they ship. As a
rule, those shops produce major version numbers which are inherently
buggy, and it is only their point releases that get thoroughly tested and
cleaned up.
- The second benefit of releasing early and often is that bugs get fixed
not in a matter of months or years, but in a matter of days, or
sometimes hours. Though it does not happen often, there are probably a
number of readers who have had the experience of reporting a serious bug
to a big open source project, only to find that the developers fixed the
bug and posted the update in a matter of hours. Though the particular bug
I was reporting was an exceedingly simple one to fix, nevertheless this
happened to me just the other day at www.plone.org. How can something like
this occur in a world where bug fixes usually take months or years to be
implemented? For readers of this article, the answer should be obvious.
Fixing the bug is easy because the code is well refactored and easy to
understand. Testing that the fix did not break other code is easy because
the project is already covered with unit tests. (You have to write unit
tests not after fixing each bug, but as you add each new feature.) After
running the tests, it is just a matter of performing a build and
publishing the result.
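
The following sketch shows, under stated assumptions, what rerunning an existing test suite after a fix can look like. It uses Python's standard unittest module. The planet_count function and both test cases are hypothetical and exist only for this illustration.

```python
import unittest


def planet_count(names):
    """Count distinct planet names, ignoring case and surrounding spaces."""
    return len({n.strip().lower() for n in names})


class PlanetCountTest(unittest.TestCase):
    # These tests are written alongside the feature and rerun after every fix.

    def test_counts_distinct_names(self):
        self.assertEqual(planet_count(["Mars", "Venus"]), 2)

    def test_ignores_case_and_spaces(self):
        self.assertEqual(planet_count(["Mars", "mars "]), 1)


# Load and run the suite without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlanetCountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

If a fix to planet_count broke either expectation, the run would report the failure before anything was built or published.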
Just to avoid unnecessary disagreements, let me make it clear that I do
not mean to be arguing against commercial software. Instead, I am arguing
in favor of a particular development technique. There are many commercial
shops that use exactly this technique. I bring up open source projects
only because they allow us to see into the nature of the development
process more easily than we can see into the development process at most
commercial shops. I repeat, this technique applies equally to open source
and commercial projects.
Summary
Earlier in this tip I outlined three schools of thought about
programming. The first school of thought advocated thoroughly planning
out your code ahead of time, and then sticking with that plan. The other
two schools advocated some variation of an iterative development process
involving feedback and testing.
A certain kind of person might go for option one on the grounds that it
seems the most rigorous, the most disciplined, of the three choices.
However, if you have managed to read through this entire programming tip,
you must now be able to see that a combination of intense unit testing and
intense refactoring is, if anything, potentially more difficult and more
time consuming than the first alternative. I am not advocating this
technique because I think it will make your programming cycle shorter. It
will improve the odds of your success, and help prevent wasting time in
the later stages of development, but it is not meant to be seen as a
shortcut.
An extreme approach to unit testing and refactoring does, however, have
one tremendous advantage over the first technique: It is much more fun.
One of the mantras of Extreme Programming is that developers like to write
tests. I have found this to be true. I find it very difficult to get the
discipline or insight necessary to thoroughly plan out a project ahead of
time. However, I almost always enjoy sitting down and writing and
refining my tests, and there are few intellectual pursuits more rewarding
for me personally than refactoring my code.
Software development has been compared to the art of herding cats. The
point of this phrase is that there is something intangible, something more
intuitive than scientific about the development process. If you can't
define something in rigorous step by step detail, then your ability to
do it well really becomes a function of how much passion you are willing to
bring to a task. One of the great advantages of unit testing and
refactoring as a way of life for a programmer is that it helps engage
developers in their work. It's fun. It is an endeavor one can
passionately pursue on a day to day basis.
Good programmers are, in the best sense of the word, technophiles.
They love technology. For reasons that we can't quite explain, we love
working with complex, intricate tools such as compilers, debuggers and
IDEs. The tools we use to perform unit testing and to restructure our
code are a technophile's dream. It is simply fun to pop up JBuilder and
use its well designed refactoring tools. There is something fascinating
and pleasing about writing and running a well designed test.
The act of writing a test provides an incremental goal on the way to
our bigger goal. Huge projects can become long and dreary tasks. But you
can spice up that task by writing lots of small tests. Then, at the end
of each day, you have something tangible, something cool, that you can
look back on as an accomplishment. At the end of the day, you are not
just 0.3 percent closer to finishing your project. Instead, you are the
author of three completed unit tests that run and work perfectly, and you
restructured your code to make it better.
The point here is that unit testing and refactoring are a difficult and
sometimes painstaking way of achieving a goal, but they are also a fun and
engaging way of achieving that goal. Yes, they take time and work, but
they also have their own, sometimes partially intangible, rewards.