Sunday, May 30, 2010

ToDo or Toodledo. That is the Question. Again.

One of my most popular posts at my other blog is a comparison of ToDo and ToodleDo for the iPhone.  The original post was written a while ago, and both apps have had several significant revisions since then.  So I'm refreshing the post here: I've gone over the presentation, updated the information to reflect the current versions of these two apps, and tweaked the data to reflect my most current thinking.



I like PDAs because they help me manage the things I have to do – and I’m all about the todo lists.   I don’t know if I’ve become dependent on lists because I have a bad memory, or if my memory is failing because I use lists for everything.  Still, there it is.

Over the past year or so, a number of task manager apps have come out for my beloved iPhone, and I’ve been trying most of them.   It’s surprising how I keep coming back to the same two apps, and equally surprising (to me) that after months of playing around with them, I still can’t quite decide which one I prefer.

The two apps are Appigo's ToDo, and ToodleDo for the iPhone.  Both cost only a few dollars, and both are very well-rated by the public at large.

So, I figured, let's use some design analysis tools to evaluate the two apps, and see what the numbers say.

I’m going to use two tools: pairwise comparison, and a weighted decision matrix. These tools aren’t only useful for analyzing designs – they’re basic decision-making tools, and they’ve always done right by me when evaluating designs, conceptual or otherwise.

Both tools depend on having a good set of criteria against which the two apps will be compared. You might not know what decision to make, but you need to know how you’ll know you’ve made the right one.  In our case here: How do I know when I’ve found a good task manager app?

The formal term for what I’m doing here is qualitative, multi-criterion decision-making. It generally involves four tasks, which in my case are:
  1. Figure out criteria that apply to any “best” task manager.
  2. Rank the criteria by importance, because some criteria will affect my decision more than others.
  3. Develop a rating scale to rate each app.
  4. Rate the apps with the rating scale and the weights.
Here are my criteria, in no particular order of importance, based on years of using other task management tools:
  • Fast.  No long delays when telling the app to do something.
  • Easy.  Minimal clicking (e.g. not having to hit “accept” or "save" for everything, or burrow into deeply nested forms and subforms).
  • Start dates.  Tasks shouldn't appear on any standard task list until their start date (if given).
  • Due dates.  Obviously, but not mandatory on all tasks.
  • Repeats.  Repeating tasks at regular intervals.
  • Priorities.  At least three levels of priority for tasks.
  • Sync.  Easy syncing to some remote service that is fairly robust, uses standard formats, and lets me access my tasks from other devices.
  • Groups.  Group tasks by tag or folder or project or whatever.
  • Sorting.  Multiple ways to sort tasks.
  • Hotlist.  Some overview page showing only near-term, important tasks; preferably customizable in terms of how I define "important."
  • Restart.  Picks up next time I run it where I left off last time (oddly, not every iPhone app does this).
  • Recovery.  Be able to uncheck tasks that were accidentally checked off.
  • Subtasks. Treat a single task as if it were a group/project/folder.
  • Checklists. A degenerate case of a task is just an item in a checklist.  Not every "task" really deserves all the attributes.  Checklists that can be used as templates (i.e. copied over and over again) would be even better.
  • Conditional deadlines.  Due dates based on due dates of other items (e.g. task B is due two weeks after task A is completed).
  • Backlinks. Given a task, one-tap access to the group/project/folder in which the task lives.
Oddly, not a single iPhone app I’ve checked out so far meets all my requirements.   In particular, I’ve not even heard of an app that even tries to meet the last two requirements. I say “oddly” because I don’t think these requirements are excessive or bizarre, and I do think they'd be immensely useful.  Still, there it is.

Next, we have to develop weights to assign relative importance to the criteria.  The word relative is key here; we’re not going to say that one criterion is certainly and universally more important than any other.  What I want is to know how important each is with respect to the others and my own experience.  Remember, one size never fits all.

This is where pairwise comparison comes in. Details on how this works are given in another web page (it isn’t hard). The chart below shows just the end results. In each cell is the criterion that I thought was the more important of the pair given by that cell’s row and column. Since it doesn’t make sense to compare something to itself, and since these comparisons are symmetric (comparing A and B is the same as comparing B and A), I only need to fill in a little less than half of the whole chart. If you’re thinking this took a long time, you’d be wrong. It took me about 30 minutes to fill in the whole thing.

A=Fast   B=Easy   C=Start Dates   D=Due Dates   E=Repeats   F=Priorities
H=Sync   I=Groups   J=Sorting   K=Hotlist   L=Restart   M=Recovery
N=Subtasks   O=Checklists   P=Cond. Deadlines   Q=Backlinks

   A  B  C  D  E  F  H  I  J  K  L  M  N  O  P  Q
A  -  B  A  D  E  F  H  I  J  K  L  M  N  A  P  A
B     -  C  D  E  B  B  B  J  K  B  B  B  B  P  Q
C        -  D  E  F  H  I  J  C  C  M  N  C  P  C
D           -  DE D  D  I  D  D  D  D  D  D  D  D
E              -  EF E  I  E  E  E  E  E  E  E  E
F                 -  H  I  J  F  F  M  N  O  P  Q
H                    -  H  J  K  L  H  N  O  H  H
I                       -  J  I  I  I  I  I  I  I
J                          -  J  J  J  J  O  J  J
K                             -  L  K  N  K  P  Q
L                                -  M  N  L  P  Q
M                                   -  N  O  P  M
N                                      -  O  P  N
O                                         -  P  O
P                                            -  P
Q                                               -

(Each cell names the winner of that row/column pair; a two-letter cell like DE marks a tie, which counts toward both criteria.)


This leads to the following weights:

Fast 2.46%
Easy 6.56%
Start Dates 4.10%
Due Dates 11.48%
Repeats 11.48%
Priorities 4.10%
Sync 5.74%
Groups 9.84%
Sorting 9.84%
Hotlist 4.10%
Restart 3.28%
Recovery 4.10%
Subtasks 6.56%
Checklists 4.92%
Cond. Deadlines 8.20%
Backlinks 3.28%

So this tells me, for instance, that having due dates and repeating tasks are the two most important criteria.  Task grouping and sorting are a close second.  And so on.

The point of this process is that the human mind is not good at juggling a bunch of variables, but it is very good at comparing one thing against another. Take the trivial case of choosing between three alternatives, A, B, and C. If you prefer A to B, and B to C, then you should accept the logic that A is the most preferred item. To do otherwise just isn’t rational. That’s exactly what pairwise comparison does. And there’s good evidence that this technique actually works.
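Mechanically, turning a filled-in chart like the one above into weights is just counting wins. Here's a minimal sketch in Python (the data structures are my own invention, not from any particular tool); note how a tie, like the DE cell, credits both criteria:

```python
from itertools import combinations

def pairwise_weights(criteria, winners):
    """Compute relative weights from a pairwise-comparison chart.

    winners maps each unordered pair of criteria to a list holding
    whichever criterion won -- or both of them, for a tie."""
    wins = {c: 0 for c in criteria}
    for a, b in combinations(criteria, 2):
        for w in winners[frozenset((a, b))]:
            wins[w] += 1
    # Ties award a point to both sides, so normalize by total points,
    # not by the number of comparisons.
    total = sum(wins.values())
    return {c: wins[c] / total for c in criteria}
```

With the win counts from the chart above, this normalization is what yields percentages like the 11.48% shared by Due Dates and Repeats (their tie gives each a point).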

The next step is to choose a rating scale. This scale will be used to rate each app with respect to each criterion.
There’s a variety of scales I could use, and a great deal of research into qualitative measurement scales has been done. The scale that works best for me – and seems to be the most general – is a five-point scale from -2 to +2, where 0 means “neutral,” -2 means “horrible,” +2 means “excellent,” and -1 and +1 are in-between values. If you prefer something a little finer, you can use a 7-point scale from -3 to +3. I think it’s important to have a zero value to indicate neutrality, and I find it meaningful to have negative numbers stand for bad things and positive numbers for good things.

It’s interesting to note that in some industries (e.g. aerospace), I’ve noticed a tendency to use an exponential scale – something like (0, 1, 3, 9). This is because aerospace people tend to be extremely conservative (for reasons both technical and otherwise), so they tend to underrate the goodness of things. This scale inflates any reasonable rating to make up for that conservatism.

But I’m neither an aerospace engineer nor particularly conservative, so I’ll use the -2 to +2 scale.

Now we can do the weighted decision matrix. The gory details are given elsewhere. The weights come from the pairwise comparison above. In a decision matrix, we rate each alternative against some well-defined reference or base item. We need a reference because we need a fixed point against which to measure things.  For this comparison, I'll use the task manager that I am actually using these days, Pocket Informant for the iPhone, as the reference.

I worked up a weighted decision matrix comparing ToodleDo to ToDo. Here it is:

                   Wgt   Ref (PI)   ToodleDo      ToDo
                          R    S     R     S     R     S
Fast              2.46    0    0     0     0     0     0
Easy              6.56    0    0    -1 -6.56     1  6.56
Start Dates       4.10    0    0     0     0    -2 -8.20
Due Dates        11.48    0    0     1 11.48     1 11.48
Repeats          11.48    0    0     1 11.48     1 11.48
Priorities        4.10    0    0    -1 -4.10     1  4.10
Sync              5.74    0    0     0     0     0     0
Groups            9.84    0    0     0     0     0     0
Sorting           9.84    0    0     1  9.84     0     0
Hotlist           4.10    0    0     1  4.10     1  4.10
Restart           3.28    0    0     0     0     0     0
Recovery          4.10    0    0     0     0     0     0
Subtasks          6.56    0    0     0     0     0     0
Checklists        4.92    0    0     0     0     1  4.92
Cond. Deadlines   8.20    0    0     0     0     0     0
Backlinks         3.28    0    0     0     0     0     0
Total           100.04         0          26.24        34.44

This table might not look like much, but it tells a bit of a story.  The column marked Wgt is the weight of that criterion taken from the pairwise comparison.  Each of the three apps gets two columns.  The R column is the rating I gave it; PI is the reference, so it gets zeros in every category.  That way, if another app does better than the reference, it gets a positive rating, and if it does worse than the reference, it gets a negative rating.  The S column is the actual score, which is the rating multiplied by the weight for that criterion.  The numbers at the bottom of the S columns are just the arithmetic sums of the individual scores.
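The arithmetic behind the S columns is a one-liner. Here's a sketch in Python that reproduces the totals (the criterion names and numbers are lifted straight from the table above):

```python
# Weights (%) from the pairwise comparison, and my -2..+2 ratings
# relative to Pocket Informant (the reference, which scores 0 everywhere).
weights = {
    "Fast": 2.46, "Easy": 6.56, "Start Dates": 4.10, "Due Dates": 11.48,
    "Repeats": 11.48, "Priorities": 4.10, "Sync": 5.74, "Groups": 9.84,
    "Sorting": 9.84, "Hotlist": 4.10, "Restart": 3.28, "Recovery": 4.10,
    "Subtasks": 6.56, "Checklists": 4.92, "Cond. Deadlines": 8.20,
    "Backlinks": 3.28,
}

def weighted_score(ratings):
    # Criteria not mentioned default to 0, i.e. "same as the reference".
    return sum(w * ratings.get(c, 0) for c, w in weights.items())

toodledo = {"Easy": -1, "Due Dates": 1, "Repeats": 1, "Priorities": -1,
            "Sorting": 1, "Hotlist": 1}
todo = {"Easy": 1, "Start Dates": -2, "Due Dates": 1, "Repeats": 1,
        "Priorities": 1, "Hotlist": 1, "Checklists": 1}
```

Running weighted_score on each dict gives back the totals at the bottom of the S columns: 26.24 for ToodleDo and 34.44 for ToDo.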

If you look at the ratings for ToDo, you see that it’s a bit better than ToodleDo on some points, and a bit worse on others. But the +1's don’t actually cancel out the -1's because of the weights. The criteria on which ToDo beat ToodleDo are more important to me than the others, because the weights are higher. That makes ToDo noticeably better than ToodleDo.

It's interesting to note that this version has me preferring ToDo over ToodleDo, whereas my original post had it the other way around.  This is because of all the updates to both apps since I first compared them.  Even though there are some things about ToodleDo that really turn my crank, ToDo is the better app, because it does better on things that I think are more important.

And that jibes nicely with my intuition.  I started with ToDo, then switched to ToodleDo (just before I did my first comparison).  But now, given the improvements to ToDo, it's taken the lead again.  If it weren't for the decision matrix, I'd only have a "gut feeling" telling me which was better.  But now, having done the comparison twice, I understand and can explain why I preferred one, then the other, then the one again.

One might ask, then, why I'm using Pocket Informant since both ToodleDo and ToDo beat PI.  The answer is simple: appointments.  PI integrates appointments sync'd with Google Calendar right into the app.  For me, that's decisive: it's just too useful to have my appointments and tasks all available under one roof, so to speak.  If I'd've added appointments as a criterion, both ToDo and ToodleDo would have lost to PI.

Back during my first comparison, I ran into a problem with ToodleDo that - though it has been corrected since - remains noteworthy with respect to doing these kinds of comparisons.

The problem was this: ToodleDo used to generate the next in a series of repeating events only when it sync'd with the ToodleDo service.  ToDo, on the other hand, handled repeating events internally.
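For what it's worth, handling repeats internally takes nothing more than date arithmetic. A toy sketch of the idea (my own illustration, not Appigo's actual code):

```python
from datetime import date, timedelta

def roll_forward(due, interval_days, today):
    """Advance a repeating task's due date locally, with no server
    round-trip -- the way ToDo handled repeats internally."""
    while due < today:
        due += timedelta(days=interval_days)
    return due
```

For example, a weekly task that fell due on May 1 and was last touched on May 16 rolls forward to May 22, whether or not the device has ever sync'd.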

This was a problem for me when I travelled. I had gone to Berlin for a conference. And I didn’t have a data plan for my iPhone (that’s a whole separate story), so I couldn’t sync either app. But that meant ToodleDo couldn’t roll repeating items over properly.  So before I went to Berlin, I sync’d up ToDo and used it while I was gone.  And when I came back I switched back to ToodleDo.  I did that whenever I travelled.

Does the evaluation consider that? No it doesn’t, because I didn’t. The evaluation is only as good as the evaluator. When I evaluated the two apps, I was nestled snugly at home, WiFi at the ready – and sync’ing either ToDo or ToodleDo was a non-issue. If I’d've done the evaluation in Berlin, I’m sure I’d've gotten different numbers, because the repeating events problem would have been right there in my face, irritating the hell out of me.

So this underscores a limit of the evaluation method – indeed, a limit of any method: it’s only as good as the situation you’re in when you use it. Some people might say a method is only as good as the information you use, but it’s more than that. My situation, in this case, includes me, my goals (at the time), my experiences, all the information I have handy, constraints, and anything else that can possibly influence my decisions at the time.

The problem, then, is that a method depends on the situation when it’s used. But that situation may be different for the person doing the evaluation than for the person(s) who will have to live with the decision being made. Indeed, it’s virtually guaranteed that the situations will be different, if for no other reason than the implications of a decision will only occur later.

Does this put the kibosh on these kinds of methods?

Not at all.   It just means that we must be vigilant and diligent in their application.   If I'd done the evaluation in Berlin, ToDo would have won, because in that situation, ToodleDo would have scored poorly on repeating events.  This is as it should be.  That means that in the two different situations, the method worked.  The problem is that in any one given situation, there’s no way to take into account any other situations.

Happily, there is fruitful and vigorous research concerned exactly with this. Some people call it situated cognition; others call it situated reasoning. We’ve not yet figured out how to treat situations reliably, but I think it’s only a matter of time before we do.

In the meantime, there is at least one other possible way to treat other situations. A popular technique to help set up a design problem is the use case (or what I call a usage scenario). These are either textual or visual descriptions of the interactions involved in using the thing you’ll design. They can be quite complex and detailed. Usage scenarios try to capture a specific situation other than the one that includes the designers during the design process. So it’s at least possible that usage scenarios could help designers evaluate designs and products better.

One final caveat: this evaluation is particular to me. It is unlikely that anyone will agree completely with my evaluation, because their situations are different from mine. So I’m not saying ToDo “is better” than ToodleDo. I’m just saying it seems to be better for me.

As they say: your mileage may vary.

Wednesday, May 19, 2010

Choosing a dayplanner is like playing rock, paper, scissors.

If you're inclined to manage your appointments and tasks with pen and paper, then you've got a huge selection of "hardware." Your choices break down into three main types: hardbound, discbound, and ringbound. But choosing one type can be much harder than you might think. It's like playing rock, paper, scissors: no matter what you choose, one of the others is better. Formally, this is an example of what's called Arrow's Paradox, but that doesn't make it any less real. Here are some thoughts on the matter.
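That rock-paper-scissors structure is a preference cycle, and it's easy to detect mechanically. A small Python sketch (entirely my own illustration, not part of any decision-making tool):

```python
from itertools import combinations

def find_cycles(prefers):
    """Find three-way preference cycles (a beats b, b beats c, c beats a).

    prefers[(a, b)] is True when a is preferred to b."""
    items = sorted({x for pair in prefers for x in pair})
    cycles = []
    for a, b, c in combinations(items, 3):
        if prefers.get((a, b)) and prefers.get((b, c)) and prefers.get((c, a)):
            cycles.append((a, b, c))
        if prefers.get((b, a)) and prefers.get((a, c)) and prefers.get((c, b)):
            cycles.append((a, c, b))
    return cycles

# rock beats scissors, scissors beats paper, paper beats rock
rps = {("rock", "scissors"): True,
       ("scissors", "paper"): True,
       ("paper", "rock"): True}
```

Whenever this returns a non-empty list, your preferences are intransitive and no single "best" choice exists, which is exactly the bind described below.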


Rock
The first, classic choice is a binder, like those offered by Filofax, DayRunner, or Succes. You can get these at many office supply stores and there are many vendors of preprinted pages that give you all kinds of variations on the notion of the agenda. And if you prefer something more particular, you can go to specialty shops (like the fantastic Laywine's in Toronto) and find some really cool stuff that is either functional, or atrociously expensive, or both. Over the years I've bought more than a dozen binder-type dayplanners of various sizes, but I have yet to find the perfect one.

There are two main benefits of a binder. First, you can rearrange pages in whatever way suits your personal style. You can buy filler pages from diverse vendors - again depending on which style best turns your crank; you can even print your own pages if you're so inclined. Second, you can refill it whenever you need, which also means you can archive old pages of information that you may need someday but don't want to lug around with you every day.

However, binders can be hard to handle, especially if you don't have much of a work surface on which to write. Writing in a binder when, say, riding on a subway or airplane, can be a real challenge because you can't fold the binder back on itself (as you can with, say, a simple spiral bound notebook). Also, binders aren't as rugged as fully bound books, because the metal parts can't handle being bent. All that metal can also make binders heavier than the alternatives.

Paper
The second popular choice is a fully-bound / hardbound notebook. Perhaps the most famous brand here is Moleskine, though there are other interesting alternatives like Rhodia and Leuchtturm1917. Most of these kinds of notebooks come with pockets in the back cover for collecting receipts and other bits of paper - a very useful feature. These books are usually very rugged and carry with them a certain emotional weight. As a colleague once said of his Moleskine, you feel like you should only write important things in it - no frivolities allowed. I don't know why that is, but I have several such notebooks myself, of various brands and sizes, and I know exactly what he means.

Still, fully bound notebooks have problems. They can be even harder to use than binders because you can't flatten them out as much as binders; you certainly can't fold them over. And you can't rearrange pages or refill them. Some brands, like Leuchtturm1917, try to get around this by printing page numbers on every page and providing a section at the front that you can use as a table of contents of sorts, but it pales in comparison to binders on this point.

A variation of the fully bound notebook is a spiral-bound notebook. The principal advantages of spiral notebooks are their cost (they're quite cheap) and their ability to be folded back on themselves, which increases convenience and usability tremendously.

However, they tend to have crappy paper. The spiral notebooks that have good paper (classic example: Clairefontaine) are often nearly as expensive as fully bound books. And of course, like fully bound books, you can't refill them or reorganize the pages.

Scissors
The third option is a discbound planner, and I have quite a few of these. They are comparatively rare. The binding mechanism is just a series of loose discs with a wide rim and relatively thin hubs. Paper is punched to have mushroom-shaped cutouts that wrap around the discs in cross-section. Perhaps the best known product of this type in North America is the Circa notebook, but there are other brands available, including Atoma and Myndology; even Clairefontaine has a line of these products. The disc idea has been around for a very long time; it would appear that it was invented by Atoma in 1948.

Because of their unique design, discbound notebooks combine several of the advantages of other kinds of notebooks. They do fold over on themselves like spiral notebooks, making them quite convenient and usable, even in tight situations. You can also rearrange pages, archive them, and buy or DIY your own pages, as with binders. And since the discs are usually made of plastic, discbound notebooks are typically lighter than binders. Since the discs are always equally spaced, regardless of the size of the paper, you can always store smaller pages in bigger notebooks. This is a nice feature because you can have a small notebook in your pocket and still archive its pages safely in a larger notebook at home.

But discbound notebooks aren't as robust as fully bound books, nor do they have the emotional girth of fully bound books like Moleskines. If you need a rugged book, or if you prefer the gut-level significance of a fully bound notebook, the disc system will leave you dissatisfied.

What Now?
So. The perfect notebook, it seems, just doesn't exist. No matter where you start, you can easily work your way into a vicious circle that will never end. Even though I have an assortment of every kind of notebook mentioned here, I never manage to stick to any one of them for very long.

So I tried applying AHP to this decision. I've already written about AHP, which is a way of breaking down a decision into elements that are easier to think about because they are smaller cognitive tasks.

The top three notebooks according to the AHP analysis were, in order, a pocket slimline Filofax, a "compact" Succes binder (larger than the Filofax), and a pocket-sized flip Moleskine. These three notebooks beat the reference item, which is a 3"x5" Circa flip notebook.

While this result wasn't especially surprising to me, it still troubled me. One of the key tasks in executing an AHP analysis is rating the importance of the criteria used to make the decision at hand. This is done using a simple and robust method called pairwise comparison. While all the other steps were quite straightforward, this one remains problematic for me, because the passage of time causes me to alter the relative importance of the criteria in the pairwise comparison. That is, in one instance I may rank page size (the bigger, the better) as slightly more important than, say, robustness; but in another instance, a few days later, I may rank page size as slightly less important than robustness. This, of course, throws the analysis off completely.

On the one hand, this doesn't really help me decide what notebook to use.

On the other hand, it does help identify the root cause of my indecision: I can't come up with a consistent ranking of criteria; that is, I don't (yet) know what's really important to me.  Everything else about this decision is pretty straightforward, except for this one thing.

Notice that this is a meta-level problem; it's not about choosing a notebook; it's about how I make choices. Sometimes, when you get stuck on a problem, it makes sense to step back from the problem itself and spend some time thinking about how you're trying to solve the problem.

That's where I'm at now: I keep trying different notebooks, hoping to find the right one; and every so often, I step back and think again about my underlying problem - what are the things that really matter to me, and which things are more important than others.

For those of you keeping score, I've decided to take the AHP analysis at face value and am trying the pocket Filofax. Wish me luck.

Sunday, May 16, 2010

What's on Your List?

One of my "breakthrough moments" in trying to get organized came only recently - about a year ago - when I asked myself: What kinds of tasks do I put on my to-do list?  It turned out that my answer to that question pointed me at the most important criteria I wanted in a task management system.

So.  What's on your list?  This is one of those reflective questions, when you have to step back from things and look at them from the meta-level.  Do it when you have time, and when you have a reasonable sample of your completed tasks to use as raw data.

Categorize your tasks.  There's an infinite number of possible categories, but only a few will make sense to you personally.  Those are the ones you want to find.  And I can't tell you what they are.

I will tell you what categories I came up with for my own tasks.  Hopefully that will help you find your own.
  1. Tasks I can do in one sitting.
  2. Tasks that take multiple sittings to complete.
  3. Tasks that have a start date.
  4. Tasks that have a due date.
  5. Tasks that are important (and tasks that are not).
  6. Tasks that must occur at a specified date and time (i.e. appointments).
That's it - these six types of tasks cover everything I do.  Let me explain them a bit.

By a sitting, I mean a period of time long enough to get one or more things done, and short enough to not drive myself nuts.  I'm not writing here about GTD's two-minute rule.  It's more a function of how your day is laid out and how much you can do at once without taking a break.  Sometimes, the duration of a sitting depends on what I'm doing.  I find grading exams to be tooth-achingly tedious, yet it's something I really have to be in top form to do.  So I'll work in no more than 15-minute increments, after which I take a break.  Other times, like when I'm programming, I can go for hours without even noticing the time.  (Okay, I'm weird that way.)  The length of a sitting depends, for me at least, on the nature of the task as well as the context (how the rest of your day is arranged).

Personally, I find a "sitting" to be between 15 and 30 minutes, depending on the nature of the task.  This way, I can typically knock off several small tasks or take a bite out of a more substantive task, and then take a break to clear my head.  Taking a break is really important.  (Yes, I know this sounds a bit like the Pomodoro technique.)

You may find your ideal sitting is only five minutes.  Or maybe an hour.  You should experiment.  You'll know when you hit the right sitting length, because you'll find a natural end to something at just about that duration, whether it be the completion of the task or a natural breaking point in the task.  Put another way, you'll find that you're most effective when you've found your natural "sitting" duration.

Ask yourself:
  • How often do you take breaks during your work?
  • How long are those breaks?
  • What triggers your need for a break?
  • How many things can you do comfortably (i.e. without stressing yourself out) in an hour?
These questions will help you determine what your personal ideal sitting length is.  Consider the answers, and think of how you can make these answers part of a routine that is your default way of getting stuff done.  Don't think that the routine you develop is sacrosanct; you have to remain adaptable to circumstance.  But all else being equal, your routine is what will help you be both as effective and efficient as possible.

The idea of multiple sittings is more from AutoFocus than from GTD, but it's something that works for me.  Many times - for instance, when I'm writing posts for this blog - I get the urge to write, but the urge leaves me before I finish a post.  I can either force myself to keep going - and kill the experience for myself (not to mention ending up with a crappier post), or I can just call it a day and come back to it later when I feel like writing again.  Life's too short to waste it grinding through things just because they're on a list.  So I have no problem at all with multiple sittings for a single task.

Start dates are, I think, under-appreciated as task attributes.  The idea of a start date is that you don't even have to worry about the task until the start date.  It's a great way to keep your list uncluttered by tasks that you really, really don't care about right now.  For instance, before I take a business trip, there's always a number of tasks to settle things at work and prepare for the trip.  Once I've arranged the trip, I work up those items and set their start date (yes, I'm assuming there's a software task manager at work here) to be one week before I leave.  That invariably gives me enough time to get everything done without rushing.  And it keeps those tasks out of the way until the time is right.
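To make the idea concrete, here's a minimal sketch of start-date filtering in Python (the task structure is hypothetical, not any particular app's):

```python
from datetime import date, timedelta

def visible_today(tasks, today):
    # A task with a future start date stays hidden; a task with no
    # start date is visible immediately.
    return [t for t in tasks if t.get("start") is None or t["start"] <= today]

# e.g. trip-prep tasks surface one week before departure
departure = date(2010, 6, 7)
prep = [{"title": "stop the mail", "start": departure - timedelta(days=7)},
        {"title": "buy milk"}]
```

Before May 31, only "buy milk" shows up; once the start date arrives, the trip-prep task joins the list on its own.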

I think we can safely skip discussing due dates.  Right?

Task importance (i.e. priority) is a prickly subject.   Some people love priorities; other people hate them.  I've already written about priorities.  In what I've written, I've suggested at least three levels of task importance: tasks that need to be done before any other task, tasks that need to be done soon, and tasks that can be done whenever.  I myself use only two levels: things that need to get done soon, and things that can be done any time.

Finally, we have appointments.  I really, really do not see why so many apps support tasks but not appointments, because they really are the same thing.  Fundamentally, an appointment is a task that must be completed between a precise start date/time and end date/time, such that the start and end times are usually (but not always) within a few hours of one another.

Tasks and appointments are basically the same thing.  Ideally, they should be treated in a highly consistent and integrated way.  Unfortunately, this seems to be rarely the case in practice.  Of the dozens of apps I've looked at, I've only found three that make a serious effort to combine tasks and appointments: Pocket Informant, SmartTime, and Google Calendar (with Tasks).
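Here's a sketch of what such a unified model might look like (this is my own hypothetical data model, not how any of those apps actually works):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Task:
    """One record covers both tasks and appointments."""
    title: str
    start: Optional[datetime] = None  # hide the task before this
    end: Optional[datetime] = None    # due date/time, if any
    important: bool = False

    @property
    def is_appointment(self) -> bool:
        # An appointment is just a task pinned to a precise window.
        return self.start is not None and self.end is not None
```

An ordinary to-do leaves both date/times unset; fill in a precise start and end, and the very same record is an appointment.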

So, we've got our six types of tasks.

A good task management system will handle only and exactly the kinds of tasks you have; nothing more (or it would be inefficient), nothing less (or it would be ineffective).

You now have some key criteria for seeking out (or possibly inventing) your best time management system.  You're looking for a system that manages just these kinds of tasks, as quickly, reliably, and elegantly as possible.

Exactly how you can go about doing that will be the subject of future posts.

Friday, May 14, 2010

Put Things Off: Another AutoFocus-y iPhone App

AutoFocus (AF) is a task management method developed by Mark Forster.  It is ludicrously simple, as opposed to far more complex (some might say byzantine) methods like GTD.  AF was conceived of as a pen-and-paper way to manage your work, but that doesn't stop people from coming up with apps for it.

A nice little app that implements something very close to AF is Put Things Off.  Its key feature is that if you can't do something anytime soon, you just tap the right icon and it gets pushed into the future.  There's an app setting that lets you set how far into the future things will be put off.  Its main problem, as I see it, is a somewhat cheesy look and feel.

I wrote a more detailed review of it some time ago (before this blog existed), and included some screenshots.  If you're interested in AF-style iPhone apps, then you might want to read that post.