
"All interactions magazine articles are posted with the permission of ACM."

© 2000, Association for Computing Machinery. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Reprinted with permission from interactions magazine, 6[3], 25-30, May/June, 1999.


PENNY WISE, POUND WISE:
MAKING SMART TRADEOFFS IN PLANNING USABILITY STUDIES.

Susan M. Dray, Ph.D., CHFP
David A. Siegel, Ph.D.
Dray & Associates, Inc.
Minneapolis, MN USA


When doing usability or other user studies, reality always imposes constraints on time, personnel, or money. The issue is not whether to make tradeoffs, but, rather, how to make them so that your usability efforts are truly cost-effective. Beware of expediency as a basis for decision making. We have seen situations where cutting the wrong corners for the sake of relatively small apparent savings undermined the value of the rest of a sizable investment in a project. You don't want the value of your study compromised by skimping on one area, especially if other tradeoffs might have yielded bigger savings.

So, where can you depart from the "ideal" and still have a project that is feasible within your constraints and yields the valuable information you are seeking? As always in Human Factors, "it depends," but here is a rough guide based on what we have seen in working with many clients doing user studies. This does not pretend to be the result of a scientific survey, or of a comprehensive literature review, but, rather, is based on our experience in the field, including some lessons learned in the school of hard knocks.

Dangerous tradeoffs

1. Recruiting
Recruiting is definitely NOT a place to cut corners. Efficient use of your personnel time and a good return on your "fixed costs," like facility rental, require a successful recruiting effort. The quality of the resulting data depends on the degree to which the people you are studying reflect the target group of users. Spend the money it takes to get the right people. If you are pleasantly surprised by a lower-than-expected bid for recruiting…be cautious. Also, do not skimp on resources for planning and overseeing the recruiting process.

Selection of Evaluators
In our experience, good recruiting requires a well-developed, well-thought-out screener. This does not necessarily mean a highly elaborate screener. We have seen problems with screeners that are too broad, too narrow, or too complex.

Too broad
Sometimes, in an effort to ensure an "easy" recruit, the screener may be made too broad. This is more likely to happen if you are pushed into it because you have not allowed enough time or other resources to support the recruit. The risk, of course, is that you won't screen out people who are unlikely to be users of your product.

For instance, if you have a very general screener for a usability evaluation of a software product, you may get people who have never used a particular platform or a particular category of product. If your strategy is to design an entry-level product that requires no prior knowledge of hardware or software, this may be OK, but if it is not, the results can be disastrous. Confounds abound when, for example, someone has to install software in a usability evaluation but has never done so and would never do so in real life. Are the problems you find attributable to their inexperience or to your product? It is impossible to tell unless that is the only problem they have - which is unlikely.

Another problem with a broad screener is that you may not be able to make any generalizations if you have a range of responses or see a variety of situations (as in a visit study). Is the variation due to real differences among users, or is it an artifact of the breadth of the sample?

Too narrow
On the other hand, if your sample of evaluators is too narrow, you limit the generalizability of your results. Again, if this matches your product strategy, this is not a problem. For example, if you are working on a highly specialized product for a particular audience, you need a highly targeted sample. The point is to be careful that your sample matches your product strategy. A too-narrow sample is a common problem when you have to rely only on internal people in your own department (often with no formal recruiting at all) to test a software user interface, for instance. The fact that software designers find your interface intuitive is probably of little help in predicting whether it will be acceptable to, and usable by, the public at large.

Too complex
A very complex or very long screener can also compromise a study. Complex scoring algorithms are rarely required, may make it difficult to determine what population you are really sampling, and may simply introduce practical problems. In one study we did, the client-supplied screener was so complex that recruiters had to take the information down, have the manager calculate the score, and then call the prospect back to arrange for the actual testing. In many cases, people who would otherwise have qualified were not included because the recruiter could not reach them a second time, or because they refused to come, in some cases citing the interview on the phone as the reason they didn't want to participate.

The important thing to remember is that recruiting strategies often need some adjustment based on their initial results, and this requires a commitment to actively managing the process. If a too narrow screener is yielding too few participants, it may need to be broadened, but this must be done judiciously. Do not skimp on oversight and monitoring of the recruiting process. It is definitely not something to hand over to a market research company and turn your back on.

Budgeting Time for Recruiting
This is another dangerous place to cut corners. Good recruiting takes time. Take the advice of the recruiter as to how long. Giving them less time will result either in their not being able to get enough people or, worse, in their getting the wrong people. Having too few people is less of a problem if you are doing a local or internal test, because your other investment will be lower. But if your test involves expenses for national or international travel, if you have committed to a reservation at a facility, or if you have other fixed costs, you certainly want the study time to be utilized to the fullest. Allow sufficient time to adjust your recruiting strategy if necessary.

Paradoxically, recruiting too far ahead also can be a problem - drop out rate increases as other things arise. If you are going to recruit far ahead, be sure to maintain contact - or reconfirm about a week before testing (and again the day or night before if possible).
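
The reconfirmation schedule suggested above can be worked out mechanically once a session date is set. The following is a minimal Python sketch, not a prescribed procedure: the function name and the example session date are hypothetical, and "about a week" is taken here to mean exactly seven days.

```python
from datetime import date, timedelta
from typing import Tuple


def reconfirmation_dates(session_day: date) -> Tuple[date, date]:
    """Return (week-before, day-before) contact dates for one session.

    Assumption: "about a week before" is treated as exactly 7 days.
    """
    week_before = session_day - timedelta(days=7)
    day_before = session_day - timedelta(days=1)
    return week_before, day_before


# Hypothetical session date, used only to show the call.
first_contact, last_contact = reconfirmation_dates(date(1999, 6, 15))
print("Reconfirm on", first_contact, "and again on", last_contact)
```

In practice the dates would be shifted to suit weekends, holidays, and the recruiter's own calling schedule.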

Incentives
Yes, we all want to get the right people for as little money per person as possible, but if your incentive is too low, you may not get the right people. For some recruits, you must expect to pay significant incentives. For instance, recruiting network administrators, highly placed technical people and their managers, or "experts" in almost any field typically requires high incentives. If you try to cut corners here, you may be able to locate the right people, but not get them to agree to participate. Worse yet, they may agree to come, but fail to appear, leaving you with most of the expense and no data.

2. Facilitator Skills
This is another critical area in which not to cut corners. A good facilitator is key to a good study. If your facilitator 'leads' the evaluator inappropriately (or without being aware of doing so), is not versed in the art of open-ended questions, fails to follow up on behaviors of interest, does not establish good rapport with the evaluator, or does not address the questions from the project team, the evaluation can be compromised. The facilitator must understand the design issues well enough to know where to probe, and be experienced enough to judiciously depart from or elaborate upon the prepared protocol.

Remember also that the facilitator's own impressions are a key source of data. A good deal of information comes from the facilitator's first-hand impression of subtle cues in the interaction. Not only does this help identify problems, but the close observation of the user's responses often contributes clues about how to solve identified usability problems. All of this takes experience and specialized skill.

Team politics may also make facilitation an area where it is important to bring in an outside person. Because of the role that the facilitator can or should play in interpreting and transmitting findings, credibility with the development team may require an outside person. If you do use an internal person, make sure that the individual's objectivity is trusted and that they have enough standing with the team for their input to be incorporated.

Obtaining the right mix of facilitator skills can be a real challenge in international testing. It is better to use a trained usability person who speaks the language, even if not perfectly, than a fluent local person who is not skilled in interviewing or open-ended questioning, or who is not willing to work with a trained usability person to develop an appropriate style of probing. In some situations, the needed mix of usability and language skills means you will need a local speaker paired with an experienced usability specialist.


3. Planning the focus of the study

Although it may seem like an added up-front cost, beginning a usability evaluation project with a thorough review of the system or product and of user and usage information can greatly facilitate protocol development and make for a more cost-effective study overall. Although this may resemble a traditional expert review, it is not aimed at a detailed critique of the design or at specific recommendations for changes. Rather, it is focused on prioritizing the design issues and user tasks where formal evaluation will likely be most useful. With even a moderately complex product or system, it is rarely feasible to evaluate every function or navigational pathway.


4. Protocol

Failing to set the stage adequately or accurately, providing too much or too little information, or not having a good idea of the key areas to be probed are all classic mistakes. A good facilitator can often help identify where these potential problems will occur, but even a good facilitator cannot make up for a poorly designed test protocol. This is definitely not a place to cut corners. Spend the time as a team to identify key tasks, scenarios that will tap those tasks, the information required for successful completion of these tasks, and a logical order or flow. If some tasks logically depend on successful completion of others, be sure to prepare "dummy information" so you can still have evaluators attempt later tasks even if they are unable to complete earlier ones.


5. Team Involvement

Make sure that there is adequate team involvement in planning the study and in participating in the tests. This is important to ensure that the study really does meet the team needs, and that its results will be taken seriously in the design process. Obviously, it is "easier" and "cheaper" to have a usability specialist, whether internal or external, go off and do the study on his or her own. But the data is much less likely to be useful or used.


Lower risk tradeoffs

So where can you make tradeoffs?

1. Facilities

Sometimes you do need to rent a full usability lab, or a focus group facility. If you have a large audience to introduce to the concept of user testing, a highly political project where buy-in to the results is critical, or a sophisticated set-up which requires pan and tilt, zoom, scan conversion, and the like, you may need to pay for lab rental. However, often this is an area where the cost is high, and the payoff is not really justified. Sometimes, a conference room with the design team in the corner, or a video camera on a tripod in the corner with the team watching a monitor in the next room, is equally effective and much less costly. Often, on-site or naturalistic testing may be appropriate and avoid the need for facility rental.


2. Videotaping

Most videotapes are never looked at again. It is time consuming to review them, and creating an edited version can take twice as long as the initial testing, or more. Sometimes, of course, videotapes are critical. For instance, tapes are very useful early in the process of establishing a usability program in a company, or at other times, such as when a key team member or manager is unable to attend the evaluation, when a highly charged political issue is being investigated, or when, as in the case of international evaluations, most of the team is unable to attend.

Of course, if you know in advance that only an edited composite tape will do, you know that you must videotape. However, the added expense of an edited composite is not always called for. Sometimes the same purpose can be achieved simply by cueing up your tapes to the intended spot. This can significantly reduce the time, since no re-recording is required.


3. Scripts
Although we have emphasized the importance of protocol development, an exactly worded script is rarely needed or useful except for the most novice facilitator. Even for that person, the variability of human behavior makes a formal script unlikely to be useful. Scripts can be an actual hindrance. It is better to spend the time and effort on training the novice in how to "follow" an evaluator, how to let them make and recover from mistakes, how to question without leading, and what to do when evaluators become frustrated or unable to continue, than to create a formal "script" to be memorized and followed.


4. Number of evaluators

Some usability evaluations do, indeed, call for large numbers of evaluators. However, numerous studies have suggested that before you reach 10, you have really reached the point of diminishing returns.
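
The diminishing-returns point can be illustrated with a simple model that is often used in the usability literature, though it is not part of our own analysis here: if each evaluator independently runs into a given problem with probability p, the expected share of problems seen at least once after n evaluators is 1 - (1 - p)^n. The Python sketch below tabulates that curve. The function name and the value of p are illustrative assumptions, not measured data, and real problems differ widely in how easy they are to detect.

```python
def share_of_problems_found(n_evaluators: int, p_detect: float) -> float:
    """Expected fraction of problems seen at least once after n sessions.

    Assumes every evaluator independently hits each problem with the
    same probability p_detect, which is a simplification.
    """
    return 1.0 - (1.0 - p_detect) ** n_evaluators


P_DETECT = 0.3  # hypothetical per-evaluator detection probability

for n in range(1, 11):
    total = share_of_problems_found(n, P_DETECT)
    gain = total - share_of_problems_found(n - 1, P_DETECT)
    print(f"{n:2d} evaluators: {total:6.1%} found, {gain:5.1%} added by the last one")
```

Under these assumptions, each additional evaluator adds less than the one before, which is the pattern described above. A lower p flattens the curve and pushes the point of diminishing returns further out.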

Of course, if you have clearly differentiated groups, or different levels of functionality that will be accessed by distinctly different types of users, you may need to have multiple sets of evaluators. However, you should carefully think about whether you really need to distinguish a "critical group." We have seen the costs for studies sky-rocket as people began to define somewhat speculative groups and push for multi-factorial research designs. Remember that introducing new comparisons increases costs geometrically.
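
To see why added comparisons drive up cost so quickly, consider a fully crossed design: the number of cells is the product of the number of levels of each factor, and every cell needs its own set of evaluators. The Python sketch below does that arithmetic. The function names, the factor levels, and the figure of six evaluators per cell are all hypothetical, chosen only for illustration.

```python
from math import prod
from typing import Sequence


def design_cells(levels_per_factor: Sequence[int]) -> int:
    """Number of cells in a fully crossed design (product of the levels)."""
    return prod(levels_per_factor)


def evaluators_needed(levels_per_factor: Sequence[int], per_cell: int) -> int:
    """Total evaluators if every cell gets the same number of people."""
    return design_cells(levels_per_factor) * per_cell


PER_CELL = 6  # hypothetical number of evaluators per cell

# Hypothetical designs: one group; 2 user types; 2 user types x 3 locations.
for levels in ([1], [2], [2, 3]):
    print(levels, "->", evaluators_needed(levels, PER_CELL), "evaluators")
```

Each new factor multiplies the total, and recruiting, incentives, facility time, and facilitator time all scale along with it.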

A corollary to this is that a decision to test in multiple geographical locations should not be made reflexively. One practical reason for going to multiple locations may be that the product is specialized, and the population of users in any one area is too thin for adequate sampling. However, testing in multiple geographical locations is usually a priority only if you have reason to suspect real demographic, contextual, or usage pattern differences that are likely to affect usability. Of course, with products intended for a broad international market, this is more likely to be a factor, but it is less likely with products intended for a national market, or products that are building on an already well-established platform. Designs that are highly innovative may call for evaluation of fundamental design choices that can be assessed with any fairly representative local group of users--good news for start-ups that often produce such products and have limited resources for usability evaluation.

5. Statistics

Just as testing rarely requires complex research designs, most of the evaluations we do have not called for more than simple descriptive statistics. Often even these are not necessary because of the power of the qualitative observations. Even in cases where there might be interest, the numbers of evaluators are rarely sufficient for meaningful statistics, and the time required to compute them is not well spent. It is more common that we find major usability issues that would have been difficult for the team to anticipate, but that "leap out at you" when you see users struggling with them, and don't need to be teased out through statistical analysis.


6. Formal Reports

As with videotapes, formal reports often go unread. In some organizational settings or with some types of products where there are liability issues, a formal report may be necessary to document findings. However, usually, the important thing is simply that findings be conveyed to the team in a meaningful way to influence the design process. It is often less expensive and more effective to use informal, participative approaches to transmit findings and involve the team in examining them. These can be supported as needed by data summaries, vignettes, and other techniques for displaying qualitative information.


Conclusions

Some generalizations emerge from putting all of these recommendations together. Overall, you will derive more benefit from your usability dollars by doing studies that are simple in design, but adequately supported for planning, recruiting, and usability skills such as human factors knowledge and test facilitation. Furthermore, good planning has more to do with developing good communication, understanding of the issues, and mutual confidence between the development team and the usability expert than with developing rigid verbatim protocols and scripts (assuming adequate usability skills). This mind-set is certainly easier to achieve when usability evaluation is done iteratively, beginning early in the process, than when it is put off until the end, waiting for the finished product to be "ready" to test and the perfect study to be designed. Of course, taking this one step further brings us to an oft-cited theme--the most effective use of usability dollars is to build usability into the development process from the beginning, rather than treating it as a major add-on at the end.

 
