Contextual Inquiry

We began our work by conducting in-depth interviews with biologists and observations of their work as a form of contextual inquiry. The study was conducted in six labs with a range of research focuses. Two to three individuals were interviewed in each lab; all were graduate students, postdocs, or research associates. Participants were asked open-ended questions about the biological aims of their research, the computational tools they used, how they used them, what parameters they manipulated, their understanding of how their tools work, and how their educational and research history had shaped that understanding. In addition, they were asked to demonstrate common tasks that they perform.

While the software used, and how it was used, differed across labs, the relationships between biologists and their software had much in common. When asked how their software works, the biologists interviewed invariably described its end product. None of the participants knew more than the most basic details about how their software actually carries out their tasks. Often, participants did not know why they were carrying out certain tasks.

For example, none of the three participants who used Mega knew what bootstrapping was or how the available bootstrapping methods differed. Another researcher who used ARB did not know the difference between the models it uses, although she knew it was important that the right model be chosen for each sequence. She saw this automatic selection feature as a selling point of ARB. All of the participants seemed to feel that the fewer parameters they had to provide, the better.
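(For context, bootstrapping in phylogenetics resamples the columns of a sequence alignment with replacement and re-infers a tree from each pseudo-replicate; the frequency with which a clade recurs across replicates is its support value. The Python sketch below shows only the resampling step, the part of the procedure the participants could not describe; the tree inference it alludes to is hypothetical and stands in for what Mega actually performs.)

    import random

    def bootstrap_replicates(alignment, n_replicates=100):
        # `alignment` is a list of equal-length sequences (strings).
        # Each replicate resamples alignment columns with replacement,
        # producing a pseudo-alignment of the same dimensions.
        n_cols = len(alignment[0])
        for _ in range(n_replicates):
            cols = [random.randrange(n_cols) for _ in range(n_cols)]
            yield ["".join(seq[c] for c in cols) for seq in alignment]

    # A tree would then be inferred from each replicate (the tree-building
    # step is omitted here), and a clade's bootstrap support is the
    # fraction of replicates in which it reappears.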

We observed three common approaches to learning how to use this software. In many cases, the researcher would simply ask lab partners how they used the tools and then imitate that workflow. In other instances, researchers tried multiple parameter settings and then compared the results. If the results appeared identical, the biologist assumed that the parameters did not matter and that it was safe to choose either. Worst of all, in some cases biologists would try multiple methods for achieving a given output, compare the results, and then choose the result that looked most correct to them.

Each of these approaches is unsettling. Imitating colleagues' practices can perpetuate bad ones. Conducting limited explorations of the effects of different parameters is a dangerous strategy: a parameter that has no effect on one dataset might have a transformative effect on another. And inspecting multiple results and choosing the one that appears most correct is certainly not a scientifically valid approach.

Another finding was that although colleagues often shared knowledge about commonly used tools like BLAST, in two cases interviewees indicated that, for a particular less common tool or method, they felt they were the only member of the lab, or even the only person in the university, who knew how to use it. In these cases, not even the professor running the lab knew how to use the software in question.

All of these findings indicate that biologists almost always treat their software tools as "black boxes": they understand what goes in and what comes out, but not what happens in between. These practices clearly raise ethical questions about scientists' responsibility in reporting computational results, but they also represent an opportunity to explore ways of helping biologists become more familiar with how their software works. These observations suggest that there is room for improvement in the way bioinformatics software provides learning opportunities for biologists.

It was also observed that in some cases biologists were reluctant to take advantage of the documentation provided with software. One interviewee walked through the process of doing a BLAST search and, when asked whether the help was helpful, opened it and indicated that he did not understand any of the information it provided. When we discovered that a variety of interactive tools demonstrate the sequence alignment process (e.g., BiBiServ, 2006; Setoft, 1999; Sumazin, 2003), we began asking researchers to attempt to learn something from one of them: BiBiServ's Sequence Alignment Applet. Three of the four researchers who did this were unable to extract any meaningful information from the application beyond identifying the two sequences being aligned. The fourth took it as a challenge to decipher the applet, and was able to discern that the scores represented some kind of match and that the algorithm was searching for "diagonals", but was not able to decipher how it would do that.
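(The "diagonals" in question are a standard feature of alignment visualizations: when two sequences are compared position against position in a match matrix, substrings they share appear as diagonal runs of matches, which alignment algorithms then chain together. The following Python sketch illustrates the concept with a simple dot plot; it is an illustration of the idea, not a reconstruction of the applet's actual algorithm.)

    def match_matrix(seq_a, seq_b):
        # 1 where the residues match, 0 elsewhere; diagonal runs of 1s
        # mark substrings the two sequences have in common.
        return [[int(a == b) for b in seq_b] for a in seq_a]

    # Example: print a dot plot for two short sequences. Shared
    # substrings show up as diagonal runs of '#' characters.
    for row in match_matrix("GATTACA", "GCATGCA"):
        print("".join("#" if cell else "." for cell in row))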

The difficulty in both of these cases seems to be that help documentation is often written using language and presentations that biologists do not understand. As a result, the time investment required for biologists to learn even a small amount about the software they use can be prohibitive. Because biologists are committed first to solving biological problems and only secondarily to solving computational ones, they sometimes avoid learning details about their tools that they perceive as unnecessary.