Conversation Trees and Threaded Chats
Marc Smith & JJ Cadiz
Collaboration & Multimedia Group
Microsoft Research
One Microsoft Way
Redmond, WA 98052 USA
+1 425 936 6896
{masmith, jjcadiz}@microsoft.com
Byron Burkhalter
University of California, Los Angeles
2201 Hershey Hall
610 Charles E. Young Drive
Los Angeles, CA 90095-1551
+1 408 445 9779
burkhalt@ucla.edu
Chat programs and instant messaging services are increasingly popular among Internet users. However, basic issues with the interfaces and data structures of most forms of chat limit their utility for use in formal interactions (like group meetings) and decision-making tasks. In this paper, we discuss Threaded Text Chat, a program designed to address some of the deficiencies of current chat programs. Standard forms of chat introduce ambiguity into interaction in a number of ways, most profoundly by rupturing connections between turns and replies. Threaded Chat presents a solution to this problem by actively supporting the basic turn-taking structure of human conversation. While the solution introduces interface design challenges of its own, usability studies show that users’ patterns of interaction in Threaded Chat differ from those in standard chat programs but are equally effective, and possibly more efficient.
Chat programs, turn-taking, conversation, computer mediated communication, synchronous communication, persistent conversation, human computer human interaction
Chat is an old and increasingly popular form of computer-mediated communication. Commercial on-line service providers like America Online and non-commercial networks like Internet Relay Chat provide a myriad of chat rooms filled by millions of people each day. Instant messaging programs from AOL, ICQ, Yahoo, and MSN are becoming increasingly popular. Two billion instant messages are exchanged each day on the AOL network, 0.7 billion on MSN. This form of communication is likely to increase as cell phones and wireless handheld computers make mobile messaging even more prevalent: wherever cell phone short message system (SMS) service is available, its use is rising dramatically. Chat is here to stay.
Although these chat programs are popular for informal interaction, several companies are now bringing chat to the business world [3]. However, chat has not evolved much in the past twenty years and remains poorly suited for holding complex discussions. Innovations in chat have mostly ignored this problem. A number of chat systems released by commercial Internet software companies have integrated a variety of 2D and 3D graphical representations with standard chat [1, 2, 6, 12, 21]. However, few have improved the way chat organizes people’s exchanges of messages, and in many of these systems chat has become even harder to comprehend. The recent explosion of “Instant Messages” and “Buddy Lists” has not changed the underlying structure of chat either.
In this paper, we will discuss the core problems we see in chat and describe the ways this guided our design of Threaded Chat. In addition we report the results from a lab study that tested the usability of Threaded Chat in contrast to standard forms of chat with eighteen small groups engaged in a decision-making task. We discuss the challenges raised by the design of Threaded Chat and suggest future directions for improvement of systems to support persistent computer-mediated interaction.
Chat is the form of computer-mediated communication that most closely resembles spoken interaction. But in contrast to spoken interaction, chat is poor at managing interruptions, organizing turn-taking, conveying comprehension, and resolving floor control conflicts. Studies of chat from a variety of fields (including sociology, communication, CSCW, and HCI) share a focus on the challenges and ambiguities chat introduces into the normal mechanisms of social interaction.
Conversation Analysis (CA), the sociological study of the structures of ordinary face-to-face and spoken interaction, is of particular value when seeking ways to improve chat. CA’s study of naturally occurring conversation reveals that people use a suite of fine-tuned, ordinary techniques for maintaining spoken conversations that are coherent and understandable. Spoken conversations have turn and response structures governed by a set of simple rules that organize how turns of talk will be exchanged between groups of people. Sacks et al. [17] argue that turns are valuable commodities that require an orderly allocation system:
For socially organized activities, the presence of ‘turns’ suggest an economy, with turns for something being valued—and with means for allocating them, which affect their relative distribution, as in economies.
Using simple turn-taking rules, people are able to sustain spoken conversations across a wide variety of topics where there is almost always one party talking at a time. Interruptions and overlaps do occur but are brief, and transitions between speakers commonly occur without gap or overlap [17 ]. In contrast, in its most common form, chat organizes turns in order of their arrival at a central server, not in the order of turn and response in which they were constructed. This undermines the techniques people use for organizing coherent conversations. The result is an inclination for confusing exchanges of short messages in ambiguous order. This makes chat a poor decision-making tool and knowledge store and reduces its value for meetings and presentations of detailed ideas.
Computer-mediated conversation has the potential to transform the constraints of the economy of spoken interaction in more positive ways. Our inability to listen to two or more people speaking at the same time for very long limits the number of possible turns available in any spoken conversation. In contrast, chat may be less restricted than spoken communication since more than one person may construct a message at the same time, and reading can be quicker than listening. Nonetheless, turn-taking systems for spoken discussions allow for more coherent and productive conversations than standard chat programs. Thus, the properties of spoken conversation systems offer guidance for the design of text chat.
CA directs attention towards improving the way chat structures the turn-taking system used in the exchange of chat messages. Threaded Chat presents a possible solution to this problem by supporting a synchronous form of the turn-taking structure found in asynchronous threaded discussion boards like Usenet. Systems like Usenet and a vast number of discussion boards on web sites allow for the creation of extensive discussion trees composed of message (“post” or “article”) turns and responses linked together. These systems have predominantly been used as a form of asynchronous interaction in which delays of hours or days between turns and responses are common. While these systems suffer from problems of their own [11 ], discussions of complex ideas can be developed over time with responses clearly linked to the messages they are in reply to. In Threaded Chat, we have modified this structure to make it more accommodating to both synchronous and asynchronous use.
Research rooted in the sociological study of conversation has identified and addressed some of the major issues with standard chat programs [7 , 8 , 15 , 19 ]. These findings lead us to identify five main flaws in existing chat systems:
Chat programs present each participant’s messages in a way that makes it hard to differentiate speakers. The high turnover of participants in many chat rooms aggravates this problem further. A number of systems address this issue in one form or another. Many chat clients provide ways of associating a color or font with particular people. More recently, systems have focused on awareness of the presence of people in the room [4, 18], representations of the timing of the conversation [19], and improved visualization of conversations [18].
In chat, participants do not receive moment-by-moment information about the reaction of those who are listening to them. This means that turns cannot be altered as they unfold, increasing the likelihood that they will be misunderstood or taken in the wrong way. Without indications of listening, chat systems lose a great deal of their sense of social presence.
Some experimental systems have addressed this issue. Erickson et al.’s Babble [5 ] addressed this problem by presenting a “social proxy”, a graphic design that represented the activity of people with the application. This allowed people to have an intuitive sense of who was recently active but lacked the granularity to present reactions to turns-in-progress.
Chat systems only transmit turns when users press the ENTER key. While some systems do transmit messages keystroke-by-keystroke (e.g., the Unix program “Talk”), most forms of chat do not. As Garcia notes, the result is that the process of message production is separate from message transmission in chat. Chat is not truly synchronous: it has a sporadic rhythm in which fully formed turns pop out in a single moment instead of being produced in an unfolding manner. Chat lacks the “mutual availability of utterances-in-production” [7].
In contrast, the moment-by-moment surveillance of others in spoken conversations allows people to be highly sensitive to small variations in timing. For example, when declining an invitation or disagreeing with another’s assessment, people will often slightly delay the beginning of their turn. The delay projects a dispreferred response (a response that the user would not like), allowing the original assessor to downgrade or alter the assessment in order to maintain agreement. People are able to connect turns so quickly and assess the gaps between them because speakers project where their turns are heading and listeners recognize those projections as the talk unfolds.
Delays in chat resulting from typing difficulty or the other user leaving the room can easily be misinterpreted as a dispreferred response. Furthermore, delays encourage users to type additional turns (which may modify their initial turn or start a new topic of conversation) instead of waiting. Garcia [7 ] found that timing and sequencing distortions introduced by standard chat systems meant that a significant portion of chat turns were used to clear up confusion caused by prior turns.
Vronay’s Flow Chat [19] explicitly presented the stream of time and the resulting interleaving of turns of chat. Flow Chat placed each user’s text on a separate vertically stacked parallel track. While text entered by the user was not displayed until the turn was completed, a colored band was extruded from the right side of the display to indicate when the user began typing and how long they had been composing the message. Once entered, the text was displayed in the color bar, which then continued to slide towards the left of the display on its track. While this clarified the sequential ordering of turns, it did not provide any other way to indicate a link between two turns. In large groups, turns separated by many tracks were therefore difficult to associate with one another.
Viegas and Donath’s Chat Circles [10] approaches this from a different direction. Chat Circles presents each user as a colored circle that expands with the amount of text entered by the user. Circles then slowly shrink in size as the text fades. The timing of turns is thus visible, and turns-in-progress are presented as expansions in the size of the circle. This view of the conversation lacks a historical component, as turns evaporate over time. As a result, the application has an alternative historical view, which visualizes the conversation along a vertical time line cross-marked with lines indicating the timing and size of each user’s turn. This is in many ways an alternate form of Vronay’s Flow Chat that shares its limitations.
Microsoft’s MSN Messenger is one commercial product that partially addresses the problem of seeing turns-in-progress. When others are typing, “[name] is typing a message” appears at the bottom of the window. Although this alleviates some of the problem by providing a binary indicator of typing, it does not entirely solve the problem because users cannot see exactly what others are typing until the ENTER key is pressed.
Much of the work in conversation coordination relates to shaping a turn’s meaning based on its location. However, the techniques used to accomplish this in spoken interaction are undermined in chat conversations. Standard forms of chat position turns based only on the time that the ENTER key is pressed, which often ruptures the links between turns and their replies. “Participants in QS-CMC cannot assume that their attempts to be a ‘first poster’ will result in the message they are typing being placed adjacent to its intended referent,” writes Garcia [7 ].
In standard forms of chat, ownership of the floor is only known when a turn is completed, at which point a race begins to finish one’s own thought, which is newly fitted to the recently emerged turn. This twisted set of conversational rules has two ramifications: first, one can only begin to fit a “next” turn after the last turn has been displayed in its entirety, and second, there is a preference for short turns because one must press the return key in order to secure the floor. Therefore, extended turns, which can allow more complex material to be discussed, are much less frequent.
For example, consider the following chat interaction:
1 Larry: boy do we need to work on our interview skills....
2 James: who's conducting the interviews, anyway?
3 Scott: Yes
4 James: okay...
5 Larry: All of us
Notice that James and Scott enter their turns simultaneously. Each turn is fitted to Larry’s initial turn. Although Scott’s turn “Yes” appears immediately after James’s turn “who’s conducting the interviews, anyway?” it obviously does not fit as the next turn. Similarly, Larry’s turn “All of us” follows but does not fit the prior turn of “okay...”. The only way users can make sense of such a turn is to scroll up and find a candidate “prior turn.” That people can do this is interesting in its own right, but the procedure is time consuming (and while one scrolls, the conversation continues). The result is that transcripts of chat conversations are often confusing and demand significant effort to read.
Babble [5] addresses this by designing for an expectation of slower interaction rates than typically found in chat. The slower rate allows users to have greater certainty that their turn will occupy the position it was crafted for. As a result, short expressions of concurrence (e.g., “I agree” and “yes”) are possible and meaningful. Sequencing problems do sometimes occur, however, and are likely to increase if Babble is used more synchronously.
Chat rooms are social spaces that never develop a social history [ 10 ]. In practice, most chat rooms are not publicly persistent: their content evaporates as soon as it scrolls out of each user’s history buffer. This lack of persistence means that most chat spaces do not accrete a social history. Groups do use other media (for example, web pages) to create durable artifacts of their interaction, but the chat room itself does not change as a result of the activity within it. Even if logs are maintained, as noted above, the resulting transcript is often nearly unintelligible.
This is usually less of a problem during the conversation than several days or months later, when one tries to review chat logs. For instance, when a chat conversation occurs, if two turns appear within a tenth of a second of each other, it is probably clear to an attentive participant that the second turn was not intended to be a reply to the first. However, timing cues are missing from most history logs. Thus, ruptured and jumbled turn sequences make conversation logs ambiguous and unreliable as records. (This problem can be addressed by including timestamps with chat logs, but reconstructing the events of a chat room using timestamps is tedious.)
This has two implications. Having no useful recordings of chat conversations is a significant obstacle in workgroups and business environments, particularly when chat is used in decision-making processes. It also means that chat programs demand full immersion to remain comprehensible to their users. When users look away or try to maintain only peripheral awareness, many find it difficult to catch up with conversations.
Threaded Chat addresses the problems of confusing history logs, lack of social history, and the rupture of turn sequences in standard chat rooms. Threaded Chat departs from traditional chat in a number of ways by bridging the gap between threaded asynchronous discussions and synchronous chats. The Threaded Chat user interface is displayed in Figure 1. All chat turns are structured as a tree, similar to the Microsoft Windows Explorer interface to the file system on a computer’s hard disk. The key element of this structure is that turns are organized into turn and response structures called threads that can grow to any size. Thus, proper use of Threaded Chat eliminates the possibility of ruptured sequences of turns: turns are linked directly to the turn they are intended to respond to. Even if a turn is misplaced, it can be dragged and dropped to the correct location. Turns can also be edited or deleted.
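The tree structure described above can be illustrated with a minimal Python sketch. It is not the Threaded Chat implementation: the names `Turn`, `reply`, `move_to`, and `render` are hypothetical, and the reply links chosen for the sample turns are only one plausible reading of the interview exchange quoted earlier. The sketch shows that each turn records the turn it responds to, and that a misplaced turn can be re-attached elsewhere in the tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare turns by identity, not by field values
class Turn:
    """One chat turn; `replies` holds the turns entered in response to it."""
    author: str
    text: str
    parent: Optional["Turn"] = field(default=None, repr=False)
    replies: List["Turn"] = field(default_factory=list)

    def reply(self, author: str, text: str) -> "Turn":
        """Attach a new turn directly beneath this one and return it."""
        child = Turn(author, text, parent=self)
        self.replies.append(child)
        return child

    def is_descendant_of(self, other: "Turn") -> bool:
        node = self.parent
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def move_to(self, new_parent: "Turn") -> None:
        """Re-attach this turn (and its whole branch) under `new_parent`."""
        if new_parent is self or new_parent.is_descendant_of(self):
            raise ValueError("cannot move a turn beneath itself")
        if self.parent is not None:
            self.parent.replies.remove(self)
        self.parent = new_parent
        new_parent.replies.append(self)


def render(turn: Turn, depth: int = 0) -> List[str]:
    """Return the branch as indented lines, one per turn."""
    lines = ["  " * depth + f"{turn.author}: {turn.text}"]
    for child in turn.replies:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    room = Turn("room", "Hiring discussion")  # hypothetical room node
    t1 = room.reply("Larry", "boy do we need to work on our interview skills....")
    t2 = t1.reply("James", "who's conducting the interviews, anyway?")
    t1.reply("Scott", "Yes")
    t5 = room.reply("Larry", "All of us")  # entered in the wrong place
    t5.move_to(t2)                         # dragged beneath the question
    print("\n".join(render(room)))
```

Because every reply is stored under its parent, the order in which turns reach the server no longer determines where they appear.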
To chat, users click on the turn they want to respond to and begin typing. Pressing return completes the turn. When a user begins to enter text, their name and a placeholder message (“Entering Text”) appear to all other users. When the return key is pressed, the entire message becomes visible to everyone.
As turns are entered, they are displayed to other users in a bold font. Over time, the font fades to gray so that the most recently added turns stand out clearly. This feature is especially important since Threaded Chat does not structure turns in order of arrival (a point we return to below). Turns are unbolded and marked as read when clicked on, replied to, or cursored over with the arrow keys. As a turn is replied to, counts of the replies and unread turns beneath it are displayed.
Selecting the room node at the top of the chat room and entering text creates a new top-level thread, which is highlighted with a colored background. Top-level turns are typically the major topics of conversations, thus they are distinguished from other turns.
The tree structure of Threaded Chat provides users with the ability to collapse any branch of the conversation if they no longer wish to pay attention to it. For example, users may collapse discussion branches that no longer concern them or that have come to a conclusion and are no longer pertinent. If additional turns are added to a collapsed thread the count of unread child turns is incremented.
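The reply and unread counts shown beside a turn can be sketched as a recursive walk over its branch. This is only an illustration under stated assumptions: the `Turn` fields, the function names, and the label format are hypothetical, and the sketch assumes that the counts cover every turn beneath a node rather than only its direct replies, which the description above does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Turn:
    text: str
    read: bool = False       # set once the turn is clicked, replied to, or cursored over
    collapsed: bool = False  # True when the user has folded this branch
    replies: List["Turn"] = field(default_factory=list)


def reply_count(turn: Turn) -> int:
    """Total number of turns anywhere beneath `turn` (assumed meaning of the count)."""
    return sum(1 + reply_count(r) for r in turn.replies)


def unread_count(turn: Turn) -> int:
    """Number of unread turns anywhere beneath `turn`."""
    return sum(int(not r.read) + unread_count(r) for r in turn.replies)


def label(turn: Turn) -> str:
    """Text shown for a turn; the format is a hypothetical choice."""
    marker = "[+]" if turn.collapsed else "[-]"
    return f"{marker} {turn.text} (replies: {reply_count(turn)}, unread: {unread_count(turn)})"


if __name__ == "__main__":
    thread = Turn("Discuss candidate #1", read=True, collapsed=True)
    first = Turn("Strong references", read=True)
    thread.replies.append(first)
    first.replies.append(Turn("Agreed"))  # arrives while the thread is collapsed
    print(label(thread))
```

A collapsed branch hides its turns, but a new arrival still raises the unread count on the visible parent.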
The bottom of the Threaded Chat window contains information about the participants of the conversation. Information includes time of entry, number of entries (labeled “sessions”), and time of exit. Basic statistics about the number and types of turns are also displayed. These statistics persist from session to session, and users remain in the list even when they are not present (although they are marked as not currently active). This information is useful for providing a sense of history and context for the chat room.
Threaded Chat automatically labels turns that are likely to be questions or answers. If a question mark is found in the text of a turn, the turn is tagged with a “Q”. All replies to questions are tagged with an “A”. Numbers of question and answer turns are tracked in the social accounting pane.
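The labeling rule is simple enough to state as code. The sketch below is a hypothetical illustration, not the program's source: the function name `tags` is invented, and it assumes that a reply to a question which itself contains a question mark receives both labels, a case the description above leaves open.

```python
from typing import Optional, Set


def tags(text: str, parent_text: Optional[str] = None) -> Set[str]:
    """Return the labels for a turn given its text and its parent's text.

    "Q" if the turn contains a question mark; "A" if the turn it replies to
    is a question. Assumption: a turn may carry both labels at once.
    """
    result: Set[str] = set()
    if "?" in text:
        result.add("Q")
    if parent_text is not None and "?" in parent_text:
        result.add("A")
    return result


if __name__ == "__main__":
    question = "who's conducting the interviews, anyway?"
    print(tags(question))               # the question itself
    print(tags("All of us", question))  # a reply to the question
    print(tags("okay..."))              # neither question nor answer
```

The rule is purely textual, so any reply to a question is counted as an answer whether or not it actually answers it, which is why the labels mark turns that are only "likely" to be questions or answers.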
Turns can be edited, deleted, or dragged and dropped to different places in the tree. Although this can be a helpful feature, it also raises the possibility of abuse. Thus, each Threaded Chat turn has permission properties based on an extension of the Unix user/group/world model. These permissions are accessed by right-clicking on a turn and allow a turn’s author to determine who can see the turn, reply to it, delete it, and extend the turn’s permissions (Figure 2). Only the author of a turn can edit the turn’s text.
Turn authors are also the only people who can modify a turn’s permissions, although owners of turns higher up in the tree may override the rights. For example, if person B replies to person A and specifies a set of rights on the reply, person A could override person B’s permissions by specifying rights on the original turn. Users retain the power to override permissions of the turns that are replies to their turns, including the power to delete or move the entire thread to another location. The first person to start a big conversational branch wields significant power over it. Using Threaded Chat’s permissions, it is possible to have a private chat in the middle of a public room, or to have a public discussion with a select group without possibility of interruption from others. It also means that users can enter a turn and determine who may see and reply to the text.
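One way to picture the override rule is to resolve each permission by walking from the top of the thread down to the turn in question, so that rights specified higher in the tree take precedence. The sketch below rests on several assumptions not stated above: rights are modeled as a per-action set of user names rather than the full user/group/world scheme, an unspecified action is open to everyone, and the author of the turn that sets a rule is always allowed. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(eq=False)
class Turn:
    author: str
    text: str
    parent: Optional["Turn"] = None
    # action name -> users allowed; a missing action means "not specified here"
    rights: Dict[str, Set[str]] = field(default_factory=dict)


def chain_from_root(turn: Turn) -> List[Turn]:
    """Return the turns from the top of the thread down to `turn`."""
    chain: List[Turn] = []
    node: Optional[Turn] = turn
    while node is not None:
        chain.append(node)
        node = node.parent
    chain.reverse()
    return chain


def is_allowed(user: str, action: str, turn: Turn) -> bool:
    """Check `action` ("see", "reply", "delete") for `user` on `turn`."""
    for node in chain_from_root(turn):
        allowed = node.rights.get(action)
        if allowed is not None:
            # The highest turn that specifies this action decides the outcome.
            return user == node.author or user in allowed
    return True  # assumption: unspecified actions are open to everyone


if __name__ == "__main__":
    # Person A starts a private branch that only B may see.
    original = Turn("A", "Can we talk privately?", rights={"see": {"B"}})
    # Person B tries to open the reply to C, but A's rights on the parent win.
    answer = Turn("B", "Sure", parent=original, rights={"see": {"C"}})
    for user in ("A", "B", "C"):
        print(user, is_allowed(user, "see", answer))
```

Under these assumptions, a thread starter who restricts visibility creates a private conversation inside a public room, and replies within it cannot widen the audience.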
Given that Threaded Chat is designed to address some of the key problems with standard chat, we conducted a user test to see if the design was successful.
Specifically, because proper use of Threaded Chat guarantees that turns will always be placed in their intended context, we expect that:
70 participants were recruited for a lab study to test Threaded Chat. Participants were grouped into eighteen teams of four; however, due to no-shows, eleven groups had only three people. Participants received a free Microsoft software product for their time.
All participants had used a chat program at least once in the past year, were comfortable with typing, had graduated from high school, and were at least intermediate computer users. Participants were recruited such that the pool was diverse in terms of age, gender, and occupation. The pool had 38 men and 32 women. The average age was 39 with a standard deviation of 10.
Participants used three different chat programs for this study: Threaded Chat, a “standard” chat program, and LeadLine, an experimental chat program created by the Microsoft Research Virtual Worlds Group [2]. The order in which the three chat programs were used was counterbalanced to minimize order effects.
Participants were told they were employees for the same company and had recently interviewed three candidates for one job opening. Their task was to chat with each other for 20 minutes and then, as a group, rank the candidates in order of hiring preference. This task was repeated three times, each time using a different chat program, a different set of candidates, and a different job position. In each case, participants were given unique information about the candidates, thus no single participant could correctly rank the candidates without chatting with other group members.

When using the Threaded Chat program, each group started with a room populated with six initial threads:
Introductions
Review the qualifications for this position
Discuss candidate #1
Discuss candidate #2
Discuss candidate #3
Final decision: Who should we hire?
Although these threads were made available as guides for the discussion (similar to an agenda for a business meeting), users could (and did) ignore them if they wished.
After each session, participants answered a variety of questions about their reactions to the chat program they used. On all the measures (enjoyment, anxiousness, confusion, decision satisfaction, difficulty using the program, etc.) Threaded Chat was rated significantly worse than the regular chat program. To a certain extent, this was not surprising given the early stage of the prototype. Although the core concept of chatting with threads was functional, some basic user interface issues had not yet been resolved (for example, lines that were longer than the screen width did not automatically wrap to the next line).
Furthermore, survey results revealed that participants were accustomed to using chat for informal discussions with friends and family. Thus they may have been evaluating Threaded Chat for these types of discussions while Threaded Chat was intended for task-based, business discussions. A more realistic future test of the value of Threaded Chat may be a field trial in which conversations develop over longer periods of time, shifting the focus of the system towards more asynchronous use.
Despite the lower subjective ratings Threaded Chat received in contrast to standard chat, the study showed that users quickly adapted to the new interface. Performance on the hiring task did not differ significantly between chat programs. Each hiring task was designed such that there was a correct solution, and each set of candidate rankings was assigned a score relative to its distance from the correct solution. The highest possible score for each task was 5 points. Threaded Chat groups had an average score of 3.7 while plain chat groups had an average score of 3.9. This difference was not found to be significant, even when taking into account various demographic variables such as typing speed, level of education, and experience with chat programs.
Even though scores on the task were equivalent for each chat program used, Threaded Chat did affect the processes used by groups to reach their decisions.
Groups that used Threaded Chat took fewer turns than groups that used the regular chat program. Threaded Chat rooms had an average of 21.7 turns, while the regular chat rooms had an average of 34.7 turns, which was a significant difference (t(25.5) = 3.7; p = 0.001). In a regression equation controlling for various demographic variables, the use of the Threaded Chat program was the strongest predictor of the number of turns taken, accounting for 28% of the variance (t = -5.2, p < .001).
Of course, it could be hypothesized that fewer turns were taken in Threaded Chat because people took longer turns. However, this was not the case. The average standard chat turn was 7.3 words long while the average Threaded Chat turn was 7.6 words long, which is not a significant difference (t(3020) = -1.3; p = 0.205).
It is possible that Threaded Chat reduced the ambiguity introduced by standard chat, thus allowing people to enter fewer, more coherent turns whose meaning was partially derived from their parent turn. While the frictions imposed by the Threaded Chat interface may have simply been a drag on the speed of participation, the equivalent scores on the task show that Threaded Chat users were equally able to complete their task using fewer turns.
We also examined the question of whether there was a more equal level of participation among group members in the different types of rooms. We used the standard deviation of number of turns taken by the people in each room as a measure of equal participation. If everyone in a chat room took the same number of turns, then a group would have a standard deviation of zero.
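The participation measure can be illustrated with a short Python sketch. The turn counts below are invented for illustration and are not data from the study, the function name is hypothetical, and the sketch assumes the population form of the standard deviation, since the text does not say whether the population or sample form was used.

```python
from statistics import pstdev
from typing import Sequence


def participation_spread(turns_per_person: Sequence[int]) -> float:
    """Standard deviation of turn counts in one room; 0 means equal participation.

    Assumption: population standard deviation (statistics.pstdev).
    """
    return pstdev(turns_per_person)


if __name__ == "__main__":
    balanced_room = [8, 8, 8, 8]    # hypothetical: everyone takes the same number of turns
    lopsided_room = [2, 4, 4, 10]   # hypothetical: one person dominates
    print(participation_spread(balanced_room))   # 0.0
    print(participation_spread(lopsided_room))   # 3.0
```

A lower value indicates that turns were spread more evenly across the members of a group.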