Friday, 24 February 2012

Google+ evaluation results

On 20 September 2011, after a test period of about three months, Google released Google+, its social networking site, to the general public. It introduces some new concepts, such as Circles to organise contacts and Hangouts for video chat. To get an idea of what users think of the application, we conducted a small evaluation. This article presents a summary of our results. A more extensive document can be found here.

Approach and setting

Our evaluation is based on the QUIS 5.0 questionnaire (Questionnaire for User Interaction Satisfaction), with some additional questions about age, IT experience, social network experience, Google+ experience and Google experience in general. We chose this approach because it is an established method to gather user satisfaction information conveniently within a short time span, and because its questions are carefully formulated and scoped. Other questionnaires or heuristics are sometimes too high-level (e.g. NAU, ASQ) or too long (e.g. PUTQ). Moreover, QUIS groups its questions into categories, allowing us to judge the application at different levels of abstraction. Other questionnaires do this as well (e.g. USE, CSUQ) and mostly cover similar issues, so we chose QUIS as a standard.

As the questionnaire yields rather general information, we also interviewed some additional users. For this, we used lightweight 'think aloud' interviews: lightweight in the sense that each user was monitored by a single person, although audio and screen were also captured for further study. We chose this dual approach to obtain both general and specific information.

With this information, we want to draw some conclusions about user satisfaction for the various points of interest in the QUIS questionnaire. This can guide developers in their general design. On the other hand, we will highlight some specific flaws noticed by several users, so that more targeted corrections are possible as well.

Participants

Because questionnaires have become very common, people usually delay their response or don't answer at all unless asked personally. A time span of one week is rather short as well. These two effects led to a rather low response of 11 people, nearly all in the 20-25 age class. The chart below shows some other user characteristics.


IT experience is spread evenly. Time spent on social networks is rather high, which might be due to the age class the participants belong to. Whereas people have used Google products quite often, Google+ seems to have a smaller audience so far.

For the interviews, we asked four inexperienced users to perform some specific tasks. Their profiles are consistent with those of the questionnaire participants.


Evaluation method

We evaluated the questionnaire graphically with boxplots. On the one hand, we made a plot containing all of the questions; on the other hand, we made a per-category plot. For each category, and for each question within it, we then reasoned about how the users were distributed over the scores and which opinions were outliers. We also referred to the positive and negative comments we received.
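To illustrate the procedure, here is a minimal sketch of how such a per-category boxplot can be produced. The category names follow QUIS, but the data layout and all scores below are hypothetical placeholders, not our actual responses:

```python
# Minimal sketch of the per-category boxplot analysis (hypothetical data).
import matplotlib.pyplot as plt

# Hypothetical QUIS responses: per category, one list of scores per question,
# on a 9-point scale. Real QUIS has more questions per category than shown.
responses = {
    "Overall reaction":    [[6, 7, 5, 8, 6], [7, 6, 6, 5, 7]],
    "Screen":              [[5, 6, 7, 6, 5], [6, 5, 6, 7, 6]],
    "Terminology":         [[5, 5, 6, 4, 6], [6, 5, 5, 6, 5]],
    "Learning":            [[7, 6, 6, 7, 5], [6, 7, 6, 6, 7]],
    "System capabilities": [[8, 7, 8, 7, 8], [7, 8, 7, 8, 7]],
}

# Per-category plot: pool all answers within a category into one box.
pooled = {cat: [s for q in qs for s in q] for cat, qs in responses.items()}
plt.boxplot(list(pooled.values()), labels=list(pooled.keys()))
plt.ylabel("QUIS score (9-point scale)")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.savefig("quis_categories.png")
```

The all-questions plot works the same way, with one box per question instead of one per category.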

The user interviews were compared horizontally, that is, we looked for difficulties experienced by multiple users performing the same task. We also tried to match these findings with the results from the questionnaire.

Evaluation results

The chart below shows the results of the QUIS questions. The numbers refer to the questions as shown on this questionnaire.



The graph below shows the time (min:s) it took users to complete each task, giving a general quantified overview of the user interviews.

*estimated timings.

Users 3 and 4 were significantly slower to create an account because they had to create a Google account as well. Signing out was the easiest task to figure out. Deleting a profile was not an obvious thing to do; one user eventually resorted to Google search to find the option.
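As an aside, aggregating such min:s timings is straightforward. Below is a small sketch of the kind of summary we derived from them; the timing strings are placeholders, not our actual measurements:

```python
# Hypothetical task timings (min:s) for users 1-4; placeholder values only.
timings = {
    "Create account": ["1:05", "0:58", "3:40", "4:10"],
    "Sign out":       ["0:12", "0:10", "0:15", "0:09"],
    "Delete profile": ["2:30", "1:45", "3:05", "2:50"],
}

def to_seconds(mmss: str) -> int:
    """Convert a 'min:s' string such as '2:30' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Print mean and maximum completion time per task.
for task, times in timings.items():
    secs = [to_seconds(t) for t in times]
    print(f"{task:15} mean {sum(secs) / len(secs):6.1f}s  max {max(secs)}s")
```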

A more detailed analysis is provided in our more extensive document.

Conclusions

The representation on the screen was deemed slightly better than average. The use of highlighting could be improved. Two interviewees experienced the same problem: they didn't see they could create an account with their existing Google account, so they started from scratch. Highlighting could solve this.

Terminology was rated similarly. Although terms are used consistently throughout the system, they represent unfamiliar concepts and therefore seem unrelated. Typical examples are Circles and Hangouts: all interviewees had trouble starting a 'video chat' because they didn't link it to Hangouts. Still, the pictorial representations serve as a useful clue.

Learning was also rated better than average. The availability of supplementary material could be improved: we know some material exists, but it is hard to find. Users also thought the application was easy to learn by trial and error, as they are used to this kind of layout from other social network sites. In the interviews, people stated that even Hangouts were not much of an issue once they knew the term.

The capabilities of the system were considered its best part. We can conclude that the system is fast and reliable. Still, we wonder whether, for instance, the positive rating for noise is based on a purely auditory interpretation, whereas we rather read it as a measure of how intrusive the system is: many people complained about clutter and irritating pop-ups.

Overall, the reaction to the software was rather positive. This conclusion is backed up by the impressions of the interviewees.

14 comments:

  1. There seems to be something wrong with your pdf (at least when I try to open it), as it only shows the graphs and weird grayscale-gradient bars instead of text. The link to the QUIS questions also results in a 404...

  2. You performed both the questionnaire approach and the think aloud approach? Which one, in your experience, would be most useful for our own software project?

    Replies
    1. I think the questionnaire is a great tool if you want to extract some general thoughts about the system, while the think aloud approach is better suited to finding direct problems or issues.

    2. It is indeed a great tool, but maybe too general.
      We shared our link through Facebook and someone commented that it is a lame questionnaire, etc.
      So we answered that it was at the recommendation of prof. Duval (=D) and that it is a popular questionnaire in this domain.

  3. Personally, I think of advertisements when you ask about noise and clutter on Google+. Maybe that is what other people were thinking about when answering your question, because there are no ads on Google+ (yet).

  4. We also did an evaluation using the QUIS questionnaire and obtained similar results. Our test subjects also thought that there wasn't enough reference material available. I find this quite odd, because there is material available; people just don't seem to find it for some reason.

    Replies
    1. Good remark. As you also reported on our blog, our QUIS results (team Chimaera) regarding the available reference material were different: the participants in our test indicated that there was enough reference (help) material available. This probably has to do with the sample size and the different background (IT experience) of the test population.

  5. Indeed, I don't think the problem is that there's not enough reference material; people just don't seem to find it that easily.

  6. Very nice report. Our group evaluated Google+ using a usability lab. The results of starting a video chat are quite interesting. We didn't have this in our assignments (though one of our subjects did start a hangout when he was asked to comment on a message). The term hangout isn't very clear indeed, but perhaps it is for native English speakers?

  7. The problem with the terminology showed in our results as well. I actually find it a pity, because they clearly put much effort into making an intuitive design, putting options where you want them to be, organizing the flow of screens... But then users cannot find the option or start the flow because they don't understand the name on the button...

  8. Good idea to use the QUIS questionnaire (subjective) in combination with an objective parameter such as the time it took users to complete certain tasks. However, how did you control for other factors that can influence this 'time' parameter in your test (e.g. environmental factors such as noise, light, the presence of other persons, etc.)?

    Replies
    1. I noticed that, for instance, Team Sjiek took some measures in their test: they placed the test computer facing a window with closed curtains, to avoid external distraction that could influence the results. If confounding factors are not measured and considered, the study may lead to wrong conclusions. The use of a real lab could lead to a more accurate evaluation, something we can consider in future evaluations.

    2. What you say is true: to get better results we should use professional methods like the ones you suggested, but in our case we didn't make it as complicated as some other teams did. It surely does change the results to some degree, though.
