Brandon Schauer: Richard Dalton up next. He's recognized in our community as one seriously smart dude who asks the hard questions and finds really great answers to those that we can all start to draft off of. So how do we know that our work in user experience is actually helping? Richard's going to walk us through a practical guide to measuring user experience, breaking it down into techniques that we can measure and that we can interpret. Richard?

Richard Dalton: Thank you, Brandon.
Before we get started I'd just like to give some breaking news. I saw on the Twitter a few minutes ago that Charlie Sheen has officially been fired from Two and a Half Men. What's with that? But the best thing is he released a statement, which I just have to read, okay? Mr. Sheen released a statement to TMZ.com saying in part, "This is very good news." Now you've got to credit the guy for a positive attitude, right? "They continue to be in breach, like so many whales," adding, "and I never have to put on those silly shirts for as long as this warlock exists in the terrestrial dimension." I mean, can we please have him speak at the next MX? Please? Okay.

As Brandon mentioned, I'm a user experience manager at Vanguard,
which is the world's largest provider of mutual funds, but I'm actually so glad that Tomer talked about how to fly an airplane a little earlier today, because I don't wanna talk about financial services for a second. I wanna talk about helicopters.

Many years ago I heard a very funny recording of an after-dinner speech given by a guy called David Gunson. David was an ex-fighter pilot and air traffic controller, and in his speech he talks a little bit about helicopters and how to fly them. Helicopters, he said, are very strange beasts. They have a stick between your knees, and the rule with helicopters is you mustn't keep the stick still. Just keep it moving. Doesn't matter where, just keep it moving. He says you apply phenomenal amounts of power to these things and they defy all known laws and lift off, when of course they should screw themselves into the ground. Now once they're off the ground, he says, you go crazy with the stick. Just keep it moving in any direction. It doesn't matter, until you get to a safe height, and when you get to a safe height you hold the stick in one position and you watch what the helicopter does, because if you want it to do it again, that is where you put the stick. There's a lot of hit and miss involved and trial and error with choppers, he says. Even the manufacturers don't have a lot of faith in these things. That's why they typically put wheels, floats, skids and skis on them, just so you've got a fighting chance wherever you might come down.
Now, we don't fly helicopters at Vanguard. We don't even have private jets at Vanguard. But this concept of observation and trial and error is very, very familiar to us, because over the past two years or so, particularly in the last 12 months since I was at MX last year, we have been holding the stick in one position and watching what our experience does so that we can learn about it and improve it, and that's some of what I'd like to share with you today.

Now I can hear some of you thinking, oh my god, this is another presentation about ROI and user experience, but I can assure you it's not. In fact I even tried to come up with a funny slide to illustrate that point, but pretty soon I realized that there is nothing funny about ROI, and so I had to give up. You see, there are two types of measures and audiences for measures. The first are decision makers and stakeholders, who use measures like ROI to justify doing user experience design in the first place. The second are our teams, which many of us manage, who need to use measures to understand why our experiences are succeeding or, more commonly, failing, and to make them better. It's those types of measures, by practitioners, for practitioners, that I'd like to focus mainly on today.
So let's take a look at what we'll cover today. We'll talk about the importance of objectives. We'll talk about the measures themselves. We'll talk about success criteria, and then I'll conclude by describing some cultural challenges that we've been facing at Vanguard and what we've been doing about them.

So the first thing: when you're measuring your experience, it's very, very important to measure that experience against what it set out to achieve in the first place. You have to understand the objectives of your experience. These are my two boys, Adam and Alex. They're eight and six, and they've been doing taekwondo twice a week for the past six months, but that's not how they measure their progress. You see, they're yellow belts at the moment and they want to become black belts. They talk about it all the time, and they measure their progress against how far away they are from being a black belt. This is their three-year-old sister Abigail. She's just a white belt, and her only objective is to catch up to the boys.

So how can we understand the objectives of our experiences?
How do we figure those out? Well, we use this very simple model as a reference point for talking about that. We know that our users have goals. They'd like to retire when they're 55, send the kids to college, and buy the beach house. And our business has goals. Vanguard is a client-driven organization, so one of our goals is to lower costs so we can pass those back on to our clients. Those goals are realized through tasks. Users do stuff to reach their goals. They open accounts, they check their balances, they find mutual funds. And we, the business, we want users to do certain tasks, like roll over their IRAs so we can increase assets, or turn off their paper statements because they're expensive, so we can lower costs and pass the savings on. But we know that users aren't just robots blindly performing tasks. They're human beings. They have emotions and feelings, and those are important contexts for us. Users feel certain ways. They may be worried and anxious about whether they're going to reach their goals. They may be confused by the choices available. And our business, we know we want users to feel a certain way. We want them to feel confident and independent and successful and trusting.

These tasks and emotions, then, are realized through our multi-channel experiences. We break our experiences down into smaller chunks called capabilities, and it's these capabilities that enable clients to do the things they want to do and encourage them to do the things that we want them to do. These capabilities are created and changed over time by projects. Now, it's very important when you're establishing objectives for your user experience, and therefore measures, that you do so at the capability level and measure how well your capabilities are satisfying your user tasks and emotions. If you do it at the project level your experience becomes very, very schizophrenic, because every project alone defines a new set of objectives, a new priority, a new set of measures, so you never have a stable baseline over time with which to see whether you're succeeding or failing.
So in order to establish measures and objectives at the capability level, we have developed a technique and a deliverable called the capability strategy sheet, and in order to help us create capability strategy sheets we've created a framework of user tasks. We have mined ten years' worth of user research that we've been doing, client interviews and mental modeling sessions and contextual inquiry and things like that, and we've identified about 90 different tasks that represent the entire lifecycle of a user's interaction with Vanguard, and we group them into these eight categories. We did the same thing from a business perspective. We did lots of stakeholder interviews and focus groups and we identified about 45 tasks that the business wants users to do, and we group them into these seven categories. In the middle we have our multi-channel experiences, split out into capabilities and across channels. We have a lot of capabilities because we've got a very extensive multi-channel experience, 600 capabilities and the list is still growing.
Now, the capability strategy sheets, this process that we've created and this deliverable, is so key to our measurement process that I'd like to briefly describe how to create one for you. I'd love to use the Vanguard example, but then I'd have to kill you all, so instead we're going to use an example from a fictional pet store that I've created, which hopefully doesn't kill its parrots. This particular pet store has a website, and the website has a capability called an item profile. It's the product details page for something that the pet store is selling.

So, how we typically create a sheet: we get five or six people from the design and business teams in a room, we give them each an individual copy of our task framework, and we say, "Hey, spend ten minutes and just circle anything that you think this capability has to satisfy, all of the tasks from both the business- and user-driven sides." Then after ten minutes we come back together, and we have a great collaborative discussion and reach consensus on all of the tasks that we think we have to satisfy in this particular capability.
Then once we've done that, we print them onto stickies. We've geeked out a little bit: we've laminated them and put them on magnets so that we can reuse them on whiteboards. We put them on whiteboards, we group them and order them by priority, and we rank them 1 through N. Then for each of those tasks we talk about, think through, and document the emotional considerations that might be in play for that task, a high-level approach to solving that task, and, most importantly for today's discussion, the success metrics and criteria that will tell us whether we are succeeding or failing at that task in this capability.

This is what a finished capability sheet looks like, in our format and from our perspective. The first column (if I can find the laser pointer on here), the first column is the tasks in priority order, the second column is the emotions, the third column is the high-level approach, the fourth column is the success criteria, and the fifth column is just a general notes area.
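To make that structure concrete, here is a minimal sketch of what one row of such a sheet might look like as a simple data structure, using the fictional pet-store example. The field names and every value below are illustrative assumptions, not Vanguard's actual sheet format.

```python
from dataclasses import dataclass

@dataclass
class CapabilityTask:
    """One row of a capability strategy sheet: a task the capability must satisfy."""
    priority: int                # rank 1..N from the prioritization session
    task: str                    # the user- or business-driven task
    emotions: list[str]          # emotional considerations in play
    approach: str                # high-level approach to satisfying the task
    success_criteria: list[str]  # measures and targets that signal success or failure
    notes: str = ""              # general notes

# Hypothetical row for the pet store's "item profile" capability.
buy_the_item = CapabilityTask(
    priority=1,
    task="Buy the item",
    emotions=["excited", "wary of shipping costs"],
    approach="Prominent add-to-cart with price and availability up front",
    success_criteria=["Conversion rate from item profile to completed purchase"],
)
```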
Now, you probably can't read this from the back, and that's okay. I did bring paper copies of this if you wanna take one at the end. I didn't hand them out ahead of time because otherwise you'd read that and not listen to what I'm saying, but they're out on the registration table out there. If you'd like to take one at the end of the day, please feel free.

So our project teams find these sheets useful for many different reasons. The one that I'm focused on today, obviously, is the measurement aspect, so let's dig into measurement a little bit. We break measurement into two discrete pieces: measures and success criteria. A measure is simply that, something which can be measured, a data point that you can plot against a scale; so time, user clicks, amount of client feedback are all examples of measures. A success criterion is a target which will indicate whether or not you are succeeding or failing: so 20,000 user clicks, 30 seconds, 10 percent negative feedback.
Let's drill first into measures. We saw from our example, our capability sheet from our pet store, that any given capability is trying to satisfy multiple tasks from both a client-driven and a business-driven perspective, but very typically only one or two of those tasks will represent the true desired outcome of that capability, the thing that users really want to do and that the business really wants users to do. In our example that's represented by the "buy the item" task. Users want to buy the item and the business wants users to buy the item. This we call the outcome task or the outcome measure. Measuring this can tell you if your capability is being successful, or, more commonly, if it is failing. These measures are typically input into the ROI measures that I promised we wouldn't talk about. They are typically fairly high level. They're also called by a phrase that I've heard here a couple of times already yesterday and today: key performance indicators, or KPIs. There's even an online KPI library that details some common outcome measures. But while these measures can tell you if your capability is being successful, they cannot tell you why. To understand why your capability is failing you need to look at all of the other tasks and measures in your sheet. These we call drivers, because they drive towards or contribute towards the desired outcome. This particular type of diagram is typically called a fishbone diagram, and it really helps you to understand this cause-and-effect relationship between your drivers and your outcomes.
Now, as we've been writing capability sheets and defining measures over the past year, two years, we have started to see some repeating measures or patterns. Forgive my accent, I can't say that word properly. Hold on. Patterns, patterns of behavior and measures, and I thought it would be useful to share some of those with you just in case they translate to your own experiences.

So the first one that we see is quite simply when we're trying to evaluate whether somebody can complete a task. If the task is very transactional in nature, or action oriented, and has a well-defined start and end point, this can be as simple as a conversion rate: the number of people that started versus the number of people that finished.
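As a minimal sketch, the conversion-rate measure for a task with a well-defined start and end is just a ratio; the counts below are made up for illustration.

```python
def conversion_rate(started: int, finished: int) -> float:
    """Fraction of users who finished a task out of those who started it."""
    if started == 0:
        return 0.0
    return finished / started

# Illustrative numbers: 4,000 visitors opened the item profile, 300 bought the item.
rate = conversion_rate(started=4000, finished=300)
print(f"Conversion rate: {rate:.1%}")   # Conversion rate: 7.5%
```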
However, if the task is less well defined, perhaps it's about the user understanding something or getting a piece of information, it becomes much more difficult to measure. We've started to experiment with using future user behavior as an indicator of task success, so let me give you an example. Let's say, from a Vanguard perspective, clients have the task of understanding, or wanting to understand, the role that costs play when picking investments, and let's say our solution to that was to give them a 60-second video they could watch which explained it. Well, if the client behavior after watching the video starts to change, i.e. they start to look at lists of funds by cost or they start to buy lower-cost funds, then we can hypothesize that our video was successful at helping them to complete that task of understanding the role that costs play.

We also have situations in our experience at Vanguard where clients know exactly what they want to do when they come to a page or a capability. This occurs most frequently on our homepage, where clients just want to log on. Sometimes they'll read a news article first before logging on, but in the case where clients come and the only thing they do is log on, it's a good bet that that's really all they wanted to do. If we see clients taking what we feel is too long a time to log on, 30 seconds versus the 5 or 10 seconds we feel it should take, especially clients who perhaps don't log on as frequently and so aren't as familiar with the site, then we can hypothesize that maybe we hid the logon fields too much. Maybe they're not prominent enough.
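Here is one way that pattern could look in code: flag sessions where a known, single-intent task (logging on) took much longer than expected, particularly for infrequent visitors. The session records, field layout, and thresholds are all assumptions for illustration.

```python
# Assumed session records: (client_id, seconds_to_log_on, visits_in_last_90_days)
sessions = [
    ("c1", 6, 40),
    ("c2", 34, 2),
    ("c3", 9, 15),
    ("c4", 41, 1),
]

EXPECTED_SECONDS = 10   # what we feel logon should take
SLOW_THRESHOLD = 30     # beyond this we suspect the logon fields are hard to find

slow_infrequent = [
    client for client, seconds, visits in sessions
    if seconds > SLOW_THRESHOLD and visits < 5
]

share = len(slow_infrequent) / len(sessions)
print(f"{share:.0%} of sessions were slow logons by infrequent visitors: {slow_infrequent}")
# A rising share here supports the hypothesis that the logon fields aren't prominent enough.
```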
We also measure fairly simple things like, do our links meet expectations? So if we see people bouncing straight back after clicking links, then we can start to hypothesize: well, perhaps that means the link was poor, the wording was poor, or the broader context that the link was presented in was misleading the client about what they would get when they clicked it.

We do try and stay away from client-driven or client-reported behavior as much as possible. We really prefer behavioral data, web metrics and usage data and things like that, but in some cases it is unfortunately unavoidable, especially when you're trying to measure something like client satisfaction. So in some cases we do ask clients: was this useful to you? Was this helpful to you? We do try and get a bit more specific in some cases and ask them why they didn't do a certain behavior, so if they didn't buy the item we can say, "Well, why didn't you buy the item?" and survey data like that can lead us towards understanding whether or not people are being satisfied by a certain task.

We also have situations where some of our clients want summary data and some of our clients want lots of detail, and so we provide that in our experience. We provide a summary and then a click for more details. Well, if everybody is clicking into the details, that can be an indication that we're not providing enough information in the summary. Conversely, if everybody is staying at the summary, perhaps we're providing too much information in the summary and overwhelming people. This also works across channels. So perhaps you've got some people on your website who then call for more information. Well, that can be an indication that you're not providing enough information in your web capability.
Finally, the last pattern that we're seeing: let's say we've provided a link from page A to page B, and people are going from page A to page B, but they're not using that link. They're going from page A back to the homepage and then to page B, or they're using global navigation rather than using the in-page link. Well, we can start to hypothesize: what's wrong with that link? Is it not prominent enough? Again, is it not worded correctly? So we can use these kinds of relative paths of navigation to the same destination to help us understand and predict whether or not people are using our experience in the way that we expect.
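A sketch of that last pattern: among journeys that reached the same destination, compare the direct route against the detours. The clickstream data and page names here are invented.

```python
from collections import Counter

# Assumed clickstream journeys that eventually reached page B after page A.
journeys = [
    ["A", "B"],            # used the in-page link
    ["A", "home", "B"],    # detoured via the homepage
    ["A", "nav", "B"],     # used global navigation
    ["A", "B"],
    ["A", "home", "B"],
]

routes = Counter("direct" if path == ["A", "B"] else "detour" for path in journeys)
direct_share = routes["direct"] / sum(routes.values())
print(f"Direct-link share of A-to-B journeys: {direct_share:.0%}")
# A low direct share suggests the link isn't prominent enough or isn't worded clearly.
```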
Now, these six types of measure are just six common ones that we've been seeing. We have many, many more that are much more specific and unique to the circumstances that they're in, the task and the capability that they're measuring. We actually got into a great conversation at our lunch table today about this, and I ended up almost giving this presentation to the poor, unsuspecting three or four people that were sitting at the table, but I started to realize, and say, that the lower level you get with your measures, especially around driver measures, the more granular you get, the less you can translate measures. That's why the outcome measures that I mentioned, these key performance indicators or KPIs, you can have a library of those, because they're conversion rates and things like that. They are fairly transferable because they're fairly high level. These kinds of measures on drivers, however, get very granular very quickly. They're very particular to your own experience. So while patterns like this may be useful in a general sense, don't expect that you can just go back to the office and say, "Oh yes, we'll use this one," and apply it to one of your tasks and your capabilities. You need to think through those, and when you're thinking through those, when we're thinking through new measures as well, we ask ourselves two key questions. One, we say: well, if we really satisfy this task well with this capability, if we knock it out of the park, what will the user behavior be, and can we measure it? And secondly, and more usefully in a lot of cases: if we really screw it up, if we really don't solve this task well, even to the extent of not solving it at all, what will the user behavior be, and can we measure that?
So we've taken a look at some measures, things you can measure, but how do you know if they're good or not? How do you know whether that represents success or failure? That's where we get into success criteria. We see two types of success criteria: one which we've termed enduring, which we typically use for ongoing monitoring of the health of our experience and our capabilities, and one which we've called temporary, which we typically use for point-in-time improvements, when we're focused on making an improvement in a capability and we want to get some extra data to help us make that decision.

Enduring measures typically take the form of measuring your production experience against a preset threshold or criterion. Think of these criteria as flags that you want to be raised when something unexpected is happening that you want to dig into. You can set these criteria by looking at past user behavior or by future expectations. What I mean by past behavior is you can say, well, within this capability we've been satisfying this task over the past 12 months at about 50 percent on average, and so we're going to use 50 percent going forward, because if that task ever dips beneath 50 percent we want the flag to go off, and we wanna know about it because there's something wrong and we wanna fix it. Or you can use future expectations. You can say, well, we designed this experience and we expect everybody going from page A to page B to use this link, to go this way, and if lots of people are not doing that, let's say less than 80 percent, then you want the flag to be raised, because now your experience is not behaving the way you expected it to when you designed it, an indication that perhaps there's something wrong.

So in our example capability sheet we have this task of "get information about the item," and we typically satisfy this with summary information and details and pictures and photographs of the product, and we can say, well, what would be the worst thing that would happen if we really screwed that task up? Well, the users wouldn't buy the item. So we can say, all right, let's survey everybody not buying the item, a loss-of-sales survey, and ask them why they didn't buy the item, and we can say, well, we measured that over 12 months, and at no time in the previous 12 months did it spike above 5 percent of people saying the information was at fault for them not purchasing the item. So we can use that 5 percent going forward, and if it ever spikes above that then something has happened, something is wrong, we unexpectedly broke something, and we wanna go and fix it.
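Here is a minimal sketch of that enduring criterion: derive the threshold from past behavior and raise a flag whenever the latest reading goes above it. The monthly figures below are invented for illustration.

```python
# Monthly share of lost-sales survey respondents who blamed the product information.
past_12_months = [0.02, 0.03, 0.04, 0.03, 0.02, 0.05, 0.04, 0.03, 0.02, 0.04, 0.03, 0.02]

# Enduring criterion derived from past behavior: it never went above 5 percent.
THRESHOLD = max(past_12_months)   # 0.05

def check_month(latest_share: float) -> None:
    """Raise the flag when the measure moves outside its historical range."""
    if latest_share > THRESHOLD:
        print(f"FLAG: {latest_share:.0%} blamed the item information "
              f"(historical ceiling was {THRESHOLD:.0%}), investigate what changed.")
    else:
        print(f"OK: {latest_share:.0%} is within the expected range.")

check_month(0.08)   # FLAG: something unexpectedly broke
```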
Temporary measures tend to be much more comparative in nature. These typically take the form of having multiple solutions that you'd like to use to solve a task, and you want to A/B or multivariate test them against one another in order to get some data to help you make a decision. Notice I said to help you make a decision. I'll get back to that in a little bit. This gives you point-in-time winners. You just need to show statistical significance, which I'm not gonna get into right now. So we have a task in our previous capability example for printing details about the item, and we can say, well, we could solve that via a link or a button, or we could put it at the top or the bottom or the left or the right, all these different solutions, and we can A/B test them against one another to find out which one most successfully completes the task, or helps the user complete the task.
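For that comparative case, here is a rough sketch of checking statistical significance between two variants with a standard two-proportion z-test. The variant names and counts are hypothetical; this is not Vanguard's actual tooling.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Return the two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical test: "Print" as a link vs. as a button, each shown to 2,000 visitors.
p_value = two_proportion_z_test(success_a=180, n_a=2000, success_b=240, n_b=2000)
print(f"p-value: {p_value:.4f}")   # below 0.05 would count as a significant difference
```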
So these are some of the tools and techniques that we have been using to establish objectives and measures and success criteria for our experience, and to measure our experience over time to see if we're succeeding or failing. We've used some of these techniques in production, and we've also used some of these quantitative techniques in what we call an experimentation region, where we're able to put in lots of client data and client information, get clients in there, and they see the designs that we are creating with their own data, at scale, and we can use that quantitatively. So we've used some of these measures, especially the temporary ones, in that experience too.

So what have been some of the cultural challenges that we have experienced as we have been doing this type of work? The first one is that when you start using data in this way to make decisions, or to help inform decisions, and when you start measuring your user experience like this, it can get a little uncomfortable for some people, because using data and measures in this way is a little alien to some people involved in the user experience field. We've found that being very open about the data and what we're doing and the measures helps us here. We talk openly about who we're measuring, what we're measuring, why we're measuring it, what we're going to do with those measures, and, probably most usefully, the limitations of the measures and what they can and cannot do, and that helps people feel a little bit more comfortable. Also, we're very inclusive. We bring people into the process of defining measures very early. We give them input into it. We take their opinions and we let them help drive what we should be gathering, because then they have much more buy-in to the results and the data that comes out of that.
The second thing is: don't be afraid of gathering valid data. You do not want a reputation for only gathering data and putting measures on things that support your particular opinion of what the design should be. You lose a lot of credibility if you do that, so be very neutral and just gather the data if it's valid. Conversely, you should be very afraid of invalid data, and Dilbert says this better than I do, so for the benefit of the people in the back of the room I'll read it for you. Dilbert is saying, "I'd like to thank all of the people who helped design the technology test parameters. Thanks to your input, the test had nothing in common with how things work in the real world, so I wasted two weeks of my life on a test that is not only meaningless but dangerously misleading. This slide shows the gap between the test results and reality," and one of his managers is saying, "We'll use the test results anyway because it's the only data we have," and Dilbert is saying, "Fine, I hope you all choke to death on your lunches," and the manager is saying, "Why is he so cranky?" and the colleague says, "Something about data." This temptation to use invalid data, or misleading or dangerously misleading data, is a strong one, especially if you've actually gathered the data, but it also hurts your credibility in the long run, because at some point it becomes apparent that the measurement and the data you used to make a decision were wrong, and then all of your data has credibility issues, so try and avoid this whenever possible.
You need to keep your perspective as well, and more importantly, help your sponsors and stakeholders keep their perspective about how these measurements and this data fit into your overall decision-making process. It has to be balanced with the qualitative data. You have to use good judgment. You cannot simply do whatever the data tells you. You have brand considerations. You have emotional considerations, which are more difficult to measure. You have aesthetic considerations, which are very, very difficult to measure. You have to use your good judgment. It's just one more data point. We like to think of it as data-informed design, not data-driven design.

Finally, you don't need a massive corporate-wide initiative to start doing this. It does become quite addictive if you start small. Pick a capability, define your objectives, define a couple of measures and success criteria, and start measuring. Hold the stick in one position and watch what your experience does. Thank you.

[End of Audio]