Satisfaction score 4.67 is pretty good, right? RIGHT?

by Finn Patraic

When you buy through links on our site, we may earn a commission at no extra cost to you. However, this does not influence our evaluations.

Star ratings and satisfaction scores: what does the number mean?

Fictitious story. Imagine that you manage a transport service for people with disabilities, so they can get to and from the places they need to live a meaningful life. It is a shared service with limited seats and a limited pool of drivers. You even have an app that lets users plan and monitor their rides. You are data-driven, so after each trip the app prompts users to rate their driver on a star scale according to their satisfaction. You build a dashboard and watch it over time. The average comes to 4.67. You initially set a global target of at least 4.3, with 4.6 as a stretch target. You beat your stretch goal! Yay. Simple: everything works well, because a 4.67 satisfaction score is good enough, right? RIGHT?

It depends

Well, “the devil is in the details,” as they say … humans are complex. Two people can examine the same question, the same context, the same everything, and reach different interpretations. Not to mention artificial intelligence (AI). The ingredients are all there. However, something is off …


So, is a satisfaction score of 4.67 good? Those who work with me on data (especially surveys and evaluations) probably already know the answer:

It depends on how you interpret the result and what you plan to do about it.

If you do not plan to take any action, it is a very good result. But then why collect the data in the first place?

What does a 4.67-star rating mean?

Hopefully you do not plan actions based on a single metric (let alone an average you conjured out of stars), but suppose this number means a lot to your organization. Let us first look at the advantages of the single-question approach:

  1. You care.
    You show customers that you care and make sure all drivers behave according to the high standards you set.
  2. You collect data.
    Your data collection is scalable, consistent, and “reliable” as long as the app works.
  3. You do not overwhelm customers with long surveys.
    A single question. Always the same, always at the end, at the same time, right after a trip ends. Consistency is key.
  4. You watch your data.
    Not just as a single metric but as a trend over time. Good start!
  5. You segment your data.
    By vehicle, by route, by driver, etc., and you have a proactive plan to act immediately if something happens. Fortunately, you have a data strategy.
  6. You plan to make decisions and act on the results.
    You would not believe how many dashboards die in the long run without a single meaningful decision ever taken based on them.

What can go wrong with this approach?

Oh, the details … before going into them, let's start with an experiment. Wherever you are reading this article, right now: say the word “good” aloud. Simply say the word. I hope you did not raise any eyebrows. Now imagine the following scenarios, where the answer is the same word, “good.” You don't need to say it aloud, unless you really want to entertain the people around you.

a) Annoyed-mother scenario
After three missed calls from your mother, you finally pick up the phone, just to tell her you are busy, when she asks: “How are you?” – GOOD.

b) Fear-of-the-manager scenario
Your manager unexpectedly asks you to come to their office (or hop on a quick one-on-one virtual call) and opens with the question: “How are you?” – GOOD (?)

c) “Your call is important to us” scenario
After 3 transfers and 45 minutes on hold with customer service, the fourth agent in the department finally picks up the call. With rehearsed enthusiasm, the agent opens the convo: “How are you?” – GOOD!

Context and perception matter

What does this experiment have to do with satisfaction surveys? Context and perception count! Who asks you the question, when they ask it, how they ask it, how often they ask it … All the details count.

Maybe your answer is the same word, but what you mean by it may not be. When you are in a face-to-face conversation with someone, they can read your tone, your body language, and so on. But sending out a survey question is different. You lose the context. Are you sure you are measuring what you think you are measuring? Are you sure your data is reliable? Are you sure your “insights” are correct? That is a lot to consider!

In my data literacy workshops, I collectively refer to these potential problems as BAMM (biases, assumptions, myths, and misconceptions).
Here are some details on what can go wrong from start to finish when you get BAMM'd:

  1. Lack of context
    You have an agenda and a goal in mind. However, it would take too long to explain the context, so you simply condense it into one question. The full context stays in your head. On paper, it is a single sentence, open to interpretation.
  2. Selection bias
    You must decide on your audience. Everyone? Every time? A sample? Anonymous, pseudo-anonymous, tracked by user ID? That brings data privacy and data security into the mix.
  3. Misconceptions and wording errors
    You must then decide on the exact words you use. Every. Single. Word. Matters. (Have you ever tried to reach consensus on a simple survey question across marketing, legal, product, HR, etc.?)
  4. Data classification misconceptions
    You must decide what type of data you collect. The type of question you ask will determine the type of data (I will not go into data classification here, but you should). True or false? Likert scale? Slider? Single select? Multi-select? Matrix? Open text? A combination?
  5. Survey timing
    Finally, you land on a question and its type. Who will get this question? When? How?
  6. Validity problems
    In our story, they decide to include the question in the app, right after the end of a trip, focusing on the driver. Data can be valid for one purpose but not for another. For example, it is fine to use DiSC letters to start a conversation about preferences, but they should not be used to pigeonhole people into jobs.
  7. Interpretation and context
    The customer receives the question. Remember the “good” experiment? The context in which the customer answers the question matters, but you will never know it, because all you get is the number of stars. Stars can capture emotions unrelated to what you actually asked.
  8. Bias
    Conscious and unconscious factors can interfere with how customers respond. For example, road rage is often an impulsive reaction rooted in past experiences.
  9. Loaded questions
    Every. Word. Matters. Ask loaded questions and you get loaded answers. For example, phrasing the question with positive words, such as “tell us about the quality of our customer service representative …”, can influence the answer.
  10. Ambiguity
    What is one star versus two stars? The customer selects a number of stars. In your mind, there is a context associated with each star. One is a showstopper that requires immediate intervention. Five is a great experience. Well, once again, that is in your mind. I know people who never give a one or a five. They reserve those for extreme events.
  11. Data handling
    You receive the data. However, we are no longer talking about stars. You transform the five-star ratings into numbers, assuming a smooth scale from one to five. Is going from three to four really the same as going from four to five? Technically, you have just introduced a rounding error. If you treat your data as a continuous range but do not let customers select any number on that range, you have rounded their responses to whole numbers.
  12. Using rounded values
    You calculate the average. Rounding is fine, but you must be careful using rounded values in further calculations. Essentially, you force customers to select an integer, but then claim that the second decimal of the average is meaningful? Also, will it be the mean? The median? Will you look at the distribution? The outliers? The shape of your data? Or just the simple, single number?

And the list could continue …
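Points 11 and 12 above are easy to demonstrate: two drivers can share the exact same 4.67 average while telling very different stories. Here is a minimal sketch in Python; the ratings are made up for illustration:

```python
from statistics import mean, median
from collections import Counter

# Hypothetical star ratings for two drivers -- both average 4.67,
# but the distributions tell very different stories.
driver_a = [5, 5, 4, 5, 5, 4, 5, 5, 4]  # consistently good
driver_b = [5, 5, 5, 5, 5, 5, 5, 5, 2]  # mostly great, plus one bad ride

for name, stars in [("A", driver_a), ("B", driver_b)]:
    print(f"driver {name}: mean={mean(stars):.2f} "
          f"median={median(stars)} distribution={dict(Counter(stars))}")
```

The single number on the dashboard is identical, yet driver B has a two-star ride that the average completely hides; only the distribution (or the outliers) reveals it.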

What other biases could interfere?

Your app pops up the satisfaction question at the end of a trip. This can introduce a form of survivorship bias, since you only get feedback when a trip actually happened. What about cancellations? Wouldn't you like to know how satisfied your customers are when they had to cancel a trip?

Generally, people tend to give more positive responses in satisfaction surveys than what they actually think. This can come from a combination of factors: social expectations, wanting to keep the service alive because there is no alternative, selecting the answer they think is expected of them rather than what they feel, and so on. If you have several questions, the order of the questions can matter. The first answer can “anchor” the rest. The order of the answer options can also be a problem. There are ways to mitigate biases, but only if you are aware of their potential existence and have a plan in advance.
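The survivorship angle is worth making concrete: cancelled trips never trigger the in-app survey, so they are invisible in the star average. A small sketch, with an invented trip log:

```python
from statistics import mean

# Hypothetical trip log: completed trips carry a star rating,
# cancelled trips never trigger the in-app survey at all.
trips = [
    {"status": "completed", "stars": 5},
    {"status": "completed", "stars": 5},
    {"status": "completed", "stars": 4},
    {"status": "cancelled", "stars": None},  # frustrated customer, never asked
    {"status": "cancelled", "stars": None},  # another silent failure
]

rated = [t["stars"] for t in trips if t["stars"] is not None]
print(f"dashboard average:  {mean(rated):.2f}")              # looks great
print(f"cancellation rate:  {1 - len(rated) / len(trips):.0%}")  # never shown
```

The dashboard proudly reports 4.67 while two out of five attempted trips failed entirely; the metric only describes the survivors.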

How could you improve your question to mitigate biases?

One approach is to provide a conditional open-text field when the answer is not a preferred one (say, a low rating). If you do this with a single question, it can help customers elaborate on their selection; just make sure it is optional. You now have both quantitative and qualitative data to work with. That is more nuanced.
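The conditional follow-up can be sketched in a few lines. The threshold and field names here are assumptions for illustration, not a prescription:

```python
# Hypothetical survey flow: one star question, with an *optional*
# open-text follow-up shown only for low ratings.
LOW_RATING_THRESHOLD = 3  # assumed cut-off; tune to your own definition of "low"

def build_survey_response(stars, comment=None):
    """Record a star rating; attach the optional comment only for low ratings."""
    response = {"stars": stars}
    if stars <= LOW_RATING_THRESHOLD:
        # Follow-up is conditional AND optional: None is a valid answer.
        response["comment"] = comment
    return response

print(build_survey_response(5))                     # {'stars': 5}
print(build_survey_response(2, "Driver was late"))  # adds the comment field
```

Happy customers answer one tap and are done; unhappy ones get a chance, not an obligation, to explain.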

But if you use the same method for several questions in one survey, it can be perceived as tedious, because it stretches out the survey time. People do not like surveys to begin with, so when they perceive that you are “cheating” on the length, it can get ugly.

Final word on 4.67

Back to our story. Interpreting 4.67 as the overall satisfaction score for the ride can be misleading. Always make sure you measure what you intended to measure, and that it provides usable information for the purpose it was created for. If you ask about the driver, the data is about the driver, not about the ride itself. Personally, for learning surveys, I have found that using Will Thalheimer's approach can provide more usable and meaningful data while mitigating many of the factors mentioned above (1).

References:

(1) Learner surveys and learning effectiveness, with Will Thalheimer

Originally published at www.linkedin.com
