Star ratings and satisfaction scores: what does the number actually mean?
Fictitious story. Imagine that you manage a transport service for people with disabilities, so that they can get to and from the places they need to live a meaningful life. It is a shared service with limited seats and a limited pool of drivers. You even have an application that lets users plan and monitor their rides. You are data-driven, so after each trip the application encourages users to rate their driver on a star scale according to their satisfaction. You build a dashboard and watch it over time. The average comes out at 4.67. You initially set a global target of at least 4.3, with 4.6 as a stretch target. You beat your stretch goal! Yay. Simple: everything works well because a 4.67 satisfaction score is good enough, right? RIGHT?
It depends
Well, “the devil is in the details”, as they say … humans are complex. Two people can examine the same question, the same context, the same everything, and still reach different interpretations. Not to mention artificial intelligence (AI). The ingredients are all there. And yet, something is off …
So, is a satisfaction score of 4.67 good? Those who work with me on data (especially surveys and evaluations) probably already know the answer:
It depends on how you interpret the result and what you plan to do about it.
If you do not plan to take any action, this is a very good result. But then why do we collect data in the first place?
What does a 4.67-star rating mean?
Hopefully you do not plan actions based on a single metric (let alone an average conjured by magic from stars), but suppose this number means a lot to your organization. Let us first look at the advantages of the single-question approach:
- You care.
You show your customers that you care and make sure all drivers behave according to the strict standards you establish.
- You collect data.
Your data collection is scalable, consistent, and “reliable” as long as the application works.
- You do not overwhelm customers with long surveys.
A single question. Always the same, always at the end, at the same moment, right after a trip ends. Consistency is key.
- You watch your data.
Not just as a single number but as a metric trending over time. Good start!
- You segment your data.
By vehicle, by route, by driver, etc., and you have a proactive plan to act immediately if something happens. Fortunately, you have a data strategy. (A quick sketch of this follows below.)
- You plan to make decisions and act on the results.
You would not believe how many dashboards die in the long run without a single meaningful decision ever being made from them.
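As a rough illustration of that segmentation point, here is a minimal pandas sketch (the column names and numbers are invented for the example) showing how a single headline average can hide per-driver differences:

```python
import pandas as pd

# Hypothetical export of per-trip ratings; columns are invented for the example.
ratings = pd.DataFrame({
    "driver": ["A", "A", "B", "B", "B", "C"],
    "route":  ["R1", "R2", "R1", "R1", "R2", "R2"],
    "stars":  [5, 5, 4, 3, 5, 5],
})

# The headline number looks fine on its own ...
print(f"Overall average: {ratings['stars'].mean():.2f}")  # 4.50

# ... while a simple group-by surfaces who is pulling it down.
print(ratings.groupby("driver")["stars"].agg(["mean", "count"]))
```

The overall 4.50 coexists here with driver B averaging 4.00 across three trips; only the segmentation makes that visible.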
What can go wrong with this approach?
Oh, the details … before going into them, let's start with an experiment. Wherever you are reading this article, right now: say the word “good” aloud. Just say the word. I hope you did not get any strange looks. Now imagine the following scenarios where the answer is the same word, “good”. You don't need to say it aloud unless you really want to entertain the people around you.
a) The annoyed-mother scenario
After three missed calls from your mother, you finally pick up the phone, only to tell her that you are busy, when she asks: “How are you?” – GOOD.
b) The scary manager scenario
Your manager unexpectedly asks you to come to their office (or hop on a quick one-on-one virtual call) and opens with the question: “How are you?” – GOOD (?)
c) The “your call is important to us” scenario
After three transfers and 45 minutes on hold with customer service, the fourth agent finally picks up the call. With rehearsed enthusiasm, the agent opens the conversation: “How are you?” – GOOD!
Context and perception matter
What does this experiment have to do with satisfaction surveys? Context and perception count! Who asks you the question, when they ask, how they ask, how often they ask … all the details count.
Maybe your answer is the same word, but what you mean by it may not be. When you are in a face-to-face conversation, the other person can read your tone, your body language, and so on. But sending out a survey question is different. You lose the context. Are you sure you are measuring what you think you are measuring? Are you sure your data is reliable? Are you sure your “insights” are correct? That is a lot to consider!
In my data literacy workshops, I collectively refer to these potential problems as BAMM (biases, assumptions, myths, and misconceptions).
Here are some details on what can go wrong from start to finish when you get BAMM'ed:
- Lack of context
You have an agenda and a goal in mind. However, it would take too long to explain the context, so you simply boil it down to one question. The full context stays in your head. On paper, it is a single sentence, open to interpretation.
- Selection bias
You must decide on your audience. Everyone? Every time? A sample? Anonymous, pseudo-anonymous, tracking user IDs? That brings data privacy and data security into the mix.
- Misconceptions and wording errors
You must then decide on the exact words you use. Every. Single. Word. Matters. (Have you ever tried to get consensus on a single survey question across marketing, legal, product, HR, etc.?)
- Data classification misconceptions
You must decide on the type of data you collect. The type of question you ask determines the type of data you get (I will not go into data classification here, but you should). True or false? Likert scale? Slider? Single select? Multi-select? Matrix? Open text? A combination?
- Survey timing
Finally, you land on a question and a question type. Who gets the question? When? How?
- Validity problems
In our story, they decide to embed the question in the application, right after the end of a trip, focusing on the driver. The data can be valid for one purpose but not for another. For example, DISC letters are fine for starting a conversation about preferences, but they should not be used to pigeonhole people into jobs.
- Interpretation and context
The customer receives the question. Remember the “good” experiment? The context in which the customer answers the question matters, but you will never know it, because all you get is a number of stars. Stars can capture emotions unrelated to what you actually asked.
- Bias
Conscious and unconscious factors can interfere with how customers respond. For example, road rage is often an impulsive reaction rooted in past experiences.
- Loaded questions
Every. Word. Matters. Ask loaded questions and you get loaded answers. For example, phrasing the question with positive words, such as “tell us about the quality of our customer service representative …”, can influence the answer.
- Ambiguity
What is one star versus two stars? The customer selects a number of stars. In your mind, there is a context attached to each star: one is a showstopper that requires immediate intervention; five is a great experience. Well, once again, that is in your mind. I know people who never give a one or a five; they reserve those for extreme events.
- Data handling
You receive the data. However, we are no longer talking about stars. You turn the five-star ratings into numbers, assuming a smooth scale from one to five. Is going from three to four really the same as going from four to five? Technically, you have just introduced a rounding error: if you treat your data as a continuous range but only let customers select whole stars, you have rounded their responses to integers.
- Using rounded values
You calculate the average. Rounding is fine, but you must be careful when using rounded values in further calculations. Basically, you force customers to select an integer, and then claim the second decimal place matters in the average? Also, will it be the mean? The median? Will you look at the distribution? The outliers? The shape of your data? Or just the single, simple number? (See the sketch after this list.)
And the list could continue …
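To make the single-number problem concrete, here is a tiny sketch with invented ratings: two distributions that share the same mean, and even the same median, yet tell very different stories:

```python
import statistics
from collections import Counter

# Two invented rating distributions with the same 4.7 mean.
steady  = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5]  # everyone fairly happy
divided = [5, 5, 5, 5, 5, 5, 5, 5, 5, 2]  # nine delighted, one showstopper

for name, stars in [("steady", steady), ("divided", divided)]:
    print(f"{name}: mean={statistics.mean(stars):.2f}, "
          f"median={statistics.median(stars)}, "
          f"distribution={dict(sorted(Counter(stars).items()))}")
```

Both lines report a 4.70 mean and a median of five; only the distribution exposes the two-star showstopper hiding in the second set.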
What other biases could interfere?
Your application pops up the satisfaction question at the end of a trip. This can lead to survivorship bias, since you only get feedback when a trip actually happened. What about cancellations? Wouldn't you like to know how satisfied your customers are when they had to cancel a trip?
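Here is a minimal sketch of that survivorship effect, with invented trip data: cancelled trips never see the rating prompt, so they silently drop out of the denominator:

```python
# Invented trip log: only completed trips trigger the in-app rating prompt.
trips = [
    {"status": "completed", "stars": 5},
    {"status": "completed", "stars": 5},
    {"status": "completed", "stars": 4},
    {"status": "cancelled", "stars": None},  # unhappy customers you never hear from
    {"status": "cancelled", "stars": None},
]

rated = [t["stars"] for t in trips if t["stars"] is not None]
cancelled = sum(t["status"] == "cancelled" for t in trips)

print(f"Dashboard average: {sum(rated) / len(rated):.2f}")  # 4.67
print(f"Interactions that never produced a rating: {cancelled} of {len(trips)}")
```

The dashboard proudly shows 4.67 while two out of five interactions never produced a rating at all.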
Generally, people tend to give more positive responses in satisfaction scores than they actually feel. This can be a combination of factors: social expectations, wanting to keep the service alive because there is no alternative, selecting the answer they think is expected rather than what they feel, and so on. If you have several questions, the order of the questions can matter. The first answer can “anchor” the rest. The order of the answer options can also be problematic. There are ways to mitigate biases, but only if you are aware of their potential existence and have a plan in advance.
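One common mitigation for order effects, for instance, is to randomize the order in which questions (or unordered answer options) are presented, so that no single item systematically anchors the rest. A minimal sketch, with invented questions:

```python
import random

# Invented survey questions for illustration.
questions = [
    "How satisfied were you with the booking process?",
    "How satisfied were you with the driver?",
    "How satisfied were you with the vehicle?",
]

# Present the questions in a fresh random order for each respondent,
# so no single question systematically anchors the answers that follow.
rng = random.Random()  # in a real system, seed or log the order per respondent
for question in rng.sample(questions, k=len(questions)):
    print(question)
```

Note that ordered scales themselves usually stay in order; randomization applies to the question sequence or to unordered choice lists.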
How could you improve your question to mitigate biases?
One approach is to offer a conditional open-text field when the answer is not a favorable one. If you do this with a single question, it can help customers elaborate on their selection; just make sure it is optional. You now have both quantitative and qualitative data to work with. That is more nuanced.
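That conditional flow might look something like this sketch (the function name and the three-star threshold are assumptions, not a prescription):

```python
def collect_feedback(stars: int) -> dict:
    """Single-question flow: a low rating triggers an optional
    open-text follow-up, so the quantitative score arrives with
    qualitative context when it matters most."""
    response = {"stars": stars, "comment": None}
    if stars <= 3:  # the threshold is an assumption; tune it to your scale
        comment = input("Sorry to hear that. Care to tell us more? (optional) ")
        response["comment"] = comment or None
    return response
```

Keeping the follow-up optional is the key design choice: customers willing to elaborate give you the qualitative nuance, while everyone else still completes the survey in one tap.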
But if you use the same method across several questions in a survey, it can be perceived as annoying, because it stretches the time the survey takes. People already dislike surveys, so when they feel you are “cheating” on the length, things can get ugly.
Final word on 4.67
Back to our story. Interpreting 4.67 as the overall satisfaction score for the journey can be misleading. Always make sure you measure what you intended to measure, and that it provides actionable information for the purpose it was created for. If you ask questions about the driver, the data is about the driver, not about the ride itself. Personally, for learning surveys, I have found that using Will Thalheimer's approach can provide more usable and meaningful data while mitigating many of the factors mentioned above (1).
References:
(1) Learner surveys and learning effectiveness, with Will Thalheimer
Originally published at www.linkedin.com.
