Why Chatbots Need Quality Standards

According to a global study in 2017, over 32% of executives have already implemented artificial intelligence in customer service applications within their companies, and 62% believe that conversational interfaces for customer support will be the most transformative innovation of the next five years. It’s no wonder that businesses are so enthusiastic. Widespread adoption of AI and chatbots offers a range of benefits to companies, including:

  • Providing customer service and support 24/7. Customers increasingly expect fast, accurate responses to their inquiries at any time of day, through every communication channel. They expect consistent service, support, and response whether they contact a company through email, chat, phone, apps, or social media.
  • Giving customers the live chat features they prefer. Consumers increasingly favor contacting businesses through live chat: 44% of consumers prefer it, finding it faster and more convenient, and 1 in 10 millennials actually expressed a preference for a bot over a person for shopping inquiries.
  • Increasing sales and revenue. According to David Marcus, Facebook’s VP of messaging products, companies are seeing significant increases in sales and revenue through deploying chatbots on Facebook Messenger.
  • Improving customer service staff efficiency. For many companies, using bots to manage simple inquiries and support requests creates more time for human staff to manage more complex issues. Combining AI with human intelligence provides better service for both kinds of customer contacts.
  • Reducing staff salary expenditures. Chatbots can reduce the cost of quality customer service by as much as 30%, by saving staff time and reducing cost per inquiry.

With all these benefits, it’s no wonder that we are seeing an explosion of chatbots deployed in customer service applications, with the number expected to continue to grow exponentially into the future. However, this widespread adoption isn’t without problems. Many companies and consumers still doubt the ability of AI to deliver high-quality, accurate customer service and support.

In fact, in a recent survey, 53% of consumers who had used chatbot customer service found it to be not effective, or only somewhat effective. 14% of US consumers and 5% of UK consumers rated their chatbot experience as not effective at all.

Even worse, a study by Warwick Analytics found that 59% of businesses that use a chatbot are unhappy with its performance. Fully 93% of those businesses believe that human validation is necessary for chatbot interactions, and 21% of businesses refuse to adopt chatbots due to performance concerns.

In other words, consumers and businesses want chatbots, and are adopting them eagerly, but so far they aren’t delivering all the hoped-for results, for either customers or businesses.

What’s missing is a simple, effective, standardized way to rate and improve chatbots. For too long, they were judged by their ability to pass the Turing Test, and not by measurable performance standards.

Chatbot Standards and Attributes

For this reason, a recent academic study by Nicole Radziwill and Morgan Benton is particularly interesting. The authors evaluated the literature on conversational agents going back to the 1990s and compiled a set of quality attributes for them. Their findings group these attributes into simple categories that correspond to the ISO 9241 definition of usability: effectiveness, efficiency, and satisfaction.

According to Radziwill and Benton, a thorough survey of the existing research and literature reveals a surprising degree of consensus about what makes a good chatbot. Ultimately, they find that chatbots should possess the following attributes:

  • Efficiency:
    • Graceful degradation
    • Robustness to manipulation and unexpected inputs
    • Effective function allocation with appropriate escalation
  • Effectiveness:
    • Accurate speech synthesis
    • Accurate interpretation of commands
    • Appropriate formality and linguistic register
    • Facilitation of transactions
    • On-the-fly problem solving
    • Breadth of knowledge
    • Flexibility of interpretation
    • Transparency about its chatbot identity
    • Ability to maintain a themed discussion
  • Satisfaction:
    • Convey personality
    • Provide conversational cues
    • Exude authenticity and warmth
    • Read and respond appropriately to human moods
    • Respect and preserve privacy and dignity
    • Demonstrate cultural and ethical knowledge of users
    • Show awareness of social context
    • Detect meaning or intent

When a chatbot displays these attributes, it is able to build trust with a human and provide a satisfying interaction. In that context, it is not necessary that it pass the Turing Test. In fact, many researchers argue that it is more desirable for the bot to disclose its identity, and perhaps even to include some errors to increase rapport.

The Turing Test and Chatbot Ethics

As you might expect, the discussion of realism is an important one. Customers don’t mind chatting with a bot, and can even respond warmly to one, but they do not want to be deceived. Ethical and legal scholars are raising questions about whether the use of chatbots should require disclosure, which seems particularly relevant as these bots are deployed in the medical and financial services industries, where people are more sensitive than they might be in typical customer service and support applications.

Current objections to disclosure relate more to questions of platform and user experience than to ethics. In the case of chatbots on Facebook Messenger, who needs to disclose: the platform or the business? And in each interaction? Bots exchange more than 2 billion messages on the platform, yet it is entirely unclear whether consumers understand the nature of these interactions, or who is responsible for disclosure.

It is reasonable to expect that, in the coming years, governments and organizations will develop guidelines for disclosure and ethics in conversational agents. The widespread attitude that “people should assume they are talking to a bot” is simply insufficient.

Applying Quality Metrics and Measuring Chatbot Performance

Debates about realism and the Turing Test aside, the other attributes listed above enjoy near-unanimous agreement among researchers. The significance of the quality attributes synthesized by Radziwill and Benton is that they create a consensus, and therefore a tool, by which chatbot performance can be measured. The authors suggest using the Analytic Hierarchy Process (AHP), in which chatbots are scored and ranked against the above attributes. Their study includes a basic AHP model for evaluating chatbots, with the goal of selecting between an older and a newer model, and it suggests priority rankings for the attributes that can be adjusted to fit individual goals and objectives.

Using the Radziwill and Benton model, attributes are first ranked in order of importance, and then chatbots are tested and scored against those criteria. For example, a chatbot can be scored on its ability to correctly escalate problems, to respond correctly to questions, to detect social cues, and so on. With AHP, you then combine each attribute’s score with its overall priority to determine which chatbot is the most desirable for your needs.
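
To make the scoring step concrete, here is a minimal sketch in Python of how AHP-derived priority weights could be combined with per-attribute test scores to compare an older and a newer chatbot. The attribute subset, weights, and scores are hypothetical, and the simple weighted sum below stands in for a full AHP pairwise-comparison analysis rather than reproducing Radziwill and Benton’s actual model.

  # Minimal sketch: combine hypothetical AHP-derived priority weights with
  # per-attribute test scores to compare two chatbots. A full AHP analysis
  # would derive these weights from pairwise comparisons; here they are
  # simply assumed, and all numbers are illustrative.

  PRIORITIES = {  # attribute weights, chosen to sum to 1.0
      "appropriate escalation": 0.30,
      "accurate interpretation of commands": 0.25,
      "robustness to unexpected inputs": 0.20,
      "awareness of social context": 0.15,
      "convey personality": 0.10,
  }

  SCORES = {  # test scores for each bot, on a 0-1 scale
      "older bot": {
          "appropriate escalation": 0.60,
          "accurate interpretation of commands": 0.70,
          "robustness to unexpected inputs": 0.50,
          "awareness of social context": 0.40,
          "convey personality": 0.80,
      },
      "newer bot": {
          "appropriate escalation": 0.85,
          "accurate interpretation of commands": 0.75,
          "robustness to unexpected inputs": 0.70,
          "awareness of social context": 0.60,
          "convey personality": 0.70,
      },
  }

  def weighted_score(bot_scores):
      """Weight each attribute score by its priority and sum the results."""
      return sum(PRIORITIES[attr] * bot_scores[attr] for attr in PRIORITIES)

  for bot, bot_scores in SCORES.items():
      print(f"{bot}: {weighted_score(bot_scores):.2f}")
  print("Preferred:", max(SCORES, key=lambda b: weighted_score(SCORES[b])))

Under these illustrative weights the newer bot comes out ahead (roughly 0.74 to 0.60), but shifting priority toward personality and warmth could change the outcome, which is exactly the kind of goal-based trade-off the framework is meant to expose.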

Such a ranking and scoring system provides a clear, consistent method for evaluating the quality of a chatbot. With a standardized model, chatbot users, owners, and coders can consistently evaluate differences between chatbots, or between updated versions of the same bot as it evolves over time.

Radziwill and Benton suggest using their findings in two ways:

  1. For service providers and programmers, the list of attributes can be used as a checklist to identify the necessary abilities and behaviors of a conversational agent (a simple checklist sketch follows this list).
  2. For both providers and implementers, the AHP model provides a means for tracking, evaluating, and comparing conversational agent performance.
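
As a simple illustration of the first use, a provider could record which attributes a bot demonstrably meets and flag the gaps that remain. This is a hypothetical sketch, not a tool from the study; the attribute subset and the pass/fail results are invented.

  # Hypothetical checklist sketch: record which quality attributes a bot
  # demonstrably meets, then report coverage and the remaining gaps.
  CHECKLIST = {
      "graceful degradation": True,
      "robustness to unexpected inputs": False,
      "accurate interpretation of commands": True,
      "discloses its chatbot identity": True,
      "reads and responds to human moods": False,
  }

  met = sum(CHECKLIST.values())
  print(f"{met}/{len(CHECKLIST)} attributes met")
  for attribute, passed in CHECKLIST.items():
      if not passed:
          print(f"  gap: {attribute}")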

There’s no question that chatbots are transforming the customer service experience in every way, at a rate that will only increase in the coming years. The potential rewards, for customers and for businesses, are simply too great to be ignored. However, significant barriers still exist.

Building Trust

As we have seen, 59% of businesses that use a chatbot are unhappy with its performance, and 21% refuse to use one due to performance concerns.

Furthermore, while customers expect and demand service 24/7, and 68% are willing to talk with chatbots, 53% of them are unhappy with those interactions. As chatbots grow more sophisticated, there is a risk that customers will not realize they are talking to a bot, and will feel deceived or betrayed by the interaction. Customers want authentic brand experiences that, so far, conversational agents are not delivering.

It all comes down to trust. Customer service is the most important aspect of a business, and businesses need to believe that they are delivering the highest level of service at all times, even without human validation. Customers need to believe that they are receiving efficient, effective, accurate information from chatbots. And both businesses and customers want brand experiences: interactions that feel intelligent and informed, and also personal and unique.

The framework proposed by Radziwill and Benton could be integral in this process of building trust. With an agreed-upon list of necessary capabilities and attributes for a chatbot, we could then have a meaningful way of knowing whether a conversational agent is reliable and performs properly. With a goal-based priority framework and an evaluation tool, each brand could decide which attributes are most important to them, and measure accordingly. It provides both a tool for standardization, and a tool for evaluation and customization.

The coming years will bring us more and better tools, not just for working with chatbots, but for evaluating and comparing them. We will have ethical and legal standards that we can refer to. We have bots that pass the Turing Test, but people will soon have better cultural context for those interactions, and not be bothered by them. Customer service can be, and will be, automated in a way that works best for everyone.