Improve the Accuracy of AI by Combining Two Types of AI

Posted Wednesday, September 20, 2023 by Jesse Breuer in AI, AI in CX

We'd like to call your attention to a seminal paper published in July 2023 that should be required reading for anyone involved in using Artificial Intelligence to improve Customer Experience. The paper is titled Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc, and it was written by Doug Lenat and Gary Marcus. Unfortunately, Doug Lenat passed away around the time this paper was published; his long-time collaborator and co-author, Gary Marcus, wanted to be sure it was published before Doug's passing. What follows is our summary of this paper, written with the CX executive in mind.

Solving the Problem of AI Hallucinations with Reasoning AI

The responses given by AI Large Language Models can be very prone to hallucination: output that is not factually accurate. This is because Generative AI responses based on Large Language Models are designed to be plausible, but the models may lack the ability to determine what is correct. For example, on an episode of 60 Minutes which explored the capabilities of Google's BARD, BARD wrote an essay on inflation and recommended five books on the subject. Later fact-checking found that those books do not exist. In customer experience and customer support, you do not want your AI tools making up answers to questions posed by customers! So, essentially, generative AI needs a lot of specific prompting to produce answers that can be trusted.

How Can You Guarantee Trustworthy, Factual, and Useful Answers from Generative AI?

You really can't. But is there a set of criteria that can be universally applied to ensure that output is factual?

The creators of Cyc, a reasoning AI, have devoted 2,000 person-years (40 years times roughly 50 human participants) to solving this quandary. The Cyc project (its name an abbreviation of "encyclopedia") has developed 16 criteria for the curation of explicit knowledge and rules of thumb. There is a fundamental tradeoff: the more expressive the representation language, the more slowly reasoning runs. Cyc has developed ways to manage that tradeoff, including its own formal language, CycL, which supports first-order logic, statements about statements, and statements about what the inference engine itself is trying to do.

The 16 criteria used by the Cyc project, which involve knowledge, reasoning, and world models, are as follows:

  1. Explanation
  2. Deduction
  3. Induction
  4. Analogy
  5. Abductive Reasoning
  6. Theory of Mind
  7. Quantifier-fluency
  8. Modal-fluency
  9. Defeasibility
  10. Pro and Con Arguments
  11. Contexts
  12. Meta-knowledge Meta-Reasoning
  13. Explicitly Ethical
  14. Sufficient Speed
  15. Sufficiently Lingual and Embodied
  16. Broadly and Deeply Knowledgeable

Cyc has engineering solutions for each of these 16 elements, and many of Cyc's capabilities, such as planning, choosing, and learning, involve combinations of several of the above criteria.

The reasoning mechanism is similar to theorem proving in the predicate calculus. What gets inferred is what is "default" true, or usually true, in the absence of other evidence. For example, determining what "its" refers to in the sentence "The horse was led into the barn while its head was still wet" requires knowledge that horses have heads and barns do not. Further, there are assumptions that are default true (animals have one head). "Default true" means that there can be exceptions, but the statement is treated as true unless stated otherwise. Tens of millions of assertions have been added to Cyc's knowledge base, and billions of new conclusions can be generated from those.
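To make "default true" concrete, here is a toy sketch in Python of how a system might treat assertions as true unless an exception is recorded. This is not Cyc's actual machinery; the facts, relation names, and exception handling are invented purely for illustration.

```python
# A toy sketch of "default true" reasoning (not Cyc's actual machinery).
# Facts and relation names are hypothetical, for illustration only.

defaults = {
    ("Horse", "has_part", "Head"): True,   # horses have heads, by default
    ("Barn", "has_part", "Head"): False,   # barns do not have heads
}
exceptions = set()  # explicit overrides for unusual cases

def holds(subject, relation, obj):
    """True if the assertion holds by default and no exception overrides it."""
    if (subject, relation, obj) in exceptions:
        return not defaults.get((subject, relation, obj), False)
    return defaults.get((subject, relation, obj), False)

# Resolving "its" in "The horse was led into the barn while its head was still wet":
candidates = ["Horse", "Barn"]
referent = [c for c in candidates if holds(c, "has_part", "Head")]
print(referent)  # ['Horse'] -- only the horse can have a wet head
```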

Generalized knowledge, based on world models, is important for assessing truth. For example, to answer the question "How do you know that Bullwinkle the Moose and Mickey Mouse are not the same individual?", Cyc needs to know that a moose and a mouse are different kinds of animal. Rather than being told the relation of every living thing to every other living thing, it was given the rules of taxonomy, plus the rule "for any two taxons, assume they are disjoint if one is not a specialization of the other."
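As a rough illustration of that disjointness rule, the sketch below applies it over a tiny, made-up taxonomy. The taxa and their structure are hypothetical and far simpler than anything in Cyc.

```python
# A minimal sketch of the disjointness rule quoted above, over an invented taxonomy:
# "for any two taxons, assume they are disjoint if one is not a specialization of the other"

taxonomy = {                 # child -> parent (hypothetical, highly simplified)
    "Moose": "Mammal",
    "Mouse": "Mammal",
    "Mammal": "Animal",
    "CartoonMoose": "Moose",
}

def ancestors(taxon):
    """All taxa that `taxon` specializes, including itself."""
    seen = {taxon}
    while taxon in taxonomy:
        taxon = taxonomy[taxon]
        seen.add(taxon)
    return seen

def assumed_disjoint(a, b):
    """Assume two taxa are disjoint unless one is a specialization of the other."""
    return a not in ancestors(b) and b not in ancestors(a)

print(assumed_disjoint("Moose", "Mouse"))         # True: Bullwinkle and Mickey differ
print(assumed_disjoint("CartoonMoose", "Moose"))  # False: one specializes the other
```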

Each nugget of knowledge is thus generalized before being added to the Cyc knowledge base, to allow for future, less specific, re-use. Cyc seeks to separate the epistemological problem -- what does the system know? -- from the heuristic problem -- how can it reason efficiently?

Cyc also allows multiple redundant representations for each assertion, and in practice it uses multiple redundant, specialized reasoners -- Heuristic Level (HL) modules -- each of which is much faster than general theorem-proving when it applies. There are already thousands of such modules, and new ones are discovered and added by asking human experts who can do something in their field quickly how they accomplish it.
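The pattern described here resembles trying fast, specialized reasoners first and falling back to slow, general theorem proving only when none applies. Below is a minimal sketch of that dispatch idea; the module names, queries, and logic are invented stand-ins, not Cyc's real HL modules.

```python
# A rough sketch of dispatching to specialized reasoners before falling back to
# general (slow) theorem proving. Modules and queries are hypothetical.

def temporal_module(query):
    """Fast special-case reasoner: simple comparison for 'before' questions."""
    if query.get("relation") == "before":
        return query["start_a"] < query["start_b"]
    return None  # not applicable to this query

def taxonomy_module(query):
    """Fast special-case reasoner: membership lookup in a precomputed hierarchy."""
    if query.get("relation") == "isa":
        return query["type"] in query.get("known_types", set())
    return None

def general_theorem_prover(query):
    """Slow but fully general fallback (stubbed out here)."""
    return False

HL_MODULES = [temporal_module, taxonomy_module]

def answer(query):
    for module in HL_MODULES:              # try each fast specialist first
        result = module(query)
        if result is not None:
            return result
    return general_theorem_prover(query)   # fall back to general reasoning

print(answer({"relation": "before", "start_a": 3, "start_b": 7}))  # True
```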

Almost the whole knowledge base of Cyc consists of things that are true by default. Cyc can usually find multiple proofs and disproofs of an answer. These become pro and con arguments in reasoning that is based on argumentation rather than proof.
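As a loose illustration of argumentation-based answering, the sketch below gathers pro and con arguments and favors the better-supported side rather than demanding a single definitive proof. The arguments and weights are invented; Cyc's actual argumentation machinery is far richer.

```python
# A toy sketch of argumentation-based answering for "Can Tweety fly?":
# weigh pro and con arguments instead of requiring one conclusive proof.
# Arguments and weights are invented for illustration.

pro = [("Default: birds typically fly", 0.6)]
con = [("Tweety is a penguin, and penguins are flightless", 0.9)]

def resolve(pro_args, con_args):
    """Answer with whichever side has the stronger combined support."""
    pro_weight = sum(weight for _, weight in pro_args)
    con_weight = sum(weight for _, weight in con_args)
    return "yes" if pro_weight > con_weight else "no"

print(resolve(pro, con))  # 'no' -- the specific exception outweighs the default
```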

Generative AI based on Large Language Models is one pole of the AI architecture space: knowledge and reasoning are not explicit, but the conversational abilities and the speed may be superior to those of a semantic reasoning AI such as Cyc.

Cyc is the opposite pole: it articulates common sense and world models, represents them in ways that computers can reason over mechanically, and develops reasoning algorithms that allow that reasoning to run fairly quickly.

How can Cyc and LLMs work together to provide the factual accuracy of a reasoning AI with the conversational abilities of an LLM? The Cyc team has a series of suggestions (a minimal sketch of the first pattern follows the list):

  1. Use Symbolic systems such as Cyc as a Source of Trust, to reject false confabulations
  2. Use Symbolic systems such as Cyc as a Source of Truth, to bias LLMs towards correctness
  3. Use LLMs as generators of candidate assertions/rules to add to symbolic systems such as Cyc
  4. Use Symbolic systems such as Cyc as a Source of Inference to dramatically extend LLM coverage
  5. Use Symbolic systems such as Cyc as a Source of Explanation to provide audits and provenance support
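Here is a minimal sketch of the first pattern: using a symbolic knowledge base as a Source of Trust to filter an LLM's draft claims before they reach a customer. The functions llm_draft_answer and kb_verifies are hypothetical stand-ins, not real APIs from Cyc or any LLM vendor.

```python
# A minimal sketch of pattern 1: reject LLM confabulations by checking each
# drafted claim against a symbolic knowledge base before answering a customer.
# All names, claims, and checks below are hypothetical.

def llm_draft_answer(question: str) -> list[str]:
    """Pretend LLM: returns candidate factual claims extracted from a drafted answer."""
    return ["Plan X includes 24/7 phone support", "Plan X costs $5/month"]

def kb_verifies(claim: str) -> bool:
    """Pretend symbolic check: only claims the knowledge base can confirm pass."""
    verified_facts = {"Plan X includes 24/7 phone support"}
    return claim in verified_facts

def trusted_answer(question: str) -> list[str]:
    claims = llm_draft_answer(question)
    # Keep only claims the reasoning system can verify; the rest would be flagged for review.
    return [c for c in claims if kb_verifies(c)]

print(trusted_answer("What does Plan X include?"))
# ['Plan X includes 24/7 phone support'] -- the unverified price claim is dropped
```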

How would a hybrid reasoning and conversational engine be beneficial in customer experience? Good customer experience is largely dependent upon context. What's the situation in which the customer finds themselves? What are they trying to accomplish? What is their role? What have they already tried? It is hard for chatbots to infer context. They aren't human beings. They don't know how the world works. But a symbolic representation of the world, like Cyc, does. By combining these resources, AI bots would be less likely to make contextual gaffes. Equally (or perhaps more) important, you would be assured of truthful, ethical, and factual answers that are verifiable.

Would any of these hybrid approaches shorten the time for AI to be brought up to speed on business logic, for use by customers? Or would they still be more useful as a source of knowledge for human agents?