Solving the problem of how to measure the truth

Jack Harich

#1
Our applied research project, Politician Truth Ratings, has reached what appears to be the crux of the entire project. How can a claim-check measure the truth of a claim accurately and precisely?

Let's define our terms. According to Wikipedia:

In the fields of science and engineering, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity's true value. The precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results. Although the two words precision and accuracy can be synonymous in colloquial use, they are deliberately contrasted in the context of the scientific method.

The field of statistics, where the interpretation of measurements plays a central role, prefers to use the terms bias and variability instead of accuracy and precision: bias is the amount of inaccuracy and variability is the amount of imprecision.

A measurement system is considered valid if it is both accurate and precise.
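
A small illustration of these definitions (my example, not from the Wikipedia article): bias can be estimated as the offset of repeated measurements from the true value, and variability as their spread.

```python
# Illustration of bias (inaccuracy) vs. variability (imprecision) for a
# hypothetical instrument measuring a quantity whose true value is known.
from statistics import mean, stdev

def bias_and_variability(measurements, true_value):
    """Return (bias, variability) for repeated measurements of one quantity."""
    bias = mean(measurements) - true_value   # systematic offset -> inaccuracy
    variability = stdev(measurements)        # spread of the repeats -> imprecision
    return bias, variability

# Readings that are tightly clustered but offset from the true value of 100:
readings = [103.1, 102.8, 103.0, 102.9, 103.2]
print(bias_and_variability(readings, 100.0))
# High bias, low variability: the instrument is precise (reliable) but not
# accurate (valid).
```
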
What makes this problem hard is that it's a classic problem of philosophy that has never been solved. What is "truth"? Isn't it like beauty, which is in the eye of the beholder? Thus isn't truth ultimately a subjective judgment that inherently cannot be measured?

No. The hard sciences have figured out how to accurately and precisely measure countless things that could not be measured before, like weight, distance, and color. The social sciences have done the same with measuring how sleepy a person is, how depressed a person is, and how productive an economic system is. We're not trying to define what truth is philosophically. We're trying to capture how people determine the level of truth of a proposition, so they can make rational decisions based on that knowledge.

What we have here is just one more case of inventing a new form of measurement. Montserrat has found this research area is labeled "instrumentation." Or as Scott Collison said, what we're trying to do is "operationalize the truth."

Let's see if we can pinpoint our knowledge gap. Then we can focus on filling the gap.

We are attempting to measure the truth of a claim in a claim-check. What exactly is the truth here? It's the calculated truth confidence level of the argument's claim.

That truth depends on all the numbers and their relationships used in the calculation. Therefore, if we can improve Structured Argument Analysis so that it can help users accurately and precisely set each of those numbers and relationships, then we have a tool for accurately and precisely measuring the truth of a claim. The tool must support these categories of decisions:
  1. Setting the confidence level of facts.
  2. Setting the confidence level of rules.
  3. Selecting the correct fact or reusable claim.
  4. Selecting the correct rule.
  5. Setting the weights used in rule inputs.
  6. Defining the argument tree relationships.
Our challenge is to figure out how to develop the tool protocols and features that make these decisions easy, fast, accurate, and precise. This can be done by solving one small piece of the challenge at a time, plus continuously improving the tool as we go.
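
To make that dependency concrete, here is a minimal sketch of how a claim's truth confidence level (CL) could be computed from those six kinds of decisions. The class names and the simple weighted combination are illustrative assumptions, not the actual Structured Argument Analysis calculation.

```python
# A minimal sketch (not the tool's actual algorithm) of a claim CL that depends
# on fact CLs, rule CLs, rule-input weights, and the argument tree structure.
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Fact:
    name: str
    cl: float        # decision 1: confidence level of the fact (0.0 to 1.0)

@dataclass
class RuleNode:
    name: str
    rule_cl: float   # decision 2: confidence level of the rule itself
    inputs: List[Tuple[Union[Fact, "RuleNode"], float]]  # decisions 3-6: (child, weight) pairs

def claim_cl(node: Union[Fact, RuleNode]) -> float:
    """Recursively compute a claim's CL from its argument tree."""
    if isinstance(node, Fact):
        return node.cl
    total_weight = sum(w for _, w in node.inputs)
    combined = sum(claim_cl(child) * w for child, w in node.inputs) / total_weight
    return combined * node.rule_cl   # the rule's own CL discounts the combined inputs

# Tiny example: one rule combining two facts with weights 0.7 and 0.3.
tree = RuleNode("claim", rule_cl=0.95,
                inputs=[(Fact("fact A", 0.90), 0.7),
                        (Fact("fact B", 0.60), 0.3)])
print(round(claim_cl(tree), 2))      # 0.77 with these illustrative numbers
```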

How accurate and precise? Close enough that the societies using the tool can come reasonably close to the goal of democracy, which is optimizing the long-term common good of all.

Over time, what we're trying to do is illustrated in this diagram:

[Attached diagram: the four-target accuracy versus precision illustration discussed in post #3.]

Jack Harich

#2
To get started, what I'm going to try to do is focus on one problem with one claim-check that's already done. This is Attack ad falsely says Bredesen backed gas, sales tax hikes that never happened - Claim-check version.

While this claim-check is done, it's done poorly. I'm not convinced the logic and the rule are correct. I'm terribly confused about which "evidence is inconsistent" rule to use. And how can the rule documentation provide the protocol needed to determine when to use the 0%, 5%, or 20% version of the rule? Should we use 5%, like statistics does, or 10%, like the IPCC does? (See page 3.)

The good news is we've isolated a small problem to solve.
 
#3
Jack and I have been comparing the use of the terms accuracy, precision, validity, reliability, bias, and variability.

I thought it would be useful to include in this thread the use of those terms in the academic jargon of the social sciences. Here are some definitions from Kellstedt & Whitten (2013).

Reliability
An operational measure of a concept is said to be reliable to the extent that it is repeatable or consistent; that is, applying the same measurement to the same case or observation will produce identical results. An unreliable measure, by contrast, would produce inconsistent results for the same observation.
Following this definition, from the diagram Jack shared, both of the targets on the right side (the green one and the yellow one) are reliable.

Bias
Measurement bias [...] is the systematic over-reporting or under-reporting of values of a variable.
Following this definition, on the diagram, only the bottom right target (the yellow one) is biased; it systematically reports values that are off the true target.

Validity
The most important feature of a measure is that it is valid. A valid measure accurately represents the concept that it is supposed to measure, whereas an invalid measure measures something other than what was originally intended.
Following this definition, on the diagram, only the upper right target (the green one) is valid; it systematically reports true values.

Putting everything together
Comparing the definitions, we can conclude that the terms accuracy and validity are equivalent, and that the terms precision and reliability are also equivalent.
Bias reports the extent to which a measurement is inaccurate or invalid.

Variability reports the extent to which a measurement is imprecise or unreliable.

There is an ongoing debate among academics regarding the connection between validity and reliability. This is Kellstedt & Whitten's position on the matter:
Is it possible to have a valid but unreliable measure? And is it possible to have a reliable but invalid measure? With respect to the second question, [...] in our view, that is possible in abstract terms. But because we are interested in measuring concepts in the interest of evaluating causal theories, we believe that, in all practical terms, any conceivable measures that are reliable but invalid will not be useful in evaluating causal theories. Similarly, it is theoretically possible to have a valid but unreliable measure. But those measures will also be problematic for evaluating causal theories, because we will have no confidence in the hypothesis tests that we conduct. [...] only reliable and valid measures are useful for evaluating causal theories.
 

Jack Harich

#5
Here is the document on Solving the Problem of How to Measure the Truth. It begins where this thread begins and then goes much further. It contains a detailed walk-through of the problem and an outline of the action plan for solving it. (Updated on September 9 to fix typos and switch to a docx file.)
 


Jack Harich

#6
Here's a pretty good paper on Bayes and the Law, 2016, by Fenton, Neil, and Berger. It takes an evidence-based approach to arguments using Bayes Rule, so it may contain some good ideas for our application.

Section 2 contains an error. It says "for simplicity we assume the crime was committed on an island with 10,000 people." But Figure 1 has .999 and .001 rather than .9999 and .0001.
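
A quick arithmetic check of that point (my numbers, not the paper's): with N equally likely islanders and no other evidence, the prior that any one person is the culprit is 1/N.

```python
# Prior probabilities implied by the island population.
for population in (10_000, 1_000):
    prior_guilty = 1 / population
    prior_not_guilty = 1 - prior_guilty
    print(population, prior_guilty, prior_not_guilty)
# 10000 -> 0.0001 and 0.9999 (what the Section 2 wording implies)
# 1000  -> 0.001  and 0.999  (the values actually shown in Figure 1)
```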
 

Jack Harich

#7
Our meeting is drawing near. Here's a document on Adding Bayes Rule to Structured Argument Analysis. It summarizes my work since the last document and ends with a disturbing conclusion: we're stuck for a while. But this is normal for difficult problems. (Updated Sept 10 to fix typos and point out an error.)
 


Jack Harich

#8
I'm focusing on the question: Can Bayes Rule (BR) be used for chaining rule inputs? Looking at the BR equation, it looks like it can. The probability from one calculation becomes the input for the next calculation. Each calculation is a step, or link, in the chain. But I've found no examples of simple chaining yet. The introductory BR examples all use a single calculation.

Searching on "Bayes rule evidence", I found this article on "Bayesian Reasoning - Plausible Reasoning." Page 2 says "We would like to apply this technique recursively, as new data arrives and use this to reason over multiple hypotheses." That is chaining using BR. The document then describes how to apply this technique.

Perhaps we can learn something from this article.

There is also the question: Under what conditions can BR be used for chaining rule inputs?
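
Here is a minimal sketch of the recursive updating the article describes, assuming the pieces of evidence are conditionally independent given the hypothesis (which is part of the "under what conditions" question just above). The posterior from one link becomes the prior for the next; the numbers are hypothetical.

```python
# One link in the chain: return P(H | evidence) from a prior and two likelihoods.
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    numerator = prior * p_evidence_given_h
    marginal = numerator + (1 - prior) * p_evidence_given_not_h
    return numerator / marginal

prior = 0.01                               # initial P(hypothesis), hypothetical
evidence = [(0.95, 0.05), (0.80, 0.20)]    # (P(E|H), P(E|not H)) for each input
for p_e_h, p_e_not_h in evidence:
    prior = bayes_update(prior, p_e_h, p_e_not_h)   # posterior feeds the next link
    print(round(prior, 3))
# 0.161 after the first piece of evidence, 0.434 after the second
```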
 

Jack Harich

#11
I've updated the Adding Bayes Rule document to include the Knowledge Gaps section. This was used today in a meeting with Montserrat, Scott Collison, and Jack. The key content is four specific questions to address. That's where we are focusing now. The questions are:
  1. What rules of Bayesian logic does Structured Argument Analysis use now?
  2. How can we support decision chaining, where the CL is recalculated every time the user adds a rule input? It's beginning to look like that's easy for independent inputs (see the sketch after this list).
  3. What aspects of probability can be set for facts and then used by a form of Bayes Rule?
  4. How can we set fact CLs using Bayesian logic, in a manner that as more facts and reusable claims are entered, the fact CLs evolve to be more and more truthful? The tool should be self-learning.
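
To illustrate why chaining looks easy for independent inputs (question 2), here is a minimal sketch using the odds form of Bayes Rule: each independent input multiplies the running odds by its likelihood ratio, so recalculating the CL when a rule input is added is a single multiplication. The likelihood ratios are hypothetical, and this is not a design commitment for the tool.

```python
# Incremental CL recalculation with independent inputs, via the odds form.
def to_odds(p):
    return p / (1 - p)

def to_probability(odds):
    return odds / (1 + odds)

odds = to_odds(0.50)                  # start from a neutral prior CL of 0.50
likelihood_ratios = [3.0, 2.0, 0.5]   # one ratio per independent rule input
for lr in likelihood_ratios:
    odds *= lr                        # adding an input is one multiplication
    print(round(to_probability(odds), 3))
# 0.75 after the first input, 0.857 after the second, 0.75 after the third
```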
 


Jack Harich

#12
Rolling right along in our eternal quest to solve the problem of How to measure the truth, I've updated that document. A large section on Synthesis was added, based on a work session Jack and Montserrat had yesterday and discussion in a meeting with Scott Booher, Jack, and Montserrat last night. See pages 5 to 10. There's lots of room here for discussion and improvement.

Next comes the hard part. How exactly are we going to implement Bayes Rule?
 


Jack Harich

#13
More progress. The updated document now contains a section on Implementing Bayes Rule, written yesterday. Today, as I had begun to suspect yesterday, I found I'd made an error in calculating a variable in Bayes Rule. I also concluded that we need a systematic approach to setting probabilities, so I added Element 4: A system for setting probabilities. This is very close to what Montserrat is focusing on: how to set Fact CLs. So I thought this was worth sharing and discussing. My next steps are to use the new probability table to set probabilities and to correct the error.

(4:10PM Sept 19 - Updated the document. It now uses the new table and the error is corrected.)
 


Jack Harich

#14
Checking the work I did yesterday, I found a large error of some kind. Using the two forms of Bayes rule to calculate a conclusion probability, I get two completely different answers. See the "Checking our work with the standard form of Bayes Rule" section in the revised document, page 19. Time to meet on this and figure out the cause of the discrepancy.
 


#15
You've probably already solved it, but just in case:
When I run the two competing formulas on page 19, I get .001 instead of .01 for the first result, and .02 for the second (rounding up from .019) instead of .09.
 


Jack Harich

#16
Wow, thanks Scott for finding an error. I've since worked on the document quite a bit.

The equation
P(conclusion|input) = .001 x .95 / .99 = .01
has been changed to:
P(conclusion|input) = .01 x .95 / .99 = .01
And the next equation has been changed to:
P(conclusion|input) = .01 x .95 / (.01 x .95 + .99 x .05) = .16
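
A quick numeric check of the two corrected equations (my arithmetic, not from the document):

```python
p_c = 0.01           # P(conclusion), the prior
p_i_given_c = 0.95   # P(input | conclusion)
p_i_given_not_c = 0.05
p_i_asserted = 0.99  # P(input) as asserted directly in the simple form

simple = p_c * p_i_given_c / p_i_asserted
expanded = p_c * p_i_given_c / (p_c * p_i_given_c + (1 - p_c) * p_i_given_not_c)
print(round(simple, 2), round(expanded, 2))   # 0.01 0.16

# The two results still differ because the expanded form derives P(input) from
# the law of total probability (.01 x .95 + .99 x .05 = .059) rather than using
# the asserted .99.
```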

And I switched from using a calculator to a spreadsheet, which is in the document. However, to make things easier I've also attached the spreadsheet as a separate document.

But the biggest change is that this problem has been solved, we hope, as described in the attached document. This is tricky logic.
 


Jack Harich

#17
Montserrat and Jack reviewed the entire document on September 26, with emphasis on the New Insights and New Design sections. It was great to step through the reasoning involved in detail, so we can stay in full mental synchronization on this difficult project. Here is the latest version of the document:

Added related spreadsheet on Sept 27. It can be used to test various ways to calculate the new approach, such as in rows 22 to 24.
 


Jack Harich

#18
After much more work it looks like the problem of how to implement Bayes Rule is solved, though it needs review. The document has been updated as I went along, journal style, and has grown to 33 pages. An interesting discovery is that we need weights for deductive rules but importance for inductive rules. A new spreadsheet was created to work out the math for inductive rules and test it. The latest versions of the three files involved are attached.

This leaves one more large problem to solve: How to set the confidence level of facts.
 


Jack Harich

#19
Related to the problem of how to set the truth confidence level of facts, here's an article on The Fix for Fake News Isn't Code. It's Human. Basically the author proposes a reliability indicator on all Facebook posts that refer to news sources. Here is the key text:
Last year, in a paper published in the Kennedy Institute of Ethics Journal, I proposed a somewhat different system. The key difference between my system and the one that Facebook has implemented is transparency: Facebook should track and display how often each user decides to share disputed information after being warned that the information might be false or misleading.

Instead of using this data to calculate a secret score, Facebook should display a simple reliability marker on every post and comment. Imagine a little colored dot next to the user’s name, similar to the blue verification badges Facebook and Twitter give to trusted accounts: a green dot could indicate that the user hasn’t chosen to share much disputed news, a yellow dot could indicate that they do it sometimes, and a red dot could indicate that they do it often. These reliability markers would allow anyone to see at a glance how reliable their friends are.
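
For concreteness, here is a minimal sketch of the proposed reliability marker. The article gives no exact cutoffs, so the thresholds below are hypothetical placeholders.

```python
# Map a user's rate of sharing disputed news (after being warned) to a dot color.
def reliability_marker(disputed_shares_after_warning: int, total_shares: int) -> str:
    if total_shares == 0:
        return "green"
    rate = disputed_shares_after_warning / total_shares
    if rate < 0.02:    # hasn't shared much disputed news
        return "green"
    if rate < 0.10:    # does it sometimes
        return "yellow"
    return "red"       # does it often

print(reliability_marker(1, 200))    # green
print(reliability_marker(8, 200))    # yellow
print(reliability_marker(40, 200))   # red
```
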
But the deeper information in the article is that Facebook has an internal rating for fact reliability, which they get from fact-checkers. How does that system work?

Facebook asks independent fact-checking organizations from across the political spectrum to identify false and misleading information.
 

Jack Harich

#20
Yesterday I discovered that the problem of how to calculate inductive rules is not solved. It's normal to encounter surprises like this in difficult, cutting-edge projects like ours.

To document and thwink through the problem, I've added two sections near the end of the document. The first is "Surprise! The problem of how to calculate inductive rules is not solved." The second is "Design the test first," but that one is barely started, since I and others first want to review the present implementation of Bayes Rule and look for ways to use Bayes Rule instead of creating our own probability calculation.

Today we had three people looking this new problem over: Montserrat, Jack, and Josh Akeman. We will work on the problem independently and then meet on Thursday to share our results and try to move forward on solving the problem. Attached is the latest version of the document. The key spreadsheet it refers to is ImportanceDecisionChaining.
 
