3 October 2023

Superconfidence

When the history of AI is told, it's a fair bet that there will be some post-hoc head-scratching over who originated the bright idea of rebranding lying as mere hallucination. As if AI were on some sort of acid trip, with its human users along for the ride.

Perhaps it's a reluctance to attribute agency to AI; hallucination is something passive, something that just happens, a natural result of ingesting (illicit?) substances. Lying suggests intent, an active choice to deceive for profit.

In any event, all the hand-wringing about AI hallucination-lying has tended to frame the core problem as one of being given wrong, mistaken or confabulated answers.

Less attention has been paid to the perhaps more debilitating problem for human users: wrong answers are given with confidence.

1/ Transmission effects

It’s intuitive to believe that better knowledge leads to better decision making. What experimental research shows, however, is a much more convoluted picture of the link between knowledge and decision making, mediated by the confidence with which information is transmitted.

For example, various studies show that the more confidence a person exudes, the more competent they are perceived to be and the more persuasive they are likely to be.[1] But this effect also seems to operate independently of knowledge. In other words, saying something with confidence matters as much for shaping behavior as the actual truth of what's been said.[2] Not exactly breaking news for anybody following GOP politics, but still.

We also know that providing new or more information increases confidence, regardless of whether that information is actually useful in increasing knowledge.[3] In a context where acting on unfounded knowledge can have serious ramifications - citing non-existent case law in court, for example - the supreme confidence baked into AI answers and transmitted to users is a major worry.

In a sense these transmission effects aren’t particularly surprising. AI’s human users are psychologically primed to respond positively to overconfidence, because we seem hard-wired to hate uncertainty.[4] Follow this thought through and it means we’re geared to gravitate towards knowledge sources, media commentators, politicians and - perhaps - AI agents that confidently assert “the truth”, justified or not.[5]

2/ Compound effects

These transmission effects pluck at a second string of the overconfidence problem. Overconfidence is already an issue for human decision making, and AI is making it worse. As Anders Sandberg has put it, “when people are overconfident they make more stupid decisions, ignore countervailing evidence and set up policies that increase risk”.[6] This, for Sandberg, helps explain why it’s so difficult to mobilize decision makers to act on long-term and existential risks; they have an over-inflated and under-evidenced confidence in their own ideas about risk.

For organizations and societies that are already dealing with the fallout from human overconfidence, it seems inevitable that introducing superconfident AI machines into the mix will compound the challenge of compensating for cognitive bias. Finance in particular has attempted to attenuate irrational confidence, for instance by building cultures where traders and analysts are more aware of what behavioral economics and decision theory teach about framing biases. Unless the organizational rollout of overconfident AI comes with a health warning - “may exacerbate poor judgment” - we’ll see more risk percolate through these knowledge-intensive systems.[7]

3/ Misalignment indicator

A third string of the superconfidence problem has been teased out by AI alignment researchers. The claim is that uncertainty is a virtue when it comes to ensuring AI agents make decisions aligned with humanity. Confidence - an excess of certainty - is an altogether damning attribute for an AI to possess.

The short version of this argument is that we humans have spent the last 2,500 years or so philosophically failing to solve the alignment problem. Questions abound, for instance, about the nature of freedom, the relative hierarchy of values like justice and security, the circumstances where natural moral rights override civil law obligations, and so on.

If you ask an AI for the proper definition of peace, it would be disappointing to get a singular answer talking about the absence of violence. Ideally you’d want at least a nod to competing traditions that stress the need to address indirect and structural forms of violence. If you’re looking for a quick answer to a question this might be annoying, but it feels like rather a large slippery slope to start eliding reality because users don’t have the patience for complexity. TL;DR.

The point, as Peter Eckersley put it (with a nod to Isaiah Berlin) is that “agents that are not completely sure of the right thing to do . . . are much more likely to tolerate the agency of others, than agents that are completely sure that they know the best way for events to unfold. This appears to be true not only of AI systems, but of human ideologies and politics, where totalitarianism has often been built on a substructure of purported moral certainty.”[8]

Eckersley goes on to suggest two rules or conjectures to guide AI value alignment: (1) “powerful agents with mathematically certain, monotonically increasing, open-ended objective functions will adopt sub-goals to disable or dis-empower other agents in all or almost all cases.” And the inverse: (2) “powerful agents with mathematically uncertain objectives will not adopt sub-goals to disable or dis-empower other agents unless those agents constitute a probable threat to a wide range of objectives.”

In other words, overconfidence is a canary in the AI value alignment coal mine. AI developers need to drop their pickaxes and start encoding uncertainty into AI.
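To make that contrast concrete, here is a minimal toy sketch in Python. The actions, objectives and weights are invented for illustration and are not Eckersley's formalism; they are just one way to picture the difference between an agent that is certain of a single objective and one that spreads its confidence across several candidate objectives.

```python
# Illustrative sketch only: a toy contrast between an agent that is certain of its
# objective and one that keeps uncertainty over several candidate objectives.
# Actions, objectives and weights below are invented for illustration.

# How each candidate action scores under three hypothetical objectives.
# "disempower_others" scores highly on one narrow objective but badly on the rest.
ACTION_SCORES = {
    "cooperate":         {"efficiency": 0.60, "autonomy_of_others": 0.9, "safety": 0.8},
    "disempower_others": {"efficiency": 0.95, "autonomy_of_others": 0.0, "safety": 0.2},
    "do_nothing":        {"efficiency": 0.10, "autonomy_of_others": 1.0, "safety": 1.0},
}

def certain_agent(action_scores, objective):
    """Agent that is mathematically certain of a single objective: maximize it."""
    return max(action_scores, key=lambda a: action_scores[a][objective])

def uncertain_agent(action_scores, objective_weights):
    """Agent that keeps a probability distribution over objectives and
    maximizes the expected score across all of them."""
    def expected(a):
        return sum(w * action_scores[a][obj] for obj, w in objective_weights.items())
    return max(action_scores, key=expected)

if __name__ == "__main__":
    # The certain agent, told only to maximize "efficiency", picks the
    # disempowering action; the uncertain agent hedges toward cooperation.
    print(certain_agent(ACTION_SCORES, "efficiency"))   # -> disempower_others
    print(uncertain_agent(ACTION_SCORES,
                          {"efficiency": 0.4, "autonomy_of_others": 0.3, "safety": 0.3}))  # -> cooperate
```

Even in this trivial setting, the point-estimate agent gravitates to the action that disempowers other agents, while the agent that spreads its confidence across objectives settles on cooperation.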

Implications

The dangers of hallucinatory AI are driving discussions about how to integrate AI into organizational workflows. There are emerging tools, like NVIDIA’s NeMo Guardrails, that attack the hallucination problem by helping human users better understand the source knowledge behind an answer. There will be other tools, or iterations of AI, that go further, engineering AI with ever more accurate knowledge.
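As a purely hypothetical sketch (this is not the NeMo Guardrails API, and every name below is invented), a workflow could refuse to pass along a bare, confident-sounding answer and instead attach the sources it was grounded in and an explicit confidence label:

```python
# Hypothetical illustration: surfacing epistemic status to the human user
# instead of a bare assertion. Not any vendor's API; names are invented.
from dataclasses import dataclass, field

@dataclass
class GroundedAnswer:
    text: str                                     # the model's answer
    sources: list = field(default_factory=list)   # passages the answer was grounded in
    confidence: str = "unknown"                   # e.g. "high", "low", "unsupported"

def present(answer: GroundedAnswer) -> str:
    """Render an answer with its sources and confidence attached."""
    if not answer.sources:
        header = "NOT GROUNDED IN ANY SOURCE - treat as unverified"
    else:
        header = f"Confidence: {answer.confidence} | Sources: {', '.join(answer.sources)}"
    return f"{header}\n{answer.text}"

if __name__ == "__main__":
    print(present(GroundedAnswer(
        text="The cited case does not appear in the retrieved corpus.",
        sources=[],          # no supporting passages were found
        confidence="low",
    )))
```

The point of a wrapper like this is not accuracy per se; it is that the human user sees the answer's epistemic status rather than absorbing its confident tone.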

What we’ve tried to pull out here is that, irrespective of progress on hallucinated or confabulated knowledge, there are distinct questions to ask about the effects of AI “superconfidence”. Or, if you like, about the manner in which AI interacts with human users. Organizations - particularly in knowledge-intensive sectors like law - need to make space for discussion about how to track and manage the framing effects of AI on decision making.


Footnotes

[1] Fox and Walters (1986); Cutler et al. (1990); Price and Stone (2004); Li et al. (2020)

[2] Stone et al. (2023)

[3] Stone and Opel (2000)

[4] de Berker et al. (2016) found that when volunteers were given electric shocks, their stress levels were highest when they had no idea whether they were going to be given a shock or not — higher even than subjects who were told they definitely would get one.

[5] See especially Tetlock (2005)

[6] https://theconversation.com/from-human-extinction-to-super-intelligence-two-futurists-explain-26617

[7] On how unjustified confidence limits the quality of the resulting decisions, see Griffin and Tversky (1992); Yates et al. (1996); Razmdoost et al. (2015); Moore et al. (2017); Amini et al. (2020)

[8] Eckersley (2019)

References

Griffin, D., and Tversky, A. (1992) The weighing of evidence and the determinants of confidence. Cogn. Psychol. 24, 411–435. doi: 10.1016/0010-0285(92)90013-R

Yates, J. F. (1990) Judgment and Decision Making. Englewood Cliffs, NJ: Prentice Hall.

Eckersley, P. (2019) “Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function)”, https://arxiv.org/abs/1901.00064

Razmdoost, K., Dimitriu, R., and Macdonald, E. K. (2015) The effect of overconfidence and underconfidence on consumer value. Psychol. Mark. 32, 392–407. doi: 10.1002/mar.20787

Moore, D. A., Swift, S. A., Minster, A., Mellers, B., Ungar, L., Tetlock, P., et al. (2017) Confidence calibration in a multiyear geopolitical forecasting competition. Manag. Sci. 63, 3552–3565. doi: 10.1287/mnsc.2016.2525

Amini, B., Bassett, R. L. Jr., Miner Haygood, T., McEnery, K. W., and Richardson, M. L. (2020) Confidence calibration: an introduction with application to quality improvement. J. Am. Coll. Radiol. 17, 620–628. doi: 10.1016/j.jacr.2019.12.009

Fox, S. G., and Walters, H. A. (1986) The impact of general versus specific expert testimony and eyewitness confidence upon mock juror judgment. Law Hum. Behav. 10, 215–228. doi: 10.1007/BF01046211

Cutler, B. L., Penrod, S. D., and Dexter, H. R. (1990) Juror sensitivity to eyewitness identification evidence. Law Hum. Behav. 14, 185–191. doi: 10.1007/BF01062972

Price, P. C., and Stone, E. R. (2004) Intuitive evaluation of likelihood judgment producers: evidence for a confidence heuristic. J. Behav. Decis. Mak. 17, 39–57. doi: 10.1002/bdm.460

Li, N. P., Yong, J. C., Tsai, M.-H., Lai, M. H. C., Lim, A. J. Y., and Ackerman, J. M. (2020) Confidence is sexy and it can be trained: examining male social confidence in initial, opposite-sex interactions. J. Pers. 88, 1235–1251. doi: 10.1111/jopy.12568

Stone, E. R., and Opel, R. B. (2000) Training to improve calibration and discrimination: the effects of performance and environmental feedback. Organ. Behav. Hum. Decis. Process. 83, 282–309. doi: 10.1006/obhd.2000.2910

Stone, E. R., Parker, A. M., Hanks, A. R., and Swiston, R. C. (2023) Thinking without knowing: psychological and behavioral consequences of unjustified confidence regarding blackjack strategy. Front. Psychol. 14:1015676. doi: 10.3389/fpsyg.2023.1015676

de Berker, A., Rutledge, R., Mathys, C. et al. (2016) Computations of uncertainty mediate acute stress responses in humans. Nat Commun 7, 10996. https://doi.org/10.1038/ncomms10996

Tetlock, P. (2005) Expert Political Judgment, Princeton: Princeton University Press


KEYWORDS

Machine Behavior | Decision Theory | Cognitive Bias | Confidence | AI Value Alignment | Hallucination