
The Consensus Paradox: When Low Disagreement Leads to Catastrophic Failure in Multi-teacher Reinforcement Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In multi-teacher reinforcement learning, conventional wisdom suggests that combining expert knowledge through ensemble methods should improve performance. We reveal a striking paradox: in environments with changing goals, ensemble methods that achieve the highest agreement among teachers deliver the worst performance (32.3% success rate) – even worse than random teacher selection (34.5%). Through controlled experiments in a drifting grid world where four expert teachers guide a learning agent, we demonstrate that confidence-weighted voting creates false consensus by amplifying outdated expertise. Our analysis of 30 random seeds (F = 8957.6, p < 0.0001) shows that when environments change, teacher disagreement is not noise to be reduced but a valuable signal of adaptation. We introduce the Teacher Confusion Index (TCI) and Goal Coherence Score (GCS) to quantify this phenomenon, revealing a positive correlation (r = 0.277) between disagreement and performance. These findings challenge fundamental assumptions about ensemble learning in non-stationary environments, with implications for any multi-expert system facing concept drift.
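The mechanism named in the abstract, confidence-weighted voting amplifying outdated expertise, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the action set, the `(action, confidence)` advice format, the function names, and the example confidences are hypothetical and not taken from the paper. The disagreement measure shown is a generic normalized vote entropy, not the paper's Teacher Confusion Index or Goal Coherence Score, whose definitions the abstract does not give.

```python
import math
from collections import defaultdict

# Hypothetical grid-world action set (assumption, not from the paper).
ACTIONS = ("up", "down", "left", "right")

def confidence_weighted_vote(advice):
    """Pick the action with the largest summed teacher confidence.

    advice: list of (action, confidence) pairs, one per teacher.
    """
    totals = defaultdict(float)
    for action, confidence in advice:
        totals[action] += confidence
    # Ties resolve to the earliest action in ACTIONS, so the result is deterministic.
    return max(ACTIONS, key=lambda a: totals[a])

def vote_entropy(advice):
    """Normalized Shannon entropy of the teachers' action choices.

    0.0 means all teachers agree; 1.0 means the votes are split as evenly
    as possible. A generic disagreement measure, NOT the paper's TCI.
    """
    counts = defaultdict(int)
    for action, _ in advice:
        counts[action] += 1
    n = len(advice)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    max_entropy = math.log(min(n, len(ACTIONS)))
    return entropy / max_entropy if max_entropy > 0 else 0.0

# Illustrative scenario: the goal has moved. Three teachers still point
# toward the old goal with high confidence; one has adapted but is unsure.
advice_after_drift = [
    ("right", 0.9),  # stale
    ("right", 0.9),  # stale
    ("right", 0.9),  # stale
    ("left", 0.4),   # adapted to the new goal
]

chosen = confidence_weighted_vote(advice_after_drift)
disagreement = vote_entropy(advice_after_drift)
print(chosen, round(disagreement, 3))
```

In this toy case the vote follows the three stale teachers and the adapted teacher's advice is outvoted, while the nonzero entropy is the only trace that one teacher has noticed the change. That is the kind of signal the abstract argues should not be averaged away.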

Original language: English
Title of host publication: AI 2025
Subtitle of host publication: Advances in Artificial Intelligence - 38th Australasian Joint Conference on Artificial Intelligence, AI 2025, Proceedings
Editors: Miaomiao Liu, Xin Yu, Chang Xu, Yiliao Song
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 426-438
Number of pages: 13
ISBN (Print): 9789819549719
State: Published - 2026
Event: 38th Australasian Joint Conference on Artificial Intelligence, AI 2025 - Canberra, Australia
Duration: 1 Dec 2025 to 5 Dec 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16371 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 38th Australasian Joint Conference on Artificial Intelligence, AI 2025
Country/Territory: Australia
City: Canberra
Period: 1/12/25 to 5/12/25

Keywords

  • Concept drift
  • Consensus paradox
  • Ensemble methods
  • Multi-teacher learning
  • Non-stationary environments
