TY - GEN
T1 - The Consensus Paradox
T2 - 38th Australasian Joint Conference on Artificial Intelligence, AI 2025
AU - Mesto, Maher
AU - Cruz, Francisco
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
AB - In multi-teacher reinforcement learning, conventional wisdom suggests that combining expert knowledge through ensemble methods should improve performance. We reveal a striking paradox: in environments with changing goals, ensemble methods that achieve the highest agreement among teachers deliver the worst performance (32.3% success rate) – even worse than random teacher selection (34.5%). Through controlled experiments in a drifting grid world where four expert teachers guide a learning agent, we demonstrate that confidence-weighted voting creates false consensus by amplifying outdated expertise. Our analysis of 30 random seeds (F = 8957.6, p < 0.0001) shows that when environments change, teacher disagreement is not noise to be reduced but a valuable signal of adaptation. We introduce the Teacher Confusion Index (TCI) and Goal Coherence Score (GCS) to quantify this phenomenon, revealing a positive correlation (r = 0.277) between disagreement and performance. These findings challenge fundamental assumptions about ensemble learning in non-stationary environments, with implications for any multi-expert system facing concept drift.
KW - Concept drift
KW - Consensus paradox
KW - Ensemble methods
KW - Multi-teacher learning
KW - Non-stationary environments
UR - https://www.scopus.com/pages/publications/105023829089
U2 - 10.1007/978-981-95-4972-6_33
DO - 10.1007/978-981-95-4972-6_33
M3 - Conference contribution
AN - SCOPUS:105023829089
SN - 9789819549719
T3 - Lecture Notes in Computer Science
SP - 426
EP - 438
BT - AI 2025
A2 - Liu, Miaomiao
A2 - Yu, Xin
A2 - Xu, Chang
A2 - Song, Yiliao
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 1 December 2025 through 5 December 2025
ER -