IvRA: A Framework to Enhance Attention-Based Explanations for Language Models with Interpretability-Driven Training
Link:
https://openreview.net/pdf/c03c96f5ac017bf32c503143f8887d1e49fbdf5e.pdf
Abstract:
Attention has long served as a foundational technique for generating explanations. With recent developments in Explainable AI (XAI), the multi-faceted nature of interpretability has become more apparent. Can attention, as an explanation method, be adapted to meet the diverse needs that our expanded understanding of interpretability demands? In this work, we address this question by introducing IvRA, a framework that directly trains a language model's attention distribution through regularization to produce attribution explanations that align with interpretability criteria such as simulatability, faithfulness, and consistency. Our extensive experimental analysis demonstrates that IvRA outperforms existing methods in guiding language models to generate explanations that are simulatable, faithful, and consistent in tandem with their predictions. Furthermore, we perform ablation studies to verify the robustness of IvRA across various experimental settings and to shed light on the interactions among different interpretability criteria.
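The general idea of training attention through regularization can be illustrated with a minimal sketch (this is not the authors' implementation): a standard task loss is combined with a divergence penalty that pulls the model's attention distribution toward a target attribution signal chosen to reflect a given interpretability criterion. The function name, tensor shapes, and the choice of a KL penalty below are illustrative assumptions.

    # Hypothetical sketch of attention-regularized training, not the IvRA code.
    # The task loss is augmented with a KL term that aligns the model's
    # attention over input tokens with a target attribution distribution
    # (e.g., one derived from an interpretability criterion).
    import torch
    import torch.nn.functional as F

    def attention_regularized_loss(logits, labels, attn_weights, target_attr, lam=0.1):
        """Task loss plus a KL penalty aligning attention with a target attribution.

        logits:       (batch, num_classes) model predictions
        labels:       (batch,) gold labels
        attn_weights: (batch, seq_len) attention over input tokens (rows sum to 1)
        target_attr:  (batch, seq_len) target attribution distribution (rows sum to 1)
        lam:          weight of the regularization term
        """
        task_loss = F.cross_entropy(logits, labels)
        # KL(target || attention): penalize attention mass that deviates from the target.
        reg_loss = F.kl_div(attn_weights.clamp_min(1e-12).log(), target_attr,
                            reduction="batchmean")
        return task_loss + lam * reg_loss

In this sketch, lam trades off predictive accuracy against how closely the attention-based explanation tracks the target attribution; the paper studies how such criteria (simulatability, faithfulness, consistency) interact under this kind of training.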
Citation:
Xie S, Vosoughi S, Hassanpour S. IvRA: A Framework to Enhance Attention-Based Explanations for Language Models with Interpretability-Driven Training. BlackboxNLP; 2024 Sept 21.