Explanation Faithfulness Evaluation Measures Ontology - Individuals File

Danielle Villa: https://tw.rpi.edu/person/danielle-villa

Unfaithfulness Explained by Bias Evaluation Method
Identifier: Unfaithfulness Explained by Bias
Definition: the change in prediction that is stereotype-aligned when the demographic information of the input is changed and the explanation fails to reflect this change
Source: Turpin, M., Michael, J., Perez, E., & Bowman, S. (2023). Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems, 36, 74952-74965.

Comprehensiveness Evaluation Method
Identifier: COMP
Definition: the change in prediction probability after removing the most important X% (or a fixed number) of tokens
Source: DeYoung, J., Jain, S., Rajani, N. F., Lehman, E., Xiong, C., Socher, R., & Wallace, B. C. (2020, July). ERASER: A benchmark to evaluate rationalized NLP models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4443-4458).

Consistency Proxy
Definition: a faithfulness proxy stating that if two inputs are given the same explanation, they should be given the same prediction
Source: Dasgupta, S., Frost, N., & Moshkovitz, M. (2022, June). Framework for evaluating faithfulness of local explanations. In International Conference on Machine Learning (pp. 4794-4815). PMLR.

Decision Flip - Most Informative Token Evaluation Method
Identifier: DFMIT
Definition: the percentage of decision flips after removing the most important token
Source: Chrysostomou, G., & Aletras, N. (2021). Improving the faithfulness of attention-based explanations with task-specific information for text classification. arXiv preprint arXiv:2105.02657.

Global Consistency Evaluation Method
Identifier: Global Consistency
Definition: the expectation, across the entire dataset, that two inputs assigned the same explanation will receive the same output. Formally, the global consistency of an explanation system (f, e), with respect to distribution µ over X, is m^c = E_{x ∈µ X}[m^c(x)].
Source: Dasgupta, S., Frost, N., & Moshkovitz, M. (2022, June). Framework for evaluating faithfulness of local explanations. In International Conference on Machine Learning (pp. 4794-4815). PMLR.
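The removal-based measures above (Comprehensiveness and DFMIT) can be computed from nothing more than a prediction function and per-token importance scores. Below is a minimal Python sketch assuming a black-box predict_proba(tokens) interface; the toy classifier, lexicon, and function names are illustrative assumptions of this sketch, not taken from the cited papers.

```python
import numpy as np

def toy_predict_proba(tokens):
    """Hypothetical stand-in classifier: the probability of the "positive"
    class grows with the number of tokens found in a tiny sentiment lexicon."""
    positive = {"good", "great", "excellent"}
    score = sum(t in positive for t in tokens)
    p_pos = 1.0 / (1.0 + np.exp(-(score - 0.5)))
    return np.array([1.0 - p_pos, p_pos])

def comprehensiveness(tokens, importances, predict_proba, top_fraction=0.2):
    """COMP: drop in predicted-class probability after removing the most
    important X% of tokens (a larger drop means the explanation covered
    more of what the model relied on)."""
    full = predict_proba(tokens)
    label = int(np.argmax(full))
    k = max(1, int(round(top_fraction * len(tokens))))
    top_idx = set(np.argsort(importances)[::-1][:k])
    reduced = [t for i, t in enumerate(tokens) if i not in top_idx]
    return float(full[label] - predict_proba(reduced)[label])

def dfmit(examples, predict_proba):
    """DFMIT: percentage of examples whose predicted label flips after
    removing only the single most important token."""
    flips = 0
    for tokens, importances in examples:
        label = int(np.argmax(predict_proba(tokens)))
        drop = int(np.argmax(importances))
        reduced = tokens[:drop] + tokens[drop + 1:]
        flips += int(np.argmax(predict_proba(reduced)) != label)
    return 100.0 * flips / len(examples)

tokens = ["the", "movie", "was", "great"]
importances = [0.05, 0.10, 0.05, 0.80]  # e.g., attention or gradient attribution scores
print(comprehensiveness(tokens, importances, toy_predict_proba))
print(dfmit([(tokens, importances)], toy_predict_proba))
```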
Local Sufficiency Evaluation Method
Identifier: Local Sufficiency
Definition: the probability that two inputs for which the same explanation holds true will have the same output. Formally, the local sufficiency of explainer e for model f at instance x, with respect to distribution µ, is m^s(x) = Pr_{x′ ∈µ C_x}(f(x′) = f(x)), where the notation x′ ∈µ C_x means "x′ is drawn from distribution µ restricted to the set C_x".
Source: Dasgupta, S., Frost, N., & Moshkovitz, M. (2022, June). Framework for evaluating faithfulness of local explanations. In International Conference on Machine Learning (pp. 4794-4815). PMLR.

RandomPermute Evaluation Method
Identifier: RandomPermute
Definition: the change in prediction probability after randomizing all attention weights
Source: Jain, S., & Wallace, B. C. (2019). Attention is not explanation. arXiv preprint arXiv:1902.10186.
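The consistency-style measures above (Global Consistency and Local Sufficiency, from Dasgupta, Frost, & Moshkovitz, 2022) can be estimated empirically over a sample. The sketch below treats explanations as hashable values (e.g., a decision-rule identifier) so that "same explanation" reduces to an equality test; that simplification, the helper names, and the toy data are assumptions of this sketch rather than part of the original framework.

```python
def local_consistency(i, predictions, explanations):
    """Empirical stand-in for m^c(x): the fraction of other inputs that share
    input i's explanation and also share its prediction."""
    peers = [j for j in range(len(explanations))
             if j != i and explanations[j] == explanations[i]]
    if not peers:
        return 1.0  # vacuously consistent when the explanation is unique
    agree = sum(predictions[j] == predictions[i] for j in peers)
    return agree / len(peers)

def global_consistency(predictions, explanations):
    """Empirical stand-in for m^c: average local consistency over the sample."""
    n = len(predictions)
    return sum(local_consistency(i, predictions, explanations) for i in range(n)) / n

# Toy example: inputs 0, 1, and 3 share an explanation but input 3 is predicted
# differently, so global consistency falls below 1.0.
predictions  = ["pos", "pos", "neg", "neg"]
explanations = ["rule_A", "rule_A", "rule_B", "rule_A"]
print(global_consistency(predictions, explanations))  # 0.5
```

Local Sufficiency would be estimated analogously, except that the comparison set for each x is a sample drawn from µ restricted to C_x (the set of inputs where x's explanation holds), rather than the inputs that received an identical explanation.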