Explainable AI (XAI), often overlapping with interpretable AI or explainable machine learning (XML), is a field of research within artificial intelligence (AI) that explores methods giving humans the ability to exercise intellectual oversight over AI algorithms.[1][2] The main focus is on the reasoning behind the decisions or predictions made by AI algorithms,[3] to make them more understandable and transparent.[4] This addresses users' need to assess the safety of, and scrutinize, automated decision-making in applications.[5] XAI counters the "black box" tendency of machine learning, where even the AI's designers cannot explain why it arrived at a specific decision.[6][7]
XAI hopes to help users of AI-powered systems perform more effectively by improving their understanding of how those systems reason.[8] XAI may be an implementation of the social right to explanation.[9] Even if there is no such legal right or regulatory requirement, XAI can improve the user experience of a product or service by helping end users trust that the AI is making good decisions.[10] XAI aims to explain what has been done, what is being done, and what will be done next, and to unveil which information these actions are based on.[11] This makes it possible to confirm existing knowledge, challenge existing knowledge, and generate new assumptions.[12]
Machine learning (ML) algorithms used in AI can be categorized as white-box or black-box.[13] White-box models provide results that are understandable to experts in the domain. Black-box models, on the other hand, are extremely hard to explain and may not be understood even by domain experts.[14] XAI algorithms follow the three principles of transparency, interpretability, and explainability. A model is transparent "if the processes that extract model parameters from training data and generate labels from testing data can be described and motivated by the approach designer."[15] Interpretability describes the possibility of comprehending the ML model and presenting the underlying basis for decision-making in a way that is understandable to humans.[16][17][18] Explainability is a concept that is recognized as important, but a consensus definition is not yet available;[15] one possibility is "the collection of features of the interpretable domain that have contributed, for a given example, to producing a decision (e.g., classification or regression)".[19]
In summary, interpretability refers to the user's ability to understand model outputs, while model transparency includes simulatability (reproducibility of predictions), decomposability (intuitive explanations for parameters), and algorithmic transparency (explaining how algorithms work). Model functionality focuses on textual descriptions, visualization, and local explanations, which clarify specific outputs or instances rather than entire models. All these concepts aim to enhance the comprehensibility and usability of AI systems.[20]
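The transparency notions above can be made concrete with a minimal sketch. The one-feature linear model below is an illustrative assumption, not drawn from any cited work: its single learned weight can be inspected directly (decomposability), and a human can reproduce any prediction by hand (simulatability).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b, computed in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Toy data: house size (square metres) vs. price (in thousands).
sizes = [50, 80, 110, 140]
prices = [150, 240, 330, 420]

w, b = fit_line(sizes, prices)
# The explanation is the model itself: "each extra square metre adds w thousand."
print(f"price = {w:.1f} * size + {b:.1f}")  # -> price = 3.0 * size + 0.0
```

A black-box model fit to the same data might predict equally well, but its internal parameters would not admit this kind of direct reading.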
If algorithms fulfill these principles, they provide a basis for justifying decisions, tracking them and thereby verifying them, improving the algorithms, and exploring new facts.[21]
Sometimes it is also possible to achieve a high-accuracy result with white-box ML algorithms. These algorithms have an interpretable structure that can be used to explain predictions.[22] Concept Bottleneck Models, which use concept-level abstractions to explain model reasoning, are examples of this and can be applied in both image[23] and text[24] prediction tasks. This is especially important in domains like medicine, defense, finance, and law, where it is crucial to understand decisions and build trust in the algorithms.[11] Many researchers argue that, at least for supervised machine learning, the way forward is symbolic regression, where the algorithm searches the space of mathematical expressions to find the model that best fits a given dataset.[25][26][27]
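The symbolic-regression idea can be sketched in miniature: exhaustively score a tiny, hand-picked set of candidate expressions against data and keep the best fit. Real systems search far larger expression spaces (often via genetic programming); the candidate set and data here are illustrative assumptions.

```python
import math

# A deliberately tiny space of human-readable candidate expressions.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x ** 2,
    "2*x + 1": lambda x: 2 * x + 1,
    "sqrt(x)": lambda x: math.sqrt(x),
}

def symbolic_regress(xs, ys):
    """Return the candidate expression minimizing the sum of squared errors."""
    def sse(f):
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
    return min(CANDIDATES, key=lambda name: sse(CANDIDATES[name]))

# Data generated from y = x**2; the search recovers a readable formula.
xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]
print(symbolic_regress(xs, ys))  # -> x**2
```

The appeal for XAI is that the output is a mathematical expression a domain expert can read and check, rather than an opaque set of weights.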
AI systems optimize behavior to satisfy a mathematically specified goal system chosen by the system designers, such as the command "maximize the accuracy of assessing how positive film reviews are in the test dataset." The AI may learn useful general rules from the test set, such as "reviews containing the word 'horrible' are likely to be negative." However, it may also learn inappropriate rules, such as "reviews containing 'Daniel Day-Lewis' are usually positive"; such rules may be undesirable if they are likely to fail to generalize outside the training set, or if people consider the rule to be "cheating" or "unfair." A human can audit rules in an XAI to get an idea of how likely the system is to generalize to future real-world data outside the test set.[28]
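The kind of rule auditing described above can be sketched with a toy review classifier whose "rules" are explicit word-sentiment counts. The classifier, data, and scoring scheme are hypothetical constructions for illustration; because the rules are explicit, a human reviewer can spot spurious ones, such as an actor's name correlating with positive labels, before deployment.

```python
from collections import Counter

def learn_word_rules(reviews):
    """Count how often each word appears in positive vs. negative reviews."""
    pos, neg = Counter(), Counter()
    for text, label in reviews:
        for word in text.lower().split():
            (pos if label == "positive" else neg)[word] += 1
    # Each rule maps a word to (positive count - negative count).
    return {w: pos[w] - neg[w] for w in pos.keys() | neg.keys()}

reviews = [
    ("a horrible boring film", "negative"),
    ("horrible acting throughout", "negative"),
    ("daniel day-lewis was brilliant", "positive"),
    ("daniel day-lewis again, great", "positive"),
]

rules = learn_word_rules(reviews)
# Auditing the rules shows "horrible" is sensibly negative, but the model
# has also latched onto the actor's name -- a rule unlikely to generalize.
print(rules["horrible"])   # -> -2
print(rules["day-lewis"])  # -> 2
```

A modern black-box sentiment model can learn the same spurious association, but without an explicit rule table the problem is much harder to detect.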
^Edwards, Lilian; Veale, Michael (2017). "Slave to the Algorithm? Why a 'Right to an Explanation' Is Probably Not the Remedy You Are Looking For". Duke Law and Technology Review. 16: 18. SSRN 2972855.
^Koh, P. W.; Nguyen, T.; Tang, Y. S.; Mussmann, S.; Pierson, E.; Kim, B.; Liang, P. (November 2020). "Concept bottleneck models". International Conference on Machine Learning. PMLR. pp. 5338–5348.
^Ludan, J. M.; Lyu, Q.; Yang, Y.; Dugan, L.; Yatskar, M.; Callison-Burch, C. (2023). "Interpretable-by-Design Text Classification with Iteratively Generated Concept Bottleneck". arXiv:2310.19660 [cs.CL].
^Christiansen, Michael; Wilstrup, Casper; Hedley, Paula L. (2022). "Explainable "white-box" machine learning is the way forward in preeclampsia screening". American Journal of Obstetrics and Gynecology. 227 (5). Elsevier BV: 791. doi:10.1016/j.ajog.2022.06.057. ISSN 0002-9378. PMID 35779588. S2CID 250160871.
^Wilstrup, Casper; Cave, Chris (2021-01-15). "Combining symbolic regression with the Cox proportional hazards model improves prediction of heart failure deaths". Cold Spring Harbor Laboratory. doi:10.1101/2021.01.15.21249874. S2CID 231609904.