AI Bibliography

Schneider, J., Handali, J., Vlachos, M., & Meske, C. (2020). Deceptive ai explanations: Creation and detection. arXiv preprint arXiv:2001.07641. 
Resource type: Journal Article
BibTeX citation key: Schneider2020
Categories: Artificial Intelligence, Cognitive Science, Computer Science, Decision Theory, Ethics, General, Military Science
Subcategories: Cyber, Deep learning, Human decisionmaking, Machine learning, Psychology of human-AI interaction
Creators: Handali, Meske, Schneider, Vlachos
Collection: arXiv preprint arXiv:2001.07641
Abstract
Artificial intelligence comes with great opportunities but also great risks. We investigate to what extent deep learning can be used to create and detect deceptive explanations that either aim to lure a human into believing a decision that is not truthful to the model or provide reasoning that is non-faithful to the decision. Our theoretical insights show some limits of deception and detection in the absence of domain knowledge. For empirical evaluation, we focus on text classification. To create deceptive explanations, we alter explanations originating from GradCAM, a state-of-the-art technique for creating explanations in neural networks. We evaluate the effectiveness of deceptive explanations with 200 participants. Our findings indicate that deceptive explanations can indeed fool humans. Our classifier can detect even seemingly minor attempts at deception with accuracy exceeding 80%, given sufficient domain knowledge encoded in the form of training data.
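The alteration of explanations described in the abstract can be illustrated with a minimal toy sketch. This is not the paper's code: it assumes a linear bag-of-words classifier (so the Grad-CAM-style importance of a token reduces to gradient times activation, ReLU'd), and the "deceptive" variant simply rotates the scores so the highlighted tokens no longer match the ones driving the decision.

```python
def token_importance(weights, features):
    # Grad-CAM-style relevance for a linear score w.x:
    # the gradient w.r.t. each feature is its weight, so
    # importance = ReLU(gradient * activation) = max(w * x, 0).
    return [max(w * x, 0.0) for w, x in zip(weights, features)]

def deceptive_importance(importance):
    # One crude illustrative manipulation: rotate the scores by one
    # position, so the explanation highlights the wrong tokens while
    # keeping the same overall score distribution.
    return importance[1:] + importance[:1]

weights = [2.0, -1.0, 0.5, 0.0]   # hypothetical per-token class weights
features = [1.0, 1.0, 2.0, 1.0]   # hypothetical token counts in a document

honest = token_importance(weights, features)   # faithful explanation
faked = deceptive_importance(honest)           # non-faithful explanation
```

Here `honest` is `[2.0, 0.0, 1.0, 0.0]`, while `faked` assigns those scores to different tokens; the paper's detection task amounts to telling such manipulated score patterns apart from faithful ones.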
  
WIKINDX 6.7.0 | Total resources: 1621 | Username: -- | Bibliography: WIKINDX Master Bibliography | Style: American Psychological Association (APA)