Interpretable visual domain adaptation from feature representation to multi-modal semantics
Transfer learning has revolutionized deep learning by allowing pre-trained models to be reused when training data is limited or computational resources are expensive. However, the lack of interpretability and transparency in transfer learning methods poses significant obstacles to their practical deployment and trustworthiness. This doctoral dissertation is dedicated to enhancing the transparency and interpretability of visual domain adaptation, a central task in transfer learning, through feature representation analysis and the integration of multimodal semantic knowledge. By addressing cross-domain shift and providing human-friendly explanations simultaneously, this research aims to offer deeper insight into the transfer learning process and to yield more interpretable and trustworthy outcomes for real-world applications.

We begin by analyzing the distribution of learned feature representations in visual domain adaptation tasks where only visual images are available, to gain insight into how knowledge transfers across domains. Visualizing the learned features in the domain-invariant feature space reveals how the boundaries between task-specific categories align in unsupervised domain adaptation. These insights support two lines of work: addressing partial domain adaptation by measuring similarities between features and filtering out outlier categories, and mitigating fairness issues in imbalanced domain adaptation with limited training data through various feature generation strategies. Moreover, we utilize high-level semantic knowledge, such as textual descriptions, alongside images to enrich the explanations of domain adaptation.
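To make the outlier-category filtering concrete, the following is a minimal sketch of one common similarity-based weighting strategy for partial domain adaptation, not the dissertation's exact method. The function name `category_weights` and the toy data are illustrative assumptions: source categories that receive little probability mass from a source classifier on target samples are treated as source-only outliers and down-weighted.

```python
import numpy as np

def category_weights(target_probs: np.ndarray) -> np.ndarray:
    """Estimate per-category transfer weights for partial domain adaptation.

    target_probs: (n_samples, n_classes) softmax outputs of a source
    classifier evaluated on target-domain samples. Categories that rarely
    receive probability mass on the target domain are likely outliers
    (present in the source label set but absent from the target).
    """
    # Average predicted probability per source category over target samples.
    w = target_probs.mean(axis=0)
    # Normalize so the largest weight is 1; near-zero weights mark outliers.
    return w / w.max()

# Toy example: 4 target samples, 3 source categories; category 2 is never
# predicted with confidence, so it is down-weighted as an outlier class.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
])
weights = category_weights(probs)
```

In practice such weights are used to re-weight the source classification and alignment losses, so that outlier source categories contribute little to domain alignment.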
In this regard, we introduce the Semantic-Recovery Open-Set Domain Adaptation (SR-OSDA) problem and propose a solution that recovers semantic descriptions for unseen categories in the target domain while accurately identifying seen categories. By combining textual and visual data, we efficiently discover novel target classes and provide human-friendly explanations through semantic attribute prediction. Furthermore, to elucidate how convolutional networks extract visual features and thereby enrich the high-level semantic explanation, we propose an interpretable driving decision-making model: it employs learnable concept-based visual prototypes to identify the regions and objects in ego-view images that are crucial for driving actions, and aligns the learned semantic prototypes with human annotations to enable interpretable decision-making. Finally, this dissertation presents an Interpretable Novel Target Discovery model that addresses the SR-OSDA problem by combining interpretation strategies with multimodal semantic knowledge; the model achieves interpretation through human-friendly, multimodal, concept-based visual prototypes and analysis of feature representations. This research provides valuable insights for integrating AI systems across domains, promoting transparency, interpretability, and trustworthiness in decision-making. Overall, it contributes to the development of interpretable transfer learning techniques, enhances the understanding and practical application of deep learning models, and fosters transparent and collaborative human-AI interactions.
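The concept-based prototype mechanism described above can be sketched in a few lines. The code below is an illustrative toy, assuming random features in place of a trained network; the function name `prototype_activations` and the dimensions are hypothetical. Each learnable prototype represents a semantic concept, each image region is scored against every prototype by cosine similarity, and max-pooling over regions yields one activation per concept, so the most similar region explains why that concept fired.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d-dimensional region features, k concept prototypes.
d, k = 8, 4
prototypes = rng.normal(size=(k, d))   # learnable concept prototypes
regions = rng.normal(size=(6, d))      # features of 6 image regions (ego-view)

def prototype_activations(regions: np.ndarray,
                          prototypes: np.ndarray) -> np.ndarray:
    """Cosine similarity between each region and each concept prototype,
    max-pooled over regions to give one activation per concept."""
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = r @ p.T              # (n_regions, k) region-to-concept similarities
    return sim.max(axis=0)     # (k,) strongest supporting region per concept

acts = prototype_activations(regions, prototypes)
# High activations indicate which concepts (and thus which regions) support
# the model's decision; the arg-max region localizes the explanation.
```

Aligning the prototypes with human concept annotations, as the dissertation proposes, would add a supervised loss tying each prototype to a named attribute so the activations read as human-friendly explanations.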