Publications

Below is a list of my publications.

2025

F2HF: Feed-forward network as a noisy head filter in vision transformer explanations

I. Hossen, K. S. Alam · ICCIT 2025

Vision Transformers have established themselves as the state-of-the-art approach for computer vision tasks involving large datasets, yet the mechanism behind their superior performance remains insufficiently understood. Several methods have been proposed to explain how these models reach their decisions. Despite promising results, these methods frequently produce noisy outputs or explanations of limited faithfulness. Our analysis reveals that a key limitation of existing interpretability methods is that they do not account for the complete Vision Transformer architecture. Recent research has shown that the feed-forward network (FFN) acts as a token filter. Building on this finding, we designed an interpretability method that uses the feed-forward network as a filter for each attention map. We call this method F2HF (Feed-Forward Head Filtering Saliency Generator). Our baseline produces a single attention map by accumulating the attention maps of all blocks and multiplying the result with the classifier's gradient heatmap. Adding feed-forward filtering to the encoder attention maps improves IoU by as much as 16.4%, and precision and F1 score improve as well. In a subjective test, three human subjects blindly selected the less noisy heatmap for 500 images, and in 84% of cases they chose the saliency map generated by F2HF. Both the method and the experiments can be reproduced on a single, modest GPU (an NVIDIA Tesla T4).
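The abstract describes the baseline only at a high level. The sketch below is a minimal NumPy illustration under stated assumptions, not the F2HF implementation. It assumes attention-rollout-style accumulation (identity added for the residual path, rows renormalised, block maps multiplied in order), that token 0 is the [CLS] token, and that the classifier gradient heatmap has already been computed on the patch grid. The `head_weights` argument is a hypothetical placeholder marking where a per-head filter would plug in. The abstract does not say how F2HF derives its feed-forward filter, so no such derivation is shown. The `iou` helper with a 0.5 threshold is likewise an assumed evaluation setup.

```python
import numpy as np


def accumulate_attention(attn_maps, head_weights=None):
    """Fuse per-block attention maps into one token-to-token map.

    attn_maps: list with one array per block, shape (heads, tokens, tokens),
        rows already softmax-normalised.
    head_weights: optional list with one array per block, shape (heads,).
        HYPOTHETICAL placeholder for a per-head filter; None falls back to a
        plain mean over heads.
    """
    num_tokens = attn_maps[0].shape[-1]
    identity = np.eye(num_tokens)
    joint = identity.copy()
    for i, attn in enumerate(attn_maps):
        if head_weights is None:
            fused = attn.mean(axis=0)
        else:
            w = head_weights[i] / head_weights[i].sum()
            fused = np.tensordot(w, attn, axes=1)  # weighted sum over heads
        # Assumed rollout-style step: add the residual path, renormalise rows,
        # then chain this block onto the blocks below it.
        fused = fused + identity
        fused = fused / fused.sum(axis=-1, keepdims=True)
        joint = fused @ joint
    return joint


def saliency_map(attn_maps, grad_heatmap, head_weights=None):
    """Multiply accumulated [CLS]-to-patch attention by a gradient heatmap.

    Assumes token 0 is [CLS] and grad_heatmap has the patch-grid shape.
    """
    joint = accumulate_attention(attn_maps, head_weights)
    cls_to_patches = joint[0, 1:]
    side = int(round(np.sqrt(cls_to_patches.size)))
    sal = cls_to_patches.reshape(side, side) * grad_heatmap
    sal = sal - sal.min()  # rescale to [0, 1] so a fixed threshold is meaningful
    peak = sal.max()
    return sal / peak if peak > 0 else sal


def iou(saliency, mask, threshold=0.5):
    """IoU between a thresholded saliency map and a binary ground-truth mask."""
    pred = saliency >= threshold
    gt = mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0


# Toy usage with random, row-normalised attention (4 blocks, 3 heads, 4x4 patches).
rng = np.random.default_rng(0)
blocks, heads, side = 4, 3, 4
tokens = 1 + side * side
attn = [rng.random((heads, tokens, tokens)) for _ in range(blocks)]
attn = [a / a.sum(axis=-1, keepdims=True) for a in attn]
grad = rng.random((side, side))
sal = saliency_map(attn, grad)
print(sal.shape)  # (4, 4)
```

With uniform `head_weights`, the weighted sum reduces to the plain head mean. This keeps the unfiltered baseline and any filtered variant directly comparable: only the per-head weighting changes between them.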

@INPROCEEDINGS{11491078,
  author={Hossen, Imran and Alam, Kazi Saeed},
  booktitle={2025 28th International Conference on Computer and Information Technology (ICCIT)}, 
  title={F2HF: Feed-Forward Network as a Noisy Head Filter in Vision Transformer Explanations}, 
  year={2025},
  pages={2675--2680},
  keywords={interpretability;attention;feed-forward network},
  doi={10.1109/ICCIT68739.2025.11491078}}