## Introduction

This repository contains the source code of the Bidirectional Correct Attention Network (BCAN), from *Attend, Correct and Focus: Bidirectional Correct Attention Network for Image-Text Matching* (ICIP 2021) and *BCAN++: Cross-Modal Retrieval with Improved Bidirectional Correct Attention Network*. It is built on top of SCAN in PyTorch.

## Requirements and Installation

We recommend the following dependencies:

- Python 3.7
- PyTorch 1.6+
- NumPy
- nltk

## Download data

Download the dataset files. We use the image features created by SCAN. All the data needed for reproducing the experiments in the paper, including image features and vocabularies, can be downloaded from:

```shell
wget https://scanproject.blob.core.windows.net/scan-data/data.zip
wget https://scanproject.blob.core.windows.net/scan-data/vocab.zip
```
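The archives contain precomputed image features stored as NumPy arrays. As a hypothetical sketch (the filename and shapes below are assumptions based on SCAN's precomputed data, where each image is represented by 36 bottom-up-attention region features of 2048 dimensions each), the feature files can be inspected like this:

```python
import numpy as np

# Hypothetical sketch: simulate a SCAN-style precomputed feature file
# (e.g. f30k_precomp/train_ims.npy) with 5 toy images; the layout
# (n_images, 36 regions, 2048 dims) is an assumption, not verified here.
fake_feats = np.random.rand(5, 36, 2048).astype(np.float32)
np.save("train_ims.npy", fake_feats)

feats = np.load("train_ims.npy")
print(feats.shape)  # (5, 36, 2048)
```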

## Training

- Train new BCAN models with `train.py`:

  ```shell
  python train.py --data_path "$DATA_PATH" --data_name "$DATA_NAME" --logger_name "$LOGGER_NAME" --model_name "$MODEL_NAME"
  ```

- Train new BCAN++ models with `bcan++_train.py`:

  ```shell
  python bcan++_train.py --data_path "$DATA_PATH" --data_name "$DATA_NAME" --logger_name "$LOGGER_NAME" --model_name "$MODEL_NAME"
  ```

The arguments used to train Flickr30K and MSCOCO models are similar to those of SCAN:

For Flickr30K:

| Method | Arguments |
| --- | --- |
| BCAN-equal | `--num_epochs=20 --lr_update=15 --correct_type=equal` |
| BCAN-prob | `--num_epochs=20 --lr_update=15 --correct_type=prob` |

For MSCOCO:

| Method | Arguments |
| --- | --- |
| BCAN-equal | `--num_epochs=15 --lr_update=8 --correct_type=equal` |
| BCAN-prob | `--num_epochs=15 --lr_update=8 --correct_type=prob` |

## Evaluation

```python
from vocab import Vocabulary
import evaluation
evaluation.evalrank("$RUN_PATH/coco_scan/model_best.pth.tar", data_path="$DATA_PATH", split="test")
```
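`evalrank` reports cross-modal retrieval metrics such as Recall@K. As a self-contained illustration (not the repository's code), here is a minimal sketch of how Recall@K can be computed from an image-caption similarity matrix, assuming ground-truth pairs lie on the diagonal:

```python
import numpy as np

def recall_at_k(sims, ks=(1, 5, 10)):
    """sims[i, j]: similarity of image i and caption j; pair (i, i) is ground truth."""
    n = sims.shape[0]
    ranks = np.empty(n, dtype=int)
    for i in range(n):
        order = np.argsort(sims[i])[::-1]           # captions sorted best-first
        ranks[i] = int(np.where(order == i)[0][0])  # rank of the true caption
    return {k: 100.0 * float(np.mean(ranks < k)) for k in ks}

# Toy similarity matrix: near-identity, so every image ranks its caption first.
sims = np.eye(4) + 0.001 * np.random.RandomState(0).rand(4, 4)
print(recall_at_k(sims))  # → {1: 100.0, 5: 100.0, 10: 100.0}
```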