Upload files to WZM

main
rzzn 2024-06-24 18:15:10 +08:00
commit 0386b973c6
5 changed files with 791 additions and 0 deletions

WZM/README.md Normal file

@ -0,0 +1,166 @@
## The official PyTorch code for the paper ["Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information", TGRS 2022.](https://doi.org/10.1109/TGRS.2022.3163706)
# GAC
##### Author: Zhiqiang Yuan
<a href="https://github.com/xiaoyuan1996/retrievalSystem"><img src="https://travis-ci.org/Cadene/block.bootstrap.pytorch.svg?branch=master"/></a>
![Supported Python versions](https://img.shields.io/badge/python-3.7-blue.svg)
![Supported OS](https://img.shields.io/badge/Supported%20OS-Linux-yellow.svg)
![npm License](https://img.shields.io/npm/l/mithril.svg)
<a href="https://pypi.org/project/mitype/"><img src="https://img.shields.io/pypi/v/mitype.svg"></a>
### -------------------------------------------------------------------------------------
### Welcome to :+1:_<big>`Fork and Star`</big>_:+1: this repo, and we'll let you know when we update
```bash
#### News:
#### 2021.9.26: ---->Under update ...<----
```
### -------------------------------------------------------------------------------------
## INTRODUCTION
This is GAC, a cross-modal retrieval method for remote sensing images.
We use the MIDF module to fuse multi-level RS image features, and add the DREA mechanism to improve the performance of local features.
In addition, a multivariate rerank algorithm is designed to make full use of the information in the similarity matrix during testing.
Our method achieved state-of-the-art performance (as of 2021.10) on the RS cross-modal retrieval task across multiple RS image-text datasets.
### Network Architecture
![arch image](./figure/GAC.jpg)
The proposed RSCTIR framework based on global and local information. Compared with the retrieval models constructed using only global features, GAC incorporates optimized local features in the visual encoding considering the target redundancy of RS. The multi-level information dynamic fusion module is designed to fuse the two types of information, using the global information to supplement the local information and utilizing the latter to correct the former. The suggested multivariate rerank algorithm as a post-processing method further improves the retrieval accuracy without extra training.
### DREA
To alleviate the pressure on the model from redundant target relations and increase the model's focus on salient instances, we introduce a denoised representation matrix and an enhanced adjacency matrix to assist the GCN in producing better local representations.
DREA filters out redundant features with high similarity and enhances the features of salient targets, which enables GAC to obtain superior visual representations.
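The exact DREA formulation is given in the paper; as a loose illustration only, the sketch below suppresses near-duplicate region features and boosts the graph edges of the remaining salient ones (`drea_sketch`, `sim_thresh`, and `enhance` are hypothetical names and values, not the released implementation):
```python
import torch
import torch.nn.functional as F

def drea_sketch(regions, sim_thresh=0.9, enhance=2.0):
    """regions: (n, d) tensor of local target features from the backbone."""
    feats = F.normalize(regions, dim=-1)
    sim = feats @ feats.t()                       # pairwise cosine similarity
    # Denoised representation: drop a region if an earlier one is nearly identical.
    redundant = (torch.triu(sim, diagonal=1) > sim_thresh).any(dim=0)
    denoised = regions * (~redundant).float().unsqueeze(-1)
    # Enhanced adjacency: keep positive affinities and strengthen salient rows,
    # so the GCN's message passing focuses on distinctive targets.
    adj = sim.clamp(min=0)
    adj[~redundant] = adj[~redundant] * enhance
    return denoised, adj
```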
### MIDF
<img src="https://github.com/xiaoyuan1996/GAC/blob/main/figure/MIDF.jpg" width="600" alt="MIDF"/>
The proposed multi-level information dynamic fusion module. The method comprises two stages: feature retransformation and dynamic fusion. MIDF first uses SA and GA modules to retransform features, then uses global information to supplement local information and leverages the latter to correct the former. Further dynamic fusion of multi-level features is accomplished through the devised dynamic fusion module.
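As a rough illustration of the dynamic-fusion stage (not the paper's implementation), a gated combination of global and local features might look like the following sketch; `DynamicFusion` and its gating layout are assumptions:
```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Hypothetical gated fusion: the gate decides, per dimension, how much
    the global feature supplements the local one."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, f_global, f_local):
        g = self.gate(torch.cat([f_global, f_local], dim=-1))
        return g * f_global + (1 - g) * f_local  # convex combination per dim
```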
### Multivariate Rerank
<img src="https://github.com/xiaoyuan1996/GAC/blob/main/figure/similartiy.jpg" width="600" alt="similarity"/>
The proposed multivariate rerank algorithm. In order to make full use of the similarity matrix, we use k candidates for reverse search and optimize the similarity results by considering multiple ranking factors. The figure shows an illustration of multivariate rerank when k = 3, using image i for retrieval.
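A minimal NumPy sketch of the reverse-search idea follows; the released algorithm in `postprocessing/rerank.py` considers more ranking factors, and `k` and `alpha` here are illustrative only:
```python
import numpy as np

def rerank_sketch(sims, k=3, alpha=0.5):
    """sims: (n_img, n_txt) similarity matrix. For each image, take its
    top-k candidate texts, retrieve back from each text (reverse search),
    and reward candidates that rank the query image highly."""
    out = sims.copy()
    for i in range(sims.shape[0]):
        for t in np.argsort(-sims[i])[:k]:             # top-k candidate texts
            back_rank = int(np.where(np.argsort(-sims[:, t]) == i)[0][0])
            out[i, t] += alpha / (1.0 + back_rank)     # bonus for mutual match
    return out
```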
### Performance
![performance](./figure/performance.jpg)
Comparisons of retrieval performance on the RSICD and RSITMD test sets.
### -------------------------------------------------------------------------------------
## IMPLEMENTATION
```bash
Installation
We recommend the following dependencies:
Python 3
PyTorch > 0.3
Numpy
h5py
nltk
yaml
```
```bash
file structure:
-- checkpoint # save path for checkpoints and logs
-- data # sorted annotations of the four datasets
-- rsicd_precomp
-- train_caps.txt # train anns
-- train_filename.txt # corresponding imgs
-- test_caps.txt # test anns
-- test_filename.txt # corresponding imgs
-- images # rsicd images here
-- rsitmd_precomp
...
-- exec # .sh files
-- layers # model definitions
-- logs # tensorboard log files
-- option # different configs for different datasets and models
-- util # some scripts for data processing
-- vocab # vocabs for different datasets
-- seq2vec # some files about seq2vec
-- bi_skip.npz
-- bi_skip.npz.pkl
-- btable.npy
-- dictionary.txt
-- uni_skip.npz
-- uni_skip.npz.pkl
-- utable.npy
-- postprocessing # multivariate rerank
-- rerank.py
-- file
-- data.py # load data
-- engine.py # details about train and val
-- test.py # test k-fold answers
-- test_single.py # test one model
-- train.py # main file
-- utils.py # some tools
-- vocab.py # generate vocab
Note:
1. In order to facilitate reproduction, we have provided processed annotations.
2. We provide some required files:
(1)[seq2vec (Password:NIST)](https://pan.baidu.com/s/1jz61ZYs8NZflhU_Mm4PbaQ)
(2)[RSICD images (Password:NIST)](https://pan.baidu.com/s/1lH5m047P9m2IvoZMPsoDsQ)
```
```bash
Run: (We take the dataset RSITMD as an example)
Step1:
Put the images of different datasets in ./data/{dataset}_precomp/images/
--data
--rsitmd_precomp
-- train_caps.txt # train anns
-- train_filename.txt # corresponding imgs
-- test_caps.txt # test anns
-- test_filename.txt # corresponding imgs
-- images # images here
--img1.jpg
--img2.jpg
...
Step2:
Modify the corresponding yaml in ./option.
Taking RSITMD_AMFMN.yaml as opt, the fields you need to change are:
opt['dataset']['data_path'] # change to precomp path
opt['dataset']['image_path'] # change to image path
opt['model']['seq2vec']['dir_st'] # some files about seq2vec
Step3:
Run the .sh scripts in ./exec.
Note the GPU settings defined in each .sh file.
cd exec/RSICD
bash run_GAC_rsicd.sh
Note: We use k-fold validation for a fair comparison. For other details, please see the code itself.
```
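To sanity-check Step 2, a short Python snippet (illustrative only; the yaml path follows the example in find_best_pth.py) can confirm the option fields point where you expect:
```python
import yaml

# Illustrative check of the three fields modified in Step 2
with open('option/RSITMD_mca/RSITMD_GAC.yaml') as f:
    opt = yaml.safe_load(f)

print(opt['dataset']['data_path'])        # should end in .../rsitmd_precomp/
print(opt['dataset']['image_path'])       # should point at the images/ dir
print(opt['model']['seq2vec']['dir_st'])  # should point at the seq2vec files
```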
## Citation
If you find this code helpful or use this code or dataset, please cite it as:
```
Z. Yuan et al., "Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information," in IEEE Transactions on Geoscience and Remote Sensing, doi: 10.1109/TGRS.2022.3163706.
```

WZM/data.py Normal file

@ -0,0 +1,145 @@
import torch
import torch.utils.data as data
import torchvision.transforms as transforms
import os
import nltk
import numpy as np
import yaml
import argparse
import utils
from vocab import deserialize_vocab
from PIL import Image
class PrecompDataset(data.Dataset):
"""
Load precomputed captions and image features
"""
def __init__(self, data_split, vocab, opt):
self.vocab = vocab
self.loc = opt['dataset']['data_path']
self.img_path = opt['dataset']['image_path']
# Captions
self.images = []
self.captions = []
self.maxlength = 0
if data_split != 'test':
with open(self.loc+'%s_caps_verify.txt' % data_split, 'rb') as f:
for line in f:
self.captions.append(line.strip())
with open(self.loc + '%s_filename_verify.txt' % data_split, 'rb') as f:
for line in f:
self.images.append(line.strip())
else:
with open(self.loc + '%s_caps.txt' % data_split, 'rb') as f:
for line in f:
self.captions.append(line.strip())
with open(self.loc + '%s_filename.txt' % data_split, 'rb') as f:
for line in f:
self.images.append(line.strip())
self.length = len(self.captions)
        # rkiros-style data repeats each image once per caption (5x), so divide by 5; 10-crop data doesn't
if len(self.images) != self.length:
self.im_div = 5
else:
self.im_div = 1
if data_split == "train":
self.transform = transforms.Compose([
# transforms.Resize((278, 278)),
transforms.Resize((256, 256)),
transforms.RandomRotation(degrees=(0, 90)),
# transforms.RandomCrop(256),
transforms.RandomCrop(224),
transforms.ToTensor(),
transforms.Normalize((0.485, 0.456, 0.406),
(0.229, 0.224, 0.225))])
else:
self.transform = transforms.Compose([
# transforms.Resize((256, 256)),
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize((0.485, 0.456, 0.406),
(0.229, 0.224, 0.225))])
def __getitem__(self, index):
# handle the image redundancy
img_id = index//self.im_div
caption = self.captions[index]
vocab = self.vocab
# Convert caption (string) to word ids.
tokens = nltk.tokenize.word_tokenize(
caption.lower().decode('utf-8'))
punctuations = [',', '.', ':', ';', '?', '(', ')', '[', ']', '&', '!', '*', '@', '#', '$', '%']
tokens = [k for k in tokens if k not in punctuations]
tokens_UNK = [k if k in vocab.word2idx.keys() else '<unk>' for k in tokens]
caption = []
caption.append(vocab('<start>'))
caption.extend([vocab(token) for token in tokens_UNK])
caption.append(vocab('<end>'))
target = torch.LongTensor(caption)
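        # entries of self.images are bytes (files were opened with 'rb');
        # str(...)[2:-1] strips the b'...' wrapper to recover the filename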
image = Image.open(self.img_path + str(self.images[img_id])[2:-1]).convert('RGB')
        image = self.transform(image)  # torch.Size([3, 224, 224])
return image, target, index, img_id
def __len__(self):
return self.length
def collate_fn(data):
# Sort a data list by caption length
data.sort(key=lambda x: len(x[1]), reverse=True)
images, captions, ids, img_ids = zip(*data)
# Merge images (convert tuple of 3D tensor to 4D tensor)
images = torch.stack(images, 0)
    # Merge captions (convert tuple of 1D tensor to 2D tensor)
lengths = [len(cap) for cap in captions]
targets = torch.zeros(len(captions), max(lengths)).long()
for i, cap in enumerate(captions):
end = lengths[i]
targets[i, :end] = cap[:end]
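    # clamp zero caption lengths to 1 so downstream length handling doesn't break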
lengths = [l if l !=0 else 1 for l in lengths]
return images, targets, lengths, ids
def get_precomp_loader(data_split, vocab, batch_size=100,
shuffle=True, num_workers=0, opt={}):
"""Returns torch.utils.data.DataLoader for custom coco dataset."""
dset = PrecompDataset(data_split, vocab, opt)
data_loader = torch.utils.data.DataLoader(dataset=dset,
batch_size=batch_size,
shuffle=shuffle,
pin_memory=True,
collate_fn=collate_fn,
num_workers=num_workers)
return data_loader
def get_loaders(vocab, opt):
train_loader = get_precomp_loader( 'train', vocab,
opt['dataset']['batch_size'], True, opt['dataset']['workers'], opt=opt)
val_loader = get_precomp_loader( 'val', vocab,
opt['dataset']['batch_size_val'], False, opt['dataset']['workers'], opt=opt)
return train_loader, val_loader
def get_test_loader(vocab, opt):
test_loader = get_precomp_loader( 'test', vocab,
opt['dataset']['batch_size_val'], False, opt['dataset']['workers'], opt=opt)
return test_loader
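
if __name__ == '__main__':
    # Usage sketch (hypothetical path; assumes a yaml from ./option and a
    # vocab built by vocab.py) showing how the loaders above fit together.
    with open('option/RSITMD_mca/RSITMD_GAC.yaml') as handle:
        opt = yaml.safe_load(handle)
    vocab = deserialize_vocab(opt['dataset']['vocab_path'])
    train_loader, val_loader = get_loaders(vocab, opt)
    images, targets, lengths, ids = next(iter(train_loader))
    print(images.shape)  # (batch, 3, 224, 224) after RandomCrop(224)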

WZM/engine.py Normal file

@ -0,0 +1,184 @@
import time
import torch
import numpy as np
import sys
from torch.autograd import Variable
import tensorboard_logger as tb_logger
import logging
from torch.nn.utils.clip_grad import clip_grad_norm
from model.utils import cosine_sim, cosine_similarity
import utils
def train(train_loader, model, optimizer, epoch, opt={}):
# extract value
grad_clip = opt['optim']['grad_clip']
max_violation = opt['optim']['max_violation']
margin = opt['optim']['margin']
loss_name = opt['model']['name'] + "_" + opt['dataset']['datatype']
print_freq = opt['logs']['print_freq']
# switch to train mode
model.train()
batch_time = utils.AverageMeter()
data_time = utils.AverageMeter()
train_logger = utils.LogCollector()
end = time.time()
params = list(model.parameters())
for i, train_data in enumerate(train_loader):
images, captions, lengths, ids= train_data
batch_size = images.size(0)
# print("batch_size : ", batch_size)
margin = float(margin)
# measure data loading time
data_time.update(time.time() - end)
model.logger = train_logger
input_visual = Variable(images)
input_text = Variable(captions)
if torch.cuda.is_available():
input_visual = input_visual.cuda()
input_text = input_text.cuda()
# visual_feature, text_feature = model(input_visual, input_local_rep, input_local_adj, input_text, lengths)
# scores = cosine_sim(visual_feature, text_feature)
# print("visual_feature shape : ", visual_feature.shape)
scores = model(input_visual, input_text, lengths)
# print("scores shape : ", scores.shape)
torch.cuda.synchronize()
loss = utils.calcul_loss(scores, input_visual.size(0), margin, max_violation=max_violation, )
if grad_clip > 0:
clip_grad_norm(params, grad_clip)
train_logger.update('L', loss.cpu().data.numpy())
optimizer.zero_grad()
loss.backward()
torch.cuda.synchronize()
optimizer.step()
torch.cuda.synchronize()
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % print_freq == 0:
logging.info(
'Epoch: [{0}][{1}/{2}]\t'
'Time {batch_time.val:.3f}\t'
'{elog}\t'
.format(epoch, i, len(train_loader),
batch_time=batch_time,
elog=str(train_logger)))
utils.log_to_txt(
'Epoch: [{0}][{1}/{2}]\t'
'Time {batch_time.val:.3f}\t'
'{elog}\t'
.format(epoch, i, len(train_loader),
batch_time=batch_time,
elog=str(train_logger)),
opt['logs']['ckpt_save_path']+ opt['model']['name'] + "_" + opt['dataset']['datatype'] +".txt"
)
tb_logger.log_value('epoch', epoch)
tb_logger.log_value('step', i)
tb_logger.log_value('batch_time', batch_time.val)
train_logger.tb_log(tb_logger)
def validate(val_loader, model):
model.eval()
val_logger = utils.LogCollector()
model.logger = val_logger
start = time.time()
# input_visual = np.zeros((len(val_loader.dataset), 3, 256, 256))
input_visual = np.zeros((len(val_loader.dataset), 3, 224, 224))
input_text = np.zeros((len(val_loader.dataset), 47), dtype=np.int64)
input_text_lengeth = [0]*len(val_loader.dataset)
for i, val_data in enumerate(val_loader):
images, captions, lengths, ids = val_data
for (id, img, cap, l) in zip(ids, (images.numpy().copy()), (captions.numpy().copy()), lengths):
input_visual[id] = img
input_text[id, :captions.size(1)] = cap
input_text_lengeth[id] = l
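    # each image appears 5 times (once per caption); keep every 5th visual entry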
input_visual = np.array([input_visual[i] for i in range(0, len(input_visual), 5)])
d = utils.shard_dis_GAC(input_visual, input_text, model, lengths=input_text_lengeth)
end = time.time()
print("calculate similarity time:", end - start)
(r1i, r5i, r10i, medri, meanri), _ = utils.acc_i2t2(d)
logging.info("Image to text: %.1f, %.1f, %.1f, %.1f, %.1f" %
(r1i, r5i, r10i, medri, meanri))
(r1t, r5t, r10t, medrt, meanrt), _ = utils.acc_t2i2(d)
logging.info("Text to image: %.1f, %.1f, %.1f, %.1f, %.1f" %
(r1t, r5t, r10t, medrt, meanrt))
currscore = (r1t + r5t + r10t + r1i + r5i + r10i)/6.0
all_score = "r1i:{} r5i:{} r10i:{} medri:{} meanri:{}\n r1t:{} r5t:{} r10t:{} medrt:{} meanrt:{}\n sum:{}\n ------\n".format(
r1i, r5i, r10i, medri, meanri, r1t, r5t, r10t, medrt, meanrt, currscore
)
tb_logger.log_value('r1i', r1i)
tb_logger.log_value('r5i', r5i)
tb_logger.log_value('r10i', r10i)
tb_logger.log_value('medri', medri)
tb_logger.log_value('meanri', meanri)
tb_logger.log_value('r1t', r1t)
tb_logger.log_value('r5t', r5t)
tb_logger.log_value('r10t', r10t)
tb_logger.log_value('medrt', medrt)
tb_logger.log_value('meanrt', meanrt)
tb_logger.log_value('rsum', currscore)
return currscore, all_score
def validate_test(val_loader, model):
model.eval()
val_logger = utils.LogCollector()
model.logger = val_logger
start = time.time()
# input_visual = np.zeros((len(val_loader.dataset), 3, 256, 256))
input_visual = np.zeros((len(val_loader.dataset), 3, 224, 224))
input_text = np.zeros((len(val_loader.dataset), 47), dtype=np.int64)
input_text_lengeth = [0] * len(val_loader.dataset)
embed_start = time.time()
for i, val_data in enumerate(val_loader):
images, captions, lengths, ids = val_data
for (id, img, cap, l) in zip(ids, (images.numpy().copy()), (captions.numpy().copy()), lengths):
input_visual[id] = img
input_text[id, :captions.size(1)] = cap
input_text_lengeth[id] = l
input_visual = np.array([input_visual[i] for i in range(0, len(input_visual), 5)])
embed_end = time.time()
print("embedding time: {}".format(embed_end-embed_start))
d = utils.shard_dis_GAC(input_visual, input_text, model, lengths=input_text_lengeth)
end = time.time()
print("calculate similarity time:", end - start)
return d

WZM/find_best_pth.py Normal file

@ -0,0 +1,112 @@
import os, random, copy
import numpy as np
import torch
import argparse
import yaml
import logging
import utils
import data
import engine
from vocab import deserialize_vocab
import mytools
from model import GAC as models
def parser_options():
# Hyper Parameters setting
parser = argparse.ArgumentParser()
parser.add_argument('--path_opt', default='option/RSITMD_mca/RSITMD_GAC.yaml', type=str,
help='path to a yaml options file')
opt = parser.parse_args()
# load model options
with open(opt.path_opt, 'r') as handle:
options = yaml.safe_load(handle)
return options
def main(options, vocab):
# Create dataset, model, criterion and optimizer
test_loader = data.get_test_loader(vocab, options)
model = models.factory(options['model'],
vocab,
cuda=True,
data_parallel=False)
print('Model has {} parameters'.format(utils.params_count(model)))
# optionally resume from a checkpoint
if os.path.isfile(options['optim']['resume']):
print("=> loading checkpoint '{}'".format(options['optim']['resume']))
checkpoint = torch.load(options['optim']['resume'])
start_epoch = checkpoint['epoch']
best_rsum = checkpoint['best_rsum']
model.load_state_dict(checkpoint['model'])
else:
print("=> no checkpoint found at '{}'".format(options['optim']['resume']))
# evaluate on test set
sims = engine.validate_test(test_loader, model)
# get indicators
(r1i, r5i, r10i, medri, meanri), _ = utils.acc_i2t2(sims)
logging.info("Image to text: %.1f, %.1f, %.1f, %.1f, %.1f" %
(r1i, r5i, r10i, medri, meanri))
(r1t, r5t, r10t, medrt, meanrt), _ = utils.acc_t2i2(sims)
logging.info("Text to image: %.1f, %.1f, %.1f, %.1f, %.1f" %
(r1t, r5t, r10t, medrt, meanrt))
currscore = (r1t + r5t + r10t + r1i + r5i + r10i)/6.0
all_score = "r1i:{} r5i:{} r10i:{} medri:{} meanri:{}\n r1t:{} r5t:{} r10t:{} medrt:{} meanrt:{}\n sum:{}\n ------\n".format(
r1i, r5i, r10i, medri, meanri, r1t, r5t, r10t, medrt, meanrt, currscore
)
    # record results to the output file
outputfile_path = "RSICD_GAC_decay0.5_m0.2_without_m4m5.txt"
with open(outputfile_path, 'a') as file:
file.writelines(options['optim']['resume'])
file.write(all_score)
print(all_score)
return [r1i, r5i, r10i, r1t, r5t, r10t, currscore]
def get_allpth_score(options, k, vocab):
updated_options = copy.deepcopy(options)
scores = []
directory = options['logs']['ckpt_save_path'] + options['k_fold']['experiment_name'] + "/" + str(k)
for root, dirs, files in os.walk(directory):
for file in files:
if file.endswith('.tar'):
file_path = os.path.join(root, file)
updated_options['optim']['resume'] = file_path
# run experiment
one_score = main(updated_options, vocab)
scores.append(one_score)
return scores
if __name__ == '__main__':
options = parser_options()
# make vocab
vocab = deserialize_vocab(options['dataset']['vocab_path'])
vocab_word = sorted(vocab.word2idx.items(), key=lambda x: x[1], reverse=False)
vocab_word = [tup[0] for tup in vocab_word]
    # compute average results over the k folds
last_score = []
for k in range(options['k_fold']['nums']):
print("=========================================")
print("Start evaluate {}th fold".format(k))
scores = get_allpth_score(options, k, vocab)
last_score.extend(scores)
print("Complete evaluate {}th fold".format(k))
# average
print("===================== Ave Score ({}-fold verify) =================".format(options['k_fold']['nums']))
last_score = np.average(last_score, axis=0)
names = ['r1i', 'r5i', 'r10i', 'r1t', 'r5t', 'r10t', 'mr']
for name,score in zip(names, last_score):
print("{}:{}".format(name, score))

WZM/mytools.py Normal file

@ -0,0 +1,184 @@
# coding:utf-8
"""导入一些包"""
import os
import time, random
import json
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
""" 打印一些东西 """
"""----------------------------------------------------------------------"""
# Print a list, one element per line
def print_list(list):
print("++++++++++++++++++++++++++++++++++++++++++++")
for l in list:
print(l)
print("++++++++++++++++++++++++++++++++++++++++++++")
# Print a dict, one key-value pair per line
def print_dict(dict):
print("++++++++++++++++++++++++++++++++++++++++++++")
for k, v in dict.items():
print("key:", k, " value:", v)
print("++++++++++++++++++++++++++++++++++++++++++++")
# Print something wrapped in separator lines
def print_with_log(info):
print("++++++++++++++++++++++++++++++++++++++++++++")
print(info)
print("++++++++++++++++++++++++++++++++++++++++++++")
# Print a separator line
def print_log():
print("++++++++++++++++++++++++++++++++++++++++++++")
""" 文件存储 """
"""----------------------------------------------------------------------"""
# Save results to a JSON file
def save_to_json(info, filename, encoding='UTF-8'):
with open(filename, "w", encoding=encoding) as f:
json.dump(info, f, indent=2, separators=(',', ':'))
# Load from a JSON file
def load_from_json(filename):
with open(filename, encoding='utf-8') as f:
info = json.load(f)
return info
# Save to an .npy file
def save_to_npy(info, filename):
np.save(filename, info, allow_pickle=True)
# Load from an .npy file
def load_from_npy(filename):
info = np.load(filename, allow_pickle=True)
return info
# Append results to a txt file
def log_to_txt(contexts=None, filename="save.txt", mark=False, encoding='UTF-8', add_n=False):
f = open(filename, "a", encoding=encoding)
if mark:
sig = "------------------------------------------------\n"
f.write(sig)
elif isinstance(contexts, dict):
tmp = ""
for c in contexts.keys():
tmp += str(c) + " | " + str(contexts[c]) + "\n"
contexts = tmp
f.write(contexts)
else:
if isinstance(contexts, list):
tmp = ""
for c in contexts:
if add_n:
tmp += str(c) + "\n"
else:
tmp += str(c)
contexts = tmp
else:
contexts = contexts + "\n"
f.write(contexts)
f.close()
# Read lines from a txt file
def load_from_txt(filename, encoding="utf-8"):
f = open(filename, 'r', encoding=encoding)
contexts = f.readlines()
return contexts
""" 字典变换 """
"""----------------------------------------------------------------------"""
# Swap dict keys and values
def dict_k_v_exchange(dict):
tmp = {}
for key, value in dict.items():
tmp[value] = key
return tmp
# Convert an N x 2 array to a dict
def d2array_to_dict(d2array):
# Input: N x 2 list
# Output: dict
dict = {}
for item in d2array:
if item[0] not in dict.keys():
dict[item[0]] = [item[1]]
else:
dict[item[0]].append(item[1])
return dict
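# e.g. d2array_to_dict([[1, 'a'], [1, 'b'], [2, 'c']]) -> {1: ['a', 'b'], 2: ['c']}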
""" 绘图 """
"""----------------------------------------------------------------------"""
# Visualize points in 3D
def visual_3d_points(list, color=True):
"""
:param list: N x (dim +1)
N 为点的数量
dim 输入数据的维度
1 为类别 即可视化的颜色 当且仅当color为True时
"""
list = np.array(list)
if color:
        data = list[:, :-1]  # every column except the trailing label
label = list[:, -1]
else:
data = list
label = None
    # Reduce to 3 dimensions with PCA
pca = PCA(n_components=3, whiten=True).fit(data)
data = pca.transform(data)
    # Set up the 3D axes
fig = plt.figure()
ax1 = plt.axes(projection='3d')
if label is not None:
color = label
else:
color = "blue"
    ax1.scatter3D(np.transpose(data)[0], np.transpose(data)[1], np.transpose(data)[2], c=color)  # draw the scatter plot
plt.show()
""" 实用工具 """
"""----------------------------------------------------------------------"""
# Count how many times each element appears in a list
def count_list(lens):
dict = {}
for key in lens:
dict[key] = dict.get(key, 0) + 1
dict = sorted(dict.items(), key=lambda x: x[1], reverse=True)
print_list(dict)
return dict
# Element-wise list addition; w1 and w2 are the weights
def list_add(list1, list2, w1=1, w2=1):
return [l1 * w1 + l2 * w2 for (l1, l2) in zip(list1, list2)]
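# e.g. list_add([1, 2], [3, 4], w1=1, w2=2) -> [7, 10]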