# Weekly Research Progress (Week of November 15)
## 1. Revised last week's design by adding a staged training scheme
```python
if stage_num == 1:
    # Stage 1: train only the q/k encoders; the MLP heads are not used
    pass
elif stage_num == 2:
    # Stage 2: freeze the q/k encoders and train only the MLP layers
    for param in self.encoder_q.parameters():
        param.requires_grad = False
    for param in self.encoder_k.parameters():
        param.requires_grad = False
    if self.a is not None:
        for param in self.a.parameters():
            param.requires_grad = True
    if self.b is not None:
        for param in self.b.parameters():
            param.requires_grad = True
elif stage_num == 3:
    # Stage 3: full training with all parameters unfrozen
    for param in self.encoder_q.parameters():
        param.requires_grad = True
    for param in self.encoder_k.parameters():
        param.requires_grad = True
    if self.a is not None:
        for param in self.a.parameters():
            param.requires_grad = True
    if self.b is not None:
        for param in self.b.parameters():
            param.requires_grad = True
elif stage_num == 4:
    # Stage 4: freeze the q/k encoders again and train only the MLP layers
    for param in self.encoder_q.parameters():
        param.requires_grad = False
    for param in self.encoder_k.parameters():
        param.requires_grad = False
    if self.a is not None:
        for param in self.a.parameters():
            param.requires_grad = True
    if self.b is not None:
        for param in self.b.parameters():
            param.requires_grad = True
elif stage_num == 5:
    # Stage 5: unfreeze everything for a final round of full training
    for param in self.encoder_q.parameters():
        param.requires_grad = True
    for param in self.encoder_k.parameters():
        param.requires_grad = True
    if self.a is not None:
        for param in self.a.parameters():
            param.requires_grad = True
    if self.b is not None:
        for param in self.b.parameters():
            param.requires_grad = True
```
#### Experiment:
The per-stage epoch counts were set as follows:
```python
parser.add_argument('--stage1_epoch_num', default=20, type=int)
parser.add_argument('--stage2_epoch_num', default=5, type=int)
parser.add_argument('--stage3_epoch_num', default=50, type=int)
parser.add_argument('--stage4_epoch_num', default=5, type=int)
parser.add_argument('--stage5_epoch_num', default=70, type=int)
```
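Under these defaults, the active stage for any given epoch follows from accumulating the per-stage epoch counts. A minimal sketch (the helper `stage_for_epoch` is hypothetical, not part of the code base):

```python
def stage_for_epoch(epoch, stage_epochs):
    """Map a 0-based global epoch index to its 1-based training stage."""
    boundary = 0
    for stage, n in enumerate(stage_epochs, start=1):
        boundary += n
        if epoch < boundary:
            return stage
    return len(stage_epochs)  # clamp epochs past the schedule to the last stage

# Default schedule from the argparse flags above: 150 epochs in total
stage_epochs = [20, 5, 50, 5, 70]
```

With this schedule, the 53 epochs trained so far end inside stage 3, the full-training stage.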
#### Results:
Training has not yet finished; after 53 epochs the results are as follows:
![log5](./pic/log5.png)
Because the batch size was increased, training appears slower when measured per epoch. The loss also decreases slowly during the ProtoNCE stage, possibly due to a low learning rate and the gradient-update scheme. Acc_inst updates normally while Acc_Proto improves slowly, so the following issues are suspected:
1. The current final head is two fully connected layers followed by one ReLU; consider changing this structure.
2. The MLP layers may not have been updated enough during stage 2, leaving the final head lagging behind; consider a higher learning rate and more epochs for that stage.
3. The minimum cluster count may be set too low, or the cluster-doubling factor (1.7×) may be wrong.
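For suspicion 2, one option is a per-stage learning-rate override. A sketch with purely illustrative multipliers (`STAGE_LR_SCALE` and `lr_for_stage` are assumptions, not existing code):

```python
# Hypothetical multipliers on the base learning rate, per training stage;
# raising the stage-2/4 rates targets the under-trained MLP layers
STAGE_LR_SCALE = {1: 1.0, 2: 5.0, 3: 1.0, 4: 5.0, 5: 1.0}

def lr_for_stage(base_lr, stage_num):
    """Return the learning rate to use for the given training stage."""
    return base_lr * STAGE_LR_SCALE.get(stage_num, 1.0)
```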
## 2. Improving the clustering
### 1. Improving the maximum and minimum cluster sizes
#### Rationale:
In clustering, we do not want the per-cluster sample counts to deviate too much, so that the clusters stay balanced and stable; but the allowed range must not be so narrow that the clustering fails to converge.
#### Modification:
Take (number of samples / number of clusters) as the mean and model the per-cluster sample count as a normal distribution, with [0, mean] spanning 3 standard deviations. The ±1-standard-deviation band (68.3% of the probability mass) is then used as the allowed range of cluster sizes. This keeps the output stable while leaving some slack, so that a clustering result can still be obtained within the maximum number of iterations.
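The modification above amounts to the following bound computation; a minimal sketch (the function name is hypothetical):

```python
def cluster_size_bounds(n_samples, n_clusters):
    """Allowed per-cluster sample-count range under the report's assumption.

    The mean cluster size is n_samples / n_clusters. Treating [0, mean]
    as a 3-standard-deviation span gives std = mean / 3, and the
    +/-1-std band (about 68.3% of the probability mass) becomes the
    allowed range of cluster sizes.
    """
    mean = n_samples / n_clusters
    std = mean / 3.0
    return mean - std, mean + std
```

For example, 3000 samples split into 10 clusters gives a mean size of 300 and an allowed range of [200, 400].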
# Tasks for Next Week
#### 1. Optimize the projection-head structure and training pipeline to improve training results
##### Current ideas
1. Move the ReLU between the two fully connected layers so that the output structure matches the original one.
2. Add per-stage learning-rate schedules.
3. Experiment with a larger cluster count and/or multi-level clustering.
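Idea 1 above can be sketched as follows; the dimensions (2048 → 128) and variable names are illustrative, not taken from the code base:

```python
import torch.nn as nn

# Current head as described in the report: two linear layers then ReLU,
# so the final output passes through a nonlinearity
head_current = nn.Sequential(
    nn.Linear(2048, 2048),
    nn.Linear(2048, 128),
    nn.ReLU(),
)

# Proposed head: ReLU between the two linear layers, so the output is a
# plain linear projection, closer to the original structure
head_proposed = nn.Sequential(
    nn.Linear(2048, 2048),
    nn.ReLU(),
    nn.Linear(2048, 128),
)
```

Both variants keep the same input/output dimensions, so the change only affects where the nonlinearity sits.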