PaddlePaddle/PaddleDetection

Problems encountered with MultiScaleTest in static graph mode

Open

#3297 opened on June 6, 2021

help wanted

Description

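I am using the Cascade RCNN architecture. At test time, multi-scale testing performs very poorly, while each scale tested on its own (single scale) gives reasonable results. While debugging, I ran into the following questions:

  1. During multi-scale testing, why does each scale's call to self.bbox_head.get_prediction return 1000 results, whereas in single-scale testing the same call returns the number of boxes left after NMS? What causes the NMS in bbox_head to have no effect in multi-scale testing?
  2. Why does performance drop so sharply? I tried testing each scale of the multi-scale setup separately and then running soft_nms over all the results together, and performance did not change much, so the problem is probably in the details of the multi-scale path.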
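
For reference, the per-scale experiment described in item 2 can be sketched roughly as below. This is a minimal NumPy sketch of Gaussian soft-NMS over concatenated per-scale detections, not PaddleDetection's implementation. The function names and the `[x1, y1, x2, y2, score]` row layout are hypothetical, the boxes are assumed to already be mapped back to original-image coordinates, and the defaults simply mirror the `MultiClassSoftNMS` values from the config.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-10)


def soft_nms(dets, sigma=0.5, score_threshold=0.01, keep_top_k=300):
    """Gaussian soft-NMS over an (N, 5) array of [x1, y1, x2, y2, score]."""
    dets = np.asarray(dets, dtype=np.float64).reshape(-1, 5).copy()
    dets = dets[dets[:, 4] >= score_threshold]
    kept = []
    while len(dets) > 0 and len(kept) < keep_top_k:
        best = np.argmax(dets[:, 4])
        top = dets[best].copy()
        kept.append(top)
        dets = np.delete(dets, best, axis=0)
        if len(dets) == 0:
            break
        # Decay the scores of the remaining boxes by their overlap with the kept box.
        overlap = iou_one_to_many(top[:4], dets[:, :4])
        dets[:, 4] *= np.exp(-(overlap ** 2) / sigma)
        dets = dets[dets[:, 4] >= score_threshold]
    return np.array(kept).reshape(-1, 5)


def merge_scales_soft_nms(per_scale_dets, **kwargs):
    """Concatenate detections from every scale, then run soft-NMS once."""
    return soft_nms(np.concatenate(per_scale_dets, axis=0), **kwargs)
```

The sketch handles a single class; for a multi-class model it would be applied to each class separately.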

The config file is as follows:

architecture: CascadeRCNN
max_iters: 40968
snapshot_iter: 1707
use_gpu: true
log_iter: 20
save_dir: output
# pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/cascade_rcnn_dcn_r50_fpn_1x.tar
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar
weights: output/cascade_rcnn_dcn_r50_fpn_2x_gn/model_final
metric: COCO
num_classes: 4

CascadeRCNN:
  backbone: ResNet
  fpn: FPN
  rpn_head: FPNRPNHead
  roi_extractor: FPNRoIAlign
  bbox_head: CascadeBBoxHead
  bbox_assigner: CascadeBBoxAssigner

ResNet:
  norm_type: bn
  depth: 50
  feature_maps: [2, 3, 4, 5]
  freeze_at: 2
  variant: d
  dcn_v2_stages: [3, 4, 5]

FPN:
  min_level: 2
  max_level: 6
  num_chan: 256
  spatial_scale: [0.03125, 0.0625, 0.125, 0.25]
  norm_type: gn

FPNRPNHead:
  anchor_generator:
    anchor_sizes: [32, 64, 128, 256, 512]
    aspect_ratios: [0.5, 1.0, 2.0]
    stride: [16.0, 16.0]
    variance: [1.0, 1.0, 1.0, 1.0]
  anchor_start_size: 32
  min_level: 2
  max_level: 6
  num_chan: 256
  rpn_target_assign:
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_straddle_thresh: 0.0
  train_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 2000
    post_nms_top_n: 2000
  test_proposal:
    min_size: 0.0
    nms_thresh: 0.7
    pre_nms_top_n: 1000
    post_nms_top_n: 1000

FPNRoIAlign:
  canconical_level: 4
  canonical_size: 224
  min_level: 2
  max_level: 5
  box_resolution: 7
  sampling_ratio: 2

CascadeBBoxAssigner:
  batch_size_per_im: 512
  bbox_reg_weights: [10, 20, 30]
  bg_thresh_lo: [0.0, 0.0, 0.0]
  bg_thresh_hi: [0.5, 0.6, 0.7]
  fg_thresh: [0.5, 0.6, 0.7]
  fg_fraction: 0.25

CascadeBBoxHead:
  head: CascadeXConvNormHead
  nms: MultiClassSoftNMS

CascadeXConvNormHead:
  norm_type: gn

MultiClassSoftNMS:
  score_threshold: 0.01
  keep_top_k: 300
  softnms_sigma: 0.5

MultiScaleTEST:
  score_thresh: 0.05
  nms_thresh: 0.5
  detections_per_im: 100
  enable_voting: true
  vote_thresh: 0.9

LearningRate:
  base_lr: 0.02
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [27312, 37554]
  - !LinearWarmup
    start_factor: 0.001
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

_READER_: 'faster_fpn_reader.yml'
TrainReader:
  batch_size: 2

EvalReader:
  batch_size: 1
  inputs_def:
    fields: ['image', 'im_info', 'im_id', 'im_shape']
    multi_scale: true
    num_scales: 2
    use_flip: true
  sample_transforms:
  - !DecodeImage
    to_rgb: true
  - !NormalizeImage
    is_channel_first: false
    is_scale: true
    mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
  - !MultiscaleTestResize
    origin_target_size: 1280
    origin_max_size: 2000
    use_flip: true
  - !Permute
    channel_first: true
    to_bgr: false
  - !PadMultiScaleTest
    pad_to_stride: 32
  worker_num: 2
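
To make the `enable_voting` and `vote_thresh` settings in the `MultiScaleTEST` block above concrete, here is a rough sketch of box voting as the technique is commonly described: each box kept after NMS has its coordinates replaced by the score-weighted average of all candidate boxes that overlap it by at least `vote_thresh`. Whether PaddleDetection's static-graph code does exactly this is an assumption, not something confirmed here. The function names and the `[x1, y1, x2, y2, score]` layout are hypothetical.

```python
import numpy as np


def pairwise_iou(a, b):
    """IoU matrix between (N, 4) and (M, 4) boxes in x1, y1, x2, y2 format."""
    lt = np.maximum(a[:, None, :2], b[None, :, :2])
    rb = np.minimum(a[:, None, 2:4], b[None, :, 2:4])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-10)


def box_voting(kept_dets, all_dets, vote_thresh=0.9):
    """Refine kept boxes using all candidates that overlap them strongly.

    kept_dets: (K, 5) boxes that survived NMS, rows of [x1, y1, x2, y2, score].
    all_dets:  (N, 5) candidates gathered from all scales and flips before NMS.
    Scores of the kept boxes are left unchanged; only coordinates move.
    """
    voted = np.asarray(kept_dets, dtype=np.float64).copy()
    all_dets = np.asarray(all_dets, dtype=np.float64)
    overlaps = pairwise_iou(voted[:, :4], all_dets[:, :4])
    for i in range(len(voted)):
        voters = all_dets[overlaps[i] >= vote_thresh]
        # Skip boxes with no voters or with non-positive total weight.
        if len(voters) == 0 or voters[:, 4].sum() <= 0:
            continue
        voted[i, :4] = np.average(voters[:, :4], axis=0, weights=voters[:, 4])
    return voted
```

With a threshold as high as 0.9, only near-duplicate boxes vote, so the refinement depends on every scale's boxes being in the same coordinate frame before they are pooled.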
