A Page Nobody Needs

Programming, hobbies, and so on.

Class structure and function arguments of Mask R-CNN in TensorFlow/models

A summary of the code and comments from Pull Request #1561.

The code is under the Apache License, so quoting it should not be a problem... I think.

 

MaskRCNNBoxPredictor (BoxPredictor)
Mask R-CNN Box Predictor. See Mask R-CNN: He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017). Mask R-CNN. arXiv preprint arXiv:1703.06870.
This is used for the second stage of the Mask R-CNN detector where proposals cropped from an image are arranged along the batch dimension of the input image_features tensor. Notice that locations are *not* shared across classes, thus for each anchor, a separate prediction is made for each class. In addition to predicting boxes and classes, optionally this class allows predicting masks and/or keypoints inside detection boxes. Currently this box predictor makes per-class predictions; that is, each anchor makes a separate box prediction for each class.
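As a rough illustration of what "arranged along the batch dimension" means, the sketch below folds per-image proposal crops into a single batch axis with NumPy. The sizes and the variable names (num_images, max_proposals, cropped) are assumptions chosen for the example and do not come from the pull request.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_images = 2       # images in the input batch
max_proposals = 3    # proposals kept per image by the first stage
crop_h, crop_w, channels = 7, 7, 8

# One feature crop per (image, proposal) pair.
cropped = np.zeros(
    (num_images, max_proposals, crop_h, crop_w, channels), dtype=np.float32)

# Fold the proposal axis into the batch axis. The box predictor then sees
# every proposal as an independent "image" in the batch.
image_features = cropped.reshape(-1, crop_h, crop_w, channels)

print(image_features.shape)  # (6, 7, 7, 8)
```

With this layout, the batch_size seen by the predictor is num_images * max_proposals, which is why the anchors dimension of its outputs can be fixed at 1.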

  • __init__
    • self
    • is_training
      Indicates whether the BoxPredictor is in training mode.
    • num_classes
      Number of classes. Note that num_classes *does not* include the background category, so if groundtruth labels take values in {0, 1, ..., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range over {0, ..., K}).
    • fc_hyperparams
      Slim arg_scope with hyperparameters for fully connected ops.
    • use_dropout
      Option to use dropout or not. Note that a single dropout op is applied here prior to both box and class predictions, which stands in contrast to the ConvolutionalBoxPredictor below.
    • dropout_keep_prob
      Keep probability for dropout. This is only used if use_dropout is True.
    • box_code_size
      Size of encoding for each box.
    • conv_hyperparams=None
      Slim arg_scope with hyperparameters for convolution ops.
    • predict_instance_masks=False
      Whether to predict object masks inside detection boxes.
    • mask_prediction_conv_depth=256
      (No comment in the source.)
    • predict_keypoints=False
      Whether to predict keypoints inside detection boxes.
  • num_classes (@property)
    • self
  • _predict
    Computes encoded object locations and corresponding confidences. Flattens image_features and applies fully connected ops (with no non-linearity) to predict box encodings and class predictions. In this setting, anchors are not spatially arranged in any way and are assumed to have been folded into the batch dimension. Thus we output 1 for the anchors dimension.
    • self
    • image_features
      A float tensor of shape [batch_size, height, width, channels] containing features for a batch of images.
    • num_predictions_per_location
      An integer representing the number of box predictions to be made per spatial location in the feature map. Currently, this must be set to 1, or an error will be raised.
    • Return Values
      A dictionary containing the following tensors.
      • box_encodings
        [batch_size, 1, num_classes, code_size] representing the location of the objects.
      • class_predictions_with_background
        [batch_size, 1, num_classes + 1] representing the class predictions for the proposals.
      • instance_masks (When predict_instance_masks is True)
        A float tensor of shape [batch_size, 1, num_classes, image_height, image_width]
      • keypoints (When predict_keypoints is True)
        [batch_size, 1, num_keypoints, 2]
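The shapes listed for _predict can be checked with a small NumPy sketch. This is not the code from the pull request: predict_sketch is a hypothetical stand-in that uses random weights in place of the learned fully connected layers, and it leaves out dropout, masks, and keypoints. It only follows the description above: flatten the features, apply two linear maps with no non-linearity, and reshape with an anchors dimension of 1.

```python
import numpy as np


def predict_sketch(image_features, num_predictions_per_location,
                   num_classes, box_code_size, rng):
    """Shape-only stand-in for _predict (random weights, not learned)."""
    if num_predictions_per_location != 1:
        raise ValueError("num_predictions_per_location must be 1")

    batch_size = image_features.shape[0]
    # Flatten [batch_size, height, width, channels] to [batch_size, depth].
    flat = image_features.reshape(batch_size, -1)
    depth = flat.shape[1]

    # Two linear maps with no non-linearity, one per output head.
    w_box = rng.standard_normal((depth, num_classes * box_code_size))
    w_cls = rng.standard_normal((depth, num_classes + 1))

    # Anchors are folded into the batch, so the anchors dimension is 1.
    box_encodings = (flat @ w_box).reshape(
        batch_size, 1, num_classes, box_code_size)
    class_predictions = (flat @ w_cls).reshape(
        batch_size, 1, num_classes + 1)

    return {
        "box_encodings": box_encodings,
        "class_predictions_with_background": class_predictions,
    }


rng = np.random.default_rng(0)
features = rng.standard_normal((6, 7, 7, 8))  # 6 proposals in the batch
out = predict_sketch(features, 1, num_classes=3, box_code_size=4, rng=rng)
for name, tensor in out.items():
    print(name, tensor.shape)
# box_encodings (6, 1, 3, 4)
# class_predictions_with_background (6, 1, 4)
```

Note that the class head has num_classes + 1 outputs because of the background category, while the box head makes one box_code_size-long encoding per foreground class.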