Class structure and function arguments of Mask R-CNN in TensorFlow/models

A summary of the code and comments from Pull Request #1561.
It is under the Apache License, so this should be fine... probably.
 
MaskRCNNBoxPredictor (BoxPredictor)
Mask R-CNN Box Predictor. See Mask R-CNN: He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017). Mask R-CNN. arXiv preprint arXiv:1703.06870.
This is used for the second stage of the Mask R-CNN detector where proposals cropped from an image are arranged along the batch dimension of the input image_features tensor. Notice that locations are *not* shared across classes, thus for each anchor, a separate prediction is made for each class. In addition to predicting boxes and classes, optionally this class allows predicting masks and/or keypoints inside detection boxes. Currently this box predictor makes per-class predictions; that is, each anchor makes a separate box prediction for each class.
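As a rough illustration of what "a separate box prediction for each class" means in practice, the sketch below picks the box encoding that belongs to the highest-scoring foreground class of a single proposal. The function name pick_class_specific_box is hypothetical, and placing background at index 0 of the score vector is an assumption not stated in the docstring above. Plain Python lists stand in for tensors.

```python
from typing import List, Tuple


def pick_class_specific_box(
    class_scores_with_background: List[float],
    box_encodings: List[List[float]],
) -> Tuple[int, List[float]]:
    """Return (class_id, box_encoding) for the best non-background class.

    Assumption: index 0 of class_scores_with_background is the background
    score, so score index k + 1 pairs with box_encodings[k].
    """
    num_classes = len(box_encodings)
    if len(class_scores_with_background) != num_classes + 1:
        raise ValueError("expected num_classes + 1 class scores")
    # Drop the assumed background slot, then take the argmax over classes.
    foreground = class_scores_with_background[1:]
    class_id = max(range(num_classes), key=lambda k: foreground[k])
    return class_id, box_encodings[class_id]


# One proposal, two classes, box code size 4.
scores = [0.1, 0.2, 0.7]
boxes = [[0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 2.0, 2.0]]
class_id, box = pick_class_specific_box(scores, boxes)
```

Because every class has its own box, the chosen encoding changes with the chosen class. A class-agnostic predictor would return the same box regardless of class.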
__init__

self
is_training
Indicates whether the BoxPredictor is in training mode.
num_classes
Number of classes. Note that num_classes *does not* include the background category, so if groundtruth labels take values in {0, 1, ..., K-1}, num_classes=K (and not K+1, even though the assigned classification targets can range over {0, ..., K}).
fc_hyperparams
Slim arg_scope with hyperparameters for fully connected ops.
use_dropout
Option to use dropout or not. Note that a single dropout op is applied here prior to both box and class predictions, which stands in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob
Keep probability for dropout. This is only used if use_dropout is True.
box_code_size
Size of encoding for each box.
conv_hyperparams=None
Slim arg_scope with hyperparameters for convolution ops.
predict_instance_masks=False
Whether to predict object masks inside detection boxes.
mask_prediction_conv_depth=256
(No comment in the source.)
predict_keypoints=False
Whether to predict keypoints inside detection boxes.
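
The argument list above can be summarized as a small sketch. BoxPredictorArgs is a hypothetical holder, not the real MaskRCNNBoxPredictor class. It only mirrors the argument names and the defaults listed here. The helpers effective_keep_prob and classification_targets are likewise made up for illustration, and the Any type for the two hyperparameter arguments is a placeholder for the Slim arg_scope objects.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BoxPredictorArgs:
    """Hypothetical holder mirroring the __init__ arguments listed above."""

    is_training: bool
    num_classes: int            # excludes the background category
    fc_hyperparams: Any         # placeholder for a Slim arg_scope
    use_dropout: bool
    dropout_keep_prob: float    # only used when use_dropout is True
    box_code_size: int
    conv_hyperparams: Optional[Any] = None
    predict_instance_masks: bool = False
    mask_prediction_conv_depth: int = 256
    predict_keypoints: bool = False

    def effective_keep_prob(self) -> float:
        """Illustrative helper: keep everything when dropout is disabled."""
        return self.dropout_keep_prob if self.use_dropout else 1.0


def classification_targets(num_classes: int) -> range:
    """Labels 0..K-1 give num_classes=K; targets span 0..K with background."""
    return range(num_classes + 1)


args = BoxPredictorArgs(
    is_training=True,
    num_classes=3,
    fc_hyperparams=None,
    use_dropout=False,
    dropout_keep_prob=0.5,
    box_code_size=4,
)
targets = list(classification_targets(args.num_classes))
```

With three groundtruth labels {0, 1, 2}, num_classes is 3 while the classification targets cover four values, which is the off-by-one the num_classes note warns about.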

num_classes (@property)

self

_predict
Computes encoded object locations and corresponding confidences. Flattens image_features and applies fully connected ops (with no non-linearity) to predict box encodings and class predictions. In this setting, anchors are not spatially arranged in any way and are assumed to have been folded into the batch dimension. Thus we output 1 for the anchors dimension.

self
image_features
A float tensor of shape [batch_size, height, width, channels] containing features for a batch of images.
num_predictions_per_location
An integer representing the number of box predictions to be made per spatial location in the feature map. Currently, this must be set to 1, or an error will be raised.
Return Values
A dictionary containing the following tensors.
box_encodings
[batch_size, 1, num_classes, code_size] representing the location of the objects.
class_predictions_with_background
[batch_size, 1, num_classes + 1] representing the class predictions for the proposals.
instance_masks (when predict_instance_masks is True)
A float tensor of shape [batch_size, 1, num_classes, image_height, image_width]
keypoints (When predict_keypoints is True)
[batch_size, 1, num_keypoints, 2]
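
The shapes above can be checked with a small sketch that does only the bookkeeping, without TensorFlow. The function predicted_shapes and its mask_size and num_keypoints parameters are hypothetical names introduced for illustration. The real _predict returns tensors, not shape lists.

```python
from typing import Dict, List, Optional, Tuple


def predicted_shapes(
    batch_size: int,
    num_classes: int,
    code_size: int,
    num_predictions_per_location: int = 1,
    mask_size: Optional[Tuple[int, int]] = None,
    num_keypoints: Optional[int] = None,
) -> Dict[str, List[int]]:
    """Return the documented output shapes as plain lists of ints."""
    # The notes above say any value other than 1 raises an error.
    if num_predictions_per_location != 1:
        raise ValueError("num_predictions_per_location must be 1")
    shapes = {
        "box_encodings": [batch_size, 1, num_classes, code_size],
        "class_predictions_with_background": [batch_size, 1, num_classes + 1],
    }
    if mask_size is not None:
        height, width = mask_size
        shapes["instance_masks"] = [batch_size, 1, num_classes, height, width]
    if num_keypoints is not None:
        shapes["keypoints"] = [batch_size, 1, num_keypoints, 2]
    return shapes


# 64 proposals folded into the batch dimension, 90 classes, 4-value box code.
shapes = predicted_shapes(64, 90, 4, mask_size=(14, 14))
```

The anchors dimension is always 1 because the proposals are already folded into the batch dimension, so the per-class axis is the only place where predictions multiply.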