Anchor Free Object Detection
The most successful single-stage object detection algorithms, e.g., YOLO and SSD, all rely on anchors that are refined into the final detection locations. For those algorithms, the anchors are typically defined on a grid over the image coordinates at all possible locations, with different scales and aspect ratios.
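To make the anchor grid concrete, here is a minimal sketch of how such anchors could be enumerated. The function name `make_anchors`, the `(cx, cy, w, h)` layout, and the stride, scale, and ratio values are all assumptions chosen for illustration; real detectors differ in these details.

```python
import math

def make_anchors(image_w, image_h, stride, scales, aspect_ratios):
    """Enumerate anchors as (cx, cy, w, h) tuples, one set per grid cell.

    Hypothetical helper for illustration; not taken from YOLO or SSD code.
    """
    anchors = []
    for y in range(0, image_h, stride):
        for x in range(0, image_w, stride):
            # Place the anchor center in the middle of the grid cell.
            cx, cy = x + stride / 2, y + stride / 2
            for s in scales:
                for r in aspect_ratios:
                    # Keep the area s * s fixed while setting w / h = r.
                    w = s * math.sqrt(r)
                    h = s / math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors

anchors = make_anchors(64, 64, stride=16, scales=(32, 64),
                       aspect_ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 4 x 4 cells * 2 scales * 3 ratios = 96
```

Even on this tiny 64 x 64 example the count grows as cells x scales x ratios, which is the speed and accuracy trade-off that motivates anchor-free designs.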
Though much faster than their two-stage counterparts, single-stage algorithms' speed and performance are still limited by the choice of anchor boxes: fewer anchors lead to better speed but degrade accuracy. As a result, many new works try to design anchor-free object detection algorithms.
UnitBox: An Advanced Object Detection Network
UnitBox uses an Intersection over Union (IoU) loss function for bounding box prediction.
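As a rough illustration, an IoU loss can be written as the negative log of the IoU between the predicted and ground-truth boxes. The sketch below assumes boxes in `(x1, y1, x2, y2)` form and adds a small `eps` for numerical safety; both choices, and the function names, are assumptions made for this example.

```python
import math

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred, target, eps=1e-9):
    """Negative log IoU; eps (an assumption) avoids log(0) for disjoint boxes."""
    return -math.log(iou(pred, target) + eps)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))       # overlap 1, union 7
print(iou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # close to 0 for a perfect match
```

Unlike a per-coordinate L2 loss, this treats the four box sides jointly, so the loss depends on the overlap rather than on each coordinate in isolation.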
DenseBox: Unifying Landmark Localization and Object Detection
DenseBox directly computes the bounding box and its label from the feature map.
CornerNet: Detecting Objects as Paired Keypoints
In CornerNet, the bounding box is uniquely defined by its top-left and bottom-right corners, each of which is detected by one of two branches. Corner pooling is applied to detect the corners, which utilizes the idea of the integral image (see below).
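A minimal sketch of top-left corner pooling on plain Python lists: each output location adds the maximum of one feature map over everything to its right and the maximum of a second feature map over everything below it. The running-max scans are what resemble the cumulative computation of an integral image. The function name and the list-of-lists layout are assumptions for illustration.

```python
def top_left_corner_pool(feat_h, feat_v):
    """Top-left corner pooling over two H x W feature maps (lists of lists)."""
    H, W = len(feat_h), len(feat_h[0])
    right_max = [[0.0] * W for _ in range(H)]
    down_max = [[0.0] * W for _ in range(H)]

    # Horizontal pass: running max while scanning each row right to left.
    for i in range(H):
        running = float("-inf")
        for j in range(W - 1, -1, -1):
            running = max(running, feat_h[i][j])
            right_max[i][j] = running

    # Vertical pass: running max while scanning each column bottom to top.
    for j in range(W):
        running = float("-inf")
        for i in range(H - 1, -1, -1):
            running = max(running, feat_v[i][j])
            down_max[i][j] = running

    # Sum the two directional maxima at every location.
    return [[right_max[i][j] + down_max[i][j] for j in range(W)]
            for i in range(H)]

feat = [[0, 1, 0],
        [2, 0, 0],
        [0, 0, 3]]
pooled = top_left_corner_pool(feat, feat)
print(pooled)  # [[3, 2, 3], [4, 0, 3], [3, 3, 6]]
```

Each pass touches every element once, so the pooling costs O(HW) rather than O(HW(H + W)) for a naive per-location search.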
Similar to CornerNet, it formulates the problem of finding a bounding box as finding a set of corner points. But instead of two corners as in CornerNet, it requires four corner points and one center point, each computed from the peaks of a heatmap for that point.
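Extracting peaks from a heatmap can be sketched as keeping every location that exceeds a threshold and is a local maximum in its 3 x 3 neighbourhood. The window size, the threshold value, and the name `find_peaks` are assumptions for this example, not details taken from the text above.

```python
def find_peaks(heatmap, threshold=0.1):
    """Return (row, col, score) for local maxima of a 2D heatmap."""
    H, W = len(heatmap), len(heatmap[0])
    peaks = []
    for i in range(H):
        for j in range(W):
            v = heatmap[i][j]
            if v < threshold:
                continue
            # Collect the 3 x 3 neighbourhood, clipped at the borders.
            neighbours = [heatmap[a][b]
                          for a in range(max(0, i - 1), min(H, i + 2))
                          for b in range(max(0, j - 1), min(W, j + 2))]
            if v >= max(neighbours):
                peaks.append((i, j, v))
    return peaks

heatmap = [[0.0, 0.2, 0.0, 0.0],
           [0.1, 0.9, 0.3, 0.0],
           [0.0, 0.2, 0.0, 0.6],
           [0.0, 0.0, 0.1, 0.0]]
print(find_peaks(heatmap, threshold=0.5))  # [(1, 1, 0.9), (2, 3, 0.6)]
```

One such peak list per keypoint heatmap gives the candidate points that are then grouped into boxes.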
It is based on a feature pyramid network, where the final result is dynamically selected from the level with the optimal resolution.
FCOS is anchor-box free, as well as proposal free. FCOS works by predicting a 4D vector (l, t, r, b) encoding the location of a bounding box at each foreground pixel (supervised by ground-truth bounding box information during training).
This is done in a per-pixel fashion, i.e., for each pixel, the network tries to predict a bounding box from it, together with the class label. To account for pixels that are far from the center of the ground-truth object, a centerness score is also predicted, which downweights the predictions made at those pixels.
If a location falls into multiple bounding boxes, it is considered as an ambiguous sample. For now, we simply choose the bounding box with minimal area as its regression target.
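The per-pixel target assignment described above can be sketched as follows. The centerness formula used here, sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)), follows the definition in the FCOS paper; the function name, the `(x1, y1, x2, y2)` box format, and the strict-inside test are assumptions for illustration.

```python
import math

def fcos_target(px, py, boxes):
    """Return ((l, t, r, b), centerness) for a pixel, or None for background.

    boxes: list of ground-truth boxes as (x1, y1, x2, y2).
    """
    # A pixel is foreground if it lies strictly inside at least one box.
    containing = [b for b in boxes
                  if b[0] < px < b[2] and b[1] < py < b[3]]
    if not containing:
        return None
    # Ambiguous sample: pick the box with minimal area.
    x1, y1, x2, y2 = min(containing,
                         key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
    # Distances from the pixel to the four sides of the chosen box.
    l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
    centerness = math.sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b))
    return (l, t, r, b), centerness

boxes = [(0, 0, 10, 10), (2, 2, 6, 6)]
print(fcos_target(4, 4, boxes))    # inside both boxes, smaller one wins
print(fcos_target(8, 5, boxes))    # off-center in the large box
print(fcos_target(20, 20, boxes))  # background
```

The pixel at (4, 4) sits at the center of the smaller box and gets centerness 1.0, while (8, 5) is close to the right edge of the larger box and gets 0.5, so its prediction would be downweighted.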
Feature Pyramid Network is used as the backbone.
FoveaBox: Beyond Anchor-based Object Detector
It is very similar to FCOS.
In GA-RPN, the anchor (defined as a tuple of its location and shape) is learned instead of manually defined. Feature extraction is then adapted to this computed anchor.
CenterNet: Objects as Points
CenterNet defines the bounding box by its center. After the center is computed, its shape and pose can be further computed.
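A simplified sketch of turning a center prediction into a box: take the strongest location on a center heatmap, read the predicted size at that location, and expand around the center. The `stride` of 4, the size map given in input pixels, and the omission of any sub-pixel offset are simplifying assumptions, and `decode_top_center` is a hypothetical name.

```python
def decode_top_center(heatmap, size_map, stride=4):
    """Decode the single highest-scoring center into (x1, y1, x2, y2, score).

    heatmap:  H x W list of center scores.
    size_map: H x W list of (w, h) box sizes in input-image pixels.
    """
    H, W = len(heatmap), len(heatmap[0])
    # Pick the location with the highest center score.
    score, i, j = max((heatmap[i][j], i, j)
                      for i in range(H) for j in range(W))
    w, h = size_map[i][j]
    # Map the feature-map location back to input-image coordinates.
    cx, cy = j * stride, i * stride
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score)

heatmap = [[0.1, 0.0, 0.2],
           [0.0, 0.3, 0.8]]
size_map = [[(0, 0)] * 3,
            [(0, 0), (0, 0), (8, 4)]]
print(decode_top_center(heatmap, size_map))  # (4.0, 2.0, 12.0, 6.0, 0.8)
```

Other properties such as pose would be read from additional output maps at the same center location in the same way.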
It is based on CenterNet but very similar to ExtremeNet or CornerNet, where the bounding box is now defined by a pair of corner points and the label is defined by the response of the center point.
CornerNet-Lite: CornerNet-Saccade (attention mechanism) + CornerNet-Squeeze
Center and Scale Prediction: A Box-free Approach for Object Detection
As in GA-RPN, the bounding box is defined by its center and shape, which are computed by two branches of the neural network.