Image alignment aims to find a transform that maps a source image onto a target image. Several types of transform are commonly used, from least to most general:
- rigid transform (rotation, translation)
- affine transform (additionally allows scaling and shear)
- homography (projective) transform (additionally allows perspective distortion)
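All three transform types can be written as 3x3 matrices acting on points in homogeneous coordinates. The NumPy sketch below is only an illustration: the helper names `rigid`, `affine` and `apply_transform` are hypothetical, and the matrix values are arbitrary. Rigid and affine matrices keep the bottom row at `[0, 0, 1]`. A homography has a general bottom row and is defined only up to scale, which gives it 8 degrees of freedom, compared with 3 for rigid and 6 for affine.

```python
import numpy as np

def rigid(theta, tx, ty):
    """Rotation by theta (radians) followed by translation: 3 degrees of freedom."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

def affine(A, t):
    """Any 2x2 linear part (rotation, scale, shear) plus translation: 6 DoF."""
    M = np.eye(3)
    M[:2, :2] = A
    M[:2, 2] = t
    return M

def apply_transform(M, pts):
    """Map (N, 2) points through a 3x3 matrix in homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
    return pts_h[:, :2] / pts_h[:, 2:3]  # w stays 1 for rigid and affine

square = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])

R = rigid(np.pi / 2, 10.0, 0.0)                   # quarter turn, then shift x by 10
S = affine([[2.0, 0.5], [0.0, 1.0]], [0.0, 0.0])  # x-scale by 2 plus shear
# A homography has a general bottom row (defined up to scale: 8 DoF); the
# non-zero entries there are what create perspective foreshortening.
P = np.array([[1.0, 0.1, 5.0], [0.0, 1.2, -3.0], [0.001, 0.002, 1.0]])

for name, M in [("rigid", R), ("affine", S), ("homography", P)]:
    print(name, apply_transform(M, square).round(2).tolist())
```

Under the rigid and affine matrices the square's parallel sides stay parallel. Under the homography they do not, because each point is divided by its own `w`.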
Classical methods usually rely on matching features between the two images. The feature can be the image intensities themselves, or features computed at selected points in the image. Feature-point based methods are generally more robust and thus more widely used. They typically consist of the following steps:
- detect feature points (keypoints) in each image
- compute a feature vector (descriptor) for each feature point
- find correspondences between the feature points of one image and those of the other
- estimate the transform from the correspondences, e.g., via RANSAC
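The final step can be sketched in NumPy, assuming correspondences have already been found. This is a minimal illustration of RANSAC wrapped around the Direct Linear Transform (DLT). It uses a fixed iteration count and skips the coordinate normalization usually recommended for the DLT. The function names, the 500 iterations and the 2-pixel inlier threshold are hypothetical choices and do not come from any particular library.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H with dst ~ H @ src from >= 4 point pairs via the DLT.

    Each pair contributes two rows to A; h (the 9 entries of H) is the right
    singular vector of A with the smallest singular value.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    if abs(H[2, 2]) < 1e-12:  # degenerate sample; cannot normalize
        return None
    return H / H[2, 2]

def project(H, pts):
    """Map (N, 2) points through H and divide by the homogeneous coordinate."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return pts_h[:, :2] / pts_h[:, 2:3]

def ransac_homography(src, dst, n_iters=500, thresh=2.0, seed=0):
    """Fit H to correspondences that may contain outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # 1. Fit a hypothesis to a minimal random sample of 4 correspondences.
        idx = rng.choice(len(src), size=4, replace=False)
        H = homography_dlt(src[idx], dst[idx])
        if H is None:
            continue
        # 2. Count the correspondences it explains within `thresh` pixels.
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        # 3. Keep the hypothesis with the largest consensus set.
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # 4. Refit on all inliers of the best hypothesis.
    return homography_dlt(src[best_inliers], dst[best_inliers]), best_inliers

# Synthetic example: 60 matches under a known homography, 15 of them corrupted.
rng = np.random.default_rng(1)
H_true = np.array([[1.1, 0.05, 20.0], [-0.03, 0.95, -10.0], [1e-4, 2e-4, 1.0]])
src = rng.uniform(0, 200, size=(60, 2))
dst = project(H_true, src)
dst[:15] += rng.uniform(30, 60, size=(15, 2))
H_est, inliers = ransac_homography(src, dst)
```

In practice, OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC, 3.0)` performs the same kind of robust estimation and returns the matrix together with an inlier mask.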
There are many choices of feature point detector and descriptor:
- Scale-Invariant Feature Transform (SIFT): based on histograms of gradient orientations, it is generally robust to scale, rotation and translation, and to some illumination change. It was patented and not free for commercial use until the patent expired in 2020.
- Speeded Up Robust Features (SURF): a faster alternative to SIFT. Not free for commercial use.
- Oriented FAST and Rotated BRIEF (ORB): built on the FAST keypoint detector and the BRIEF (Binary Robust Independent Elementary Features) descriptor, it aims to provide a fast and efficient alternative to SIFT. It is robust to scale, rotation and translation, to some illumination change, and to noise. Free to use.
- Accelerated-KAZE (AKAZE): robust to scale, rotation and translation. Free to use.
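ORB's descriptor builds on BRIEF, which encodes a keypoint as a bit string of pairwise intensity comparisons inside a patch. Binary descriptors like this are matched with the Hamming distance, whereas float descriptors such as SIFT's use the Euclidean distance. The sketch below is a simplified BRIEF-like descriptor, without the pre-smoothing of BRIEF or the orientation steering of ORB. It is paired with brute-force Hamming matching and a nearest/second-nearest ratio test. The patch size, bit count, random sampling pattern and 0.8 ratio are hypothetical choices.

```python
import numpy as np

PATCH = 15    # hypothetical patch size; the keypoint sits at the centre
N_BITS = 256  # number of binary intensity tests

# Fixed random sampling pattern: each row is (r1, c1, r2, c2) inside the patch.
# The same pattern must be used for every keypoint in both images.
PAIRS = np.random.default_rng(42).integers(0, PATCH, size=(N_BITS, 4))

def brief_descriptor(image, keypoint):
    """Bit i is 1 if the first point of pair i is darker than the second."""
    r, c = keypoint
    h = PATCH // 2
    patch = image[r - h:r + h + 1, c - h:c + h + 1]
    bits = patch[PAIRS[:, 0], PAIRS[:, 1]] < patch[PAIRS[:, 2], PAIRS[:, 3]]
    return np.packbits(bits)  # 256 bits packed into 32 bytes

def hamming(a, b):
    """Number of differing bits between two packed descriptors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def match(desc_a, desc_b, ratio=0.8):
    """Brute-force matching; keep a match only if it clearly beats the runner-up."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.array([hamming(d, e) for e in desc_b])
        best, second = np.argsort(dists)[:2]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy example: image_b is image_a shifted by (3, 5) pixels.
rng = np.random.default_rng(0)
image_a = rng.random((64, 64))
image_b = np.roll(image_a, shift=(3, 5), axis=(0, 1))
kps_a = [(r, c) for r in (12, 28, 44) for c in (12, 28, 44)]
kps_b = [(r + 3, c + 5) for r, c in kps_a]
desc_a = [brief_descriptor(image_a, kp) for kp in kps_a]
desc_b = [brief_descriptor(image_b, kp) for kp in kps_b]
matches = match(desc_a, desc_b)
```

The toy shift is noise-free, so true matches have distance 0. With real images, noise and viewpoint change raise that distance, and this is where the ratio test earns its keep. In OpenCV, the corresponding building blocks are `cv2.ORB_create()` for detection and description and `cv2.BFMatcher(cv2.NORM_HAMMING)` for matching.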
Deep Learning Based Methods
Deep Learning Based Keypoint Descriptors
Several works have proposed learning better keypoint descriptors, e.g.,
- Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks
- Multi-temporal Remote Sensing Image Registration Using Deep Convolutional Features
End-to-End Methods for Learning the Transform
The transform can also be learned end to end by a deep learning model. For example, Deep Image Homography Estimation concatenates the two images to be aligned along the channel dimension and uses a VGG-style network to regress the homography transform (8 parameters).
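The 8 parameters can be expressed as the (x, y) displacements of the four corners of the patch, the "4-point" parameterization described in Deep Image Homography Estimation. The 3x3 matrix is then recovered from those displacements by a linear solve. The NumPy sketch below shows that conversion and how the two-channel input is assembled. It assumes 128x128 grayscale patches and made-up corner offsets standing in for a network's output, and the function names are hypothetical.

```python
import numpy as np

def four_point_to_homography(corners, offsets):
    """Recover H (normalized so H[2, 2] = 1) mapping each corner to corner + offset.

    With h33 fixed to 1 there are 8 unknowns, and the 4 corner correspondences
    give exactly 8 linear equations: u * (h31 x + h32 y + 1) = h11 x + h12 y + h13,
    and likewise for v.
    """
    dst = corners + offsets
    A = np.zeros((8, 8))
    b = np.zeros(8)
    for i, ((x, y), (u, v)) in enumerate(zip(corners, dst)):
        A[2 * i] = [x, y, 1, 0, 0, 0, -u * x, -u * y]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -v * x, -v * y]
        b[2 * i], b[2 * i + 1] = u, v
    return np.append(np.linalg.solve(A, b), 1.0).reshape(3, 3)

def warp_points(H, pts):
    """Map (N, 2) points through H in homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]

# Network input: two grayscale patches stacked along the channel axis.
rng = np.random.default_rng(0)
patch_a, patch_b = rng.random((128, 128)), rng.random((128, 128))
net_input = np.stack([patch_a, patch_b], axis=0)  # shape (2, 128, 128)

# Network output (made up here): 8 numbers = (dx, dy) for each of the 4 corners.
corners = np.array([[0.0, 0.0], [127.0, 0.0], [127.0, 127.0], [0.0, 127.0]])
offsets = np.array([5.0, -3.0, -4.0, 2.0, 6.0, 7.0, -2.0, -5.0]).reshape(4, 2)
H = four_point_to_homography(corners, offsets)
```

Regressing corner offsets keeps all 8 outputs in the same unit (pixels). By contrast, the raw matrix entries mix rotation, translation and perspective terms of very different magnitudes.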
Instead of requiring the ground-truth transform as a supervision signal, Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model removes this requirement by measuring how similar one image, warped by the predicted transform, is to the other.
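This unsupervised signal can be illustrated as a photometric loss. One image is warped with the predicted homography, and the loss is the mean absolute (L1) intensity difference to the other image over the pixels where the warp is defined. The NumPy sketch below assumes grayscale images, bilinear sampling and a simple validity mask, and the function names are hypothetical. It only computes the loss value. Training a network on it requires a differentiable warp (for example a spatial transformer layer) in a deep learning framework.

```python
import numpy as np

def warp_image(image, H):
    """Warp a grayscale image by homography H with bilinear sampling.

    Output pixel p = (x, y) takes the value of `image` at H^{-1} p. Returns the
    warped image and a mask of pixels whose source location lies inside `image`.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # rows: x, y, 1
    src = np.linalg.inv(H) @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]
    valid = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    sx, sy = np.clip(sx, 0, w - 1), np.clip(sy, 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = sx - x0, sy - y0
    out = (image[y0, x0] * (1 - fx) * (1 - fy) + image[y0, x1] * fx * (1 - fy)
           + image[y1, x0] * (1 - fx) * fy + image[y1, x1] * fx * fy)
    return out.reshape(h, w), valid.reshape(h, w)

def photometric_l1(image_a, image_b, H):
    """Mean |warp(image_a, H) - image_b| over validly warped pixels."""
    warped, valid = warp_image(image_a, H)
    return float(np.abs(warped - image_b)[valid].mean())

# Toy example: image_b is image_a translated by (x, y) = (3, 2) pixels.
rng = np.random.default_rng(0)
image_a = rng.random((32, 32))
image_b = np.roll(image_a, shift=(2, 3), axis=(0, 1))
H_true = np.array([[1.0, 0.0, 3.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])
loss_true = photometric_l1(image_a, image_b, H_true)          # ~0
loss_identity = photometric_l1(image_a, image_b, np.eye(3))   # clearly > 0
```

The loss is near zero only when the homography brings the two images into register. A network can therefore be trained by minimizing it, without any ground-truth transform.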
An Artificial Agent for Robust Image Registration proposed a reinforcement learning based approach to learning the transform between two images, where the state is the pair of images and each action is an adjustment to the transform. A similar idea was also studied in [Robust non-rigid registration through agent-based action learning](https://hal.inria.fr/hal-01569447/document).
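The agent formulation can be illustrated in a toy 2D rigid setting. The sketch below is not the paper's implementation. The transform is (tx, ty, theta in degrees), each discrete action nudges one parameter by one unit, and the reward of an action is how much it reduces the distance to the ground-truth parameters. Such a reward can be computed whenever the ground truth is known at training time, whereas a trained network would have to pick the best action from the two images alone. Here an oracle that knows the target stands in for that network. The action set, distance and step sizes are hypothetical.

```python
import numpy as np

# Hypothetical action set: +/- one unit on each of (tx, ty, theta_degrees).
ACTIONS = np.array([[1, 0, 0], [-1, 0, 0],
                    [0, 1, 0], [0, -1, 0],
                    [0, 0, 1], [0, 0, -1]], dtype=float)

def distance(params, target):
    """Hypothetical distance between current and ground-truth parameters."""
    return np.linalg.norm(params - target)

def reward(params, action, target):
    """Positive if the action brings the transform closer to the target."""
    return distance(params, target) - distance(params + action, target)

def greedy_rollout(start, target, max_steps=100):
    """Repeatedly take the highest-reward action until none improves alignment."""
    params = np.asarray(start, dtype=float)
    target = np.asarray(target, dtype=float)
    path = [params.copy()]
    for _ in range(max_steps):
        rewards = [reward(params, a, target) for a in ACTIONS]
        best = int(np.argmax(rewards))
        if rewards[best] <= 0:  # every action moves away from the target: stop
            break
        params = params + ACTIONS[best]
        path.append(params.copy())
    return params, path

final, path = greedy_rollout(start=[0, 0, 0], target=[5, -3, 10])
```

Starting from the identity, the oracle needs one step per unit of error, 5 + 3 + 10 = 18 steps, to reach the target. A learned agent is only as good as its estimate of which action has the highest reward.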