Data augmentation
Data augmentation in data analysis comprises techniques used to increase the amount of data by adding slightly modified copies of existing data or newly created synthetic data derived from existing data. It acts as a regularizer and helps reduce overfitting when training a machine learning model.[1] It is closely related to oversampling in data analysis.
Synthetic oversampling techniques for traditional machine learning
Data augmentation for image classification
Transformations of images
Geometric transformations such as flipping, cropping and rotation, as well as color modification, noise injection and random erasing, are used to augment images in deep learning.[1]
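Several of these transformations can be written directly on image arrays. The following is a minimal sketch using NumPy; the function names and the dummy image are illustrative, not from any particular augmentation library:

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_flip(img):
    # Mirror the image along its width axis (shape H, W, C).
    return img[:, ::-1, :]

def random_crop(img, size):
    # Cut out a random size x size patch of the image.
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size, :]

def add_noise(img, sigma=0.05):
    # Inject Gaussian noise, clipping back to the valid [0, 1] range.
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def random_erase(img, size):
    # Overwrite a random size x size region with random values.
    out = img.copy()
    h, w, c = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    out[top:top + size, left:left + size, :] = rng.random((size, size, c))
    return out

image = rng.random((32, 32, 3))  # dummy 32x32 RGB image with values in [0, 1]
augmented = [horizontal_flip(image), random_crop(image, 24),
             add_noise(image), random_erase(image, 8)]
```

In practice such transformations are applied on the fly during training, so each epoch sees a slightly different version of every image.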
Adding new synthetic images
Because image data are usually too high-dimensional for traditional synthetic oversampling methods, new methods are required to create synthetic images for deep learning.
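The traditional synthetic oversampling mentioned here interpolates between a sample and one of its nearest neighbours in feature space, in the style of SMOTE. The following is a minimal NumPy sketch of that idea under the assumption of a low-dimensional feature matrix; the function name is illustrative, not from any library:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like_oversample(samples, n_new, k=3):
    """Create n_new synthetic points by interpolating each chosen seed
    toward one of its k nearest neighbours (a SMOTE-style scheme)."""
    n = len(samples)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(0, n)
        # Distances from the chosen seed to every other sample.
        dists = np.linalg.norm(samples - samples[i], axis=1)
        dists[i] = np.inf  # exclude the seed itself
        neighbours = np.argsort(dists)[:k]
        j = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        new_points.append(samples[i] + t * (samples[j] - samples[i]))
    return np.array(new_points)

minority = rng.random((20, 4))  # 20 four-dimensional feature vectors
synthetic = smote_like_oversample(minority, n_new=10)
```

This works for tabular features of modest dimension; for raw images, linear interpolation between pixel arrays produces unrealistic blends, which is why generative models are preferred there.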
Generative adversarial networks can be used to create new synthetic images for data augmentation.[1]
Image recognition algorithms show improvement when trained on synthetic images generated with the Unity game engine and then transferred to real data.[2]
Data augmentation for speaker recognition
Transfer learning from synthetic speech
It has been noted that synthetic generation of spoken MFCCs (mel-frequency cepstral coefficients) can improve the recognition of a speaker from their utterances via transfer learning from the synthetic data.[3]
References
- Shorten, Connor; Khoshgoftaar, Taghi M. (2019). "A survey on Image Data Augmentation for Deep Learning". Journal of Big Data. Springer. 6: 60. doi:10.1186/s40537-019-0197-0.
- Bird, Jordan J.; Faria, Diego R.; Ekart, Aniko; Ayrosa, Pedro P. S. (2020). "From Simulation to Reality: CNN Transfer Learning for Scene Classification". 2020 IEEE 10th International Conference on Intelligent Systems (IS). Varna, Bulgaria: IEEE. pp. 619–625.
- Bird, Jordan J.; Faria, Diego R.; Premebida, Cristiano; Ekart, Aniko; Ayrosa, Pedro P. S. (2020). "Overcoming Data Scarcity in Speaker Identification: Dataset Augmentation with Synthetic MFCCs via Character-level RNN". pp. 146–151. doi:10.1109/ICARSC49921.2020.9096166.