Data augmentation

Data augmentation in data analysis is a set of techniques used to increase the amount of data by adding slightly modified copies of existing data or synthetic data newly created from existing data. It acts as a regularizer and helps reduce overfitting when training a machine learning model.[1] It is closely related to oversampling in data analysis.

Synthetic oversampling techniques for traditional machine learning
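
One widely known technique of this kind is SMOTE, which creates new minority-class samples by interpolating between an existing sample and one of its nearest neighbors. The following is a minimal sketch of that interpolation idea using NumPy, not a reference implementation; the function name `smote_like` and its parameters `n_new` and `k` are illustrative assumptions, and it assumes `k` is smaller than the number of minority samples.

```python
import numpy as np

def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic samples by interpolating between minority samples.

    Hypothetical helper for illustration; assumes k < len(minority).
    """
    rng = np.random.default_rng() if rng is None else rng
    minority = np.asarray(minority, dtype=float)
    n = len(minority)

    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place each synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])

minority = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2], [1.8, 2.1]])
synthetic = smote_like(minority, n_new=10, k=3, rng=np.random.default_rng(0))
```

Because every synthetic point is a convex combination of two real points, it stays inside the region spanned by the minority class. This keeps the sketch simple, but it also means the method cannot produce samples outside that region.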

Data augmentation for image classification

Transformations of images

Geometric transformations, flipping, color modification, cropping, rotation, noise injection, and random erasing are used to augment images in deep learning.[1]
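
Several of these transformations can be sketched with plain NumPy array operations. The example below is an illustrative assumption rather than a standard pipeline: the function name `augment`, its default parameters, and the choice of brightness scaling as the color modification are all hypothetical. It assumes an image stored as a height × width × channels float array with values in [0, 1].

```python
import numpy as np

def augment(image, rng, crop_frac=0.8, noise_std=0.05, erase_frac=0.25):
    """Return a randomly transformed copy of an H x W x C image in [0, 1]."""
    out = image.astype(np.float64)  # astype copies, so the input is untouched

    # Flipping: mirror left-right with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]

    # Rotation: a random multiple of 90 degrees, which needs no interpolation.
    out = np.rot90(out, k=int(rng.integers(0, 4)))

    # Cropping: keep a random window covering crop_frac of each side.
    h, w = out.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    out = out[top:top + ch, left:left + cw]

    # Color modification: random brightness scaling (assumed variant).
    out = out * rng.uniform(0.8, 1.2)

    # Noise injection: additive Gaussian noise.
    out = out + rng.normal(0.0, noise_std, size=out.shape)

    # Random erasing: overwrite a random rectangle with a constant value.
    eh, ew = max(1, int(ch * erase_frac)), max(1, int(cw * erase_frac))
    et = int(rng.integers(0, ch - eh + 1))
    el = int(rng.integers(0, cw - ew + 1))
    out[et:et + eh, el:el + ew] = rng.random()

    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
original = image.copy()
augmented = augment(image, rng)
```

Calling `augment` repeatedly on the same image yields a different variant each time, which is how such transformations are typically applied during training. Note that this sketch returns the cropped size (here 25 × 25) and does not resize back to the input shape.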

Adding new synthetic images

Because image data are usually too high-dimensional for traditional synthetic oversampling methods, new methods are required to create synthetic images for deep learning.

Generative adversarial networks make it possible to create new synthetic images for data augmentation.[1]

Image recognition algorithms show improved performance when transfer learning is applied from synthetic images generated by the Unity game engine.[2]


Data augmentation for speaker recognition

Transfer learning from synthetic speech

Generating synthetic mel-frequency cepstral coefficients (MFCCs) of speech can improve the recognition of a speaker from their utterances via transfer learning from the synthetic data.[3]

See also

References

  1. Shorten, Connor; Khoshgoftaar, Taghi M. (2019). "A survey on Image Data Augmentation for Deep Learning". Journal of Big Data. 6: 60. doi:10.1186/s40537-019-0197-0.
  2. Bird, Jordan J.; Faria, Diego R.; Ekart, Aniko; Ayrosa, Pedro P. S. (2020-08-30). "From simulation to reality: CNN transfer learning for scene classification". 2020 IEEE 10th International Conference on Intelligent Systems (IS). Varna, Bulgaria: IEEE. pp. 619–625.
  3. Bird, Jordan J.; Faria, Diego R.; Premebida, Cristiano; Ekart, Aniko; Ayrosa, Pedro P. S. (2020). "Overcoming Data Scarcity in Speaker Identification: Dataset Augmentation with Synthetic MFCCs via Character-level RNN". pp. 146–151. doi:10.1109/ICARSC49921.2020.9096166.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.