Text simplification

Text simplification is an operation used in natural language processing to modify, enhance, classify, or otherwise process an existing corpus of human-readable text so that the grammar and structure of the prose are greatly simplified while the underlying meaning and information remain the same. Text simplification is an important area of research because natural human languages ordinarily contain large vocabularies and complex compound constructions that are not easily processed automatically. To reduce linguistic diversity, semantic compression can be employed to limit and simplify the set of words used in a given text.

Example

Text simplification is illustrated with an example from Siddharthan (2006).[1] The first sentence below contains two relative clauses and one conjoined verb phrase. A text simplification system aims to rewrite it as the sequence of shorter sentences that follows it.

Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold.
Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today.

One approach to text simplification is lexical simplification via lexical substitution, a two-step process that consists of identifying complex words and replacing them with simpler synonyms. A key challenge is identifying the complex words, a task performed by a machine learning classifier trained on labelled data. Classical methods ask labellers to assign each word a binary label, simple or complex; asking them instead to sort words in order of complexity yields more consistent labels.[2]
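The two-step process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited system. The frequency table, synonym dictionary, and threshold are hypothetical toy data, and a simple frequency cut-off stands in for the trained complexity classifier described above.

```python
import re

# Hypothetical toy resources (assumptions for illustration only).
# A real system would learn word complexity from labelled data.
WORD_FREQUENCY = {
    "the": 5000, "to": 4800, "we": 3000, "use": 900, "help": 850,
    "end": 800, "work": 700, "tools": 300,
    "facilitate": 20, "terminate": 15, "utilize": 12,
}
SYNONYMS = {
    "utilize": ["use", "employ"],
    "facilitate": ["help", "ease"],
    "terminate": ["end", "conclude"],
}
COMPLEXITY_THRESHOLD = 50  # assumed cut-off: rarer words count as complex


def is_complex(word):
    """Step 1: flag complex words (a stand-in for a trained classifier)."""
    return WORD_FREQUENCY.get(word.lower(), 0) < COMPLEXITY_THRESHOLD


def simplest_synonym(word):
    """Step 2: return the most frequent non-complex synonym, or None."""
    candidates = [s for s in SYNONYMS.get(word.lower(), []) if not is_complex(s)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: WORD_FREQUENCY.get(s, 0))


def simplify(text):
    """Replace each complex word that has a simpler known synonym."""
    def replace(match):
        word = match.group(0)
        if not is_complex(word):
            return word
        substitute = simplest_synonym(word)
        if substitute is None:
            return word  # no simpler synonym known: leave the word unchanged
        # Preserve the capitalisation of a sentence-initial word.
        return substitute.capitalize() if word[0].isupper() else substitute

    return re.sub(r"[A-Za-z]+", replace, text)


print(simplify("We utilize tools to facilitate the work."))
```

The sketch ignores inflection and word sense: a substituted synonym is not re-inflected to match the original word, and no check is made that it fits the context, both of which a practical system would need to handle.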

References

[1] Siddharthan, Advaith (28 March 2006). "Syntactic Simplification and Text Cohesion". Research on Language and Computation. 4 (1): 77–109. doi:10.1007/s11168-006-9011-1. S2CID 14619244.
[2] Gooding, Sian; Kochmar, Ekaterina; Sarkar, Advait; Blackwell, Alan (August 2019). "Comparative judgments are more consistent than binary classification for labelling word complexity". Proceedings of the 13th Linguistic Annotation Workshop: 208–214. doi:10.18653/v1/W19-4024. Retrieved 22 November 2019.

Xu, Wei; Callison-Burch, Chris; Napoles, Courtney (2015). "Problems in Current Text Simplification Research". Transactions of the Association for Computational Linguistics. 3: 283–297.
Jonnalagadda, Siddhartha; Tari, Luis; Hakenberg, Joerg; Baral, Chitta; Gonzalez, Graciela (June 2009). "Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text". Proceedings of NAACL-HLT 2009. Boulder, USA.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
