Linguistic evidence of plagiarism in Spanish journalism
Abstract
The expertise of a forensic linguist is required in legal disputes over possible cases of plagiarism. Studies in plagiarism detection have established a maximum threshold of 50% lexical similarity between independently produced texts. This paper explores whether journalistic articles require a specific similarity threshold, since they share informative content (“what”, “who”, “when”, “where”, “how”, and “why”). To this end, four quantitative linguistic variables are applied to two corpora structured around 10 different topics: a study corpus of 50 articles and a case corpus of 20 texts from a real case. From the study corpus, a threshold is extracted for each variable, reflecting the percentage of coincidence that can be expected between independent texts. These thresholds are then applied to the case corpus to determine whether they allow all the plagiarism cases to be detected.
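The threshold procedure described in the abstract (measure coincidence between independent texts on the same topic, derive a threshold, then test the case texts against it) can be sketched as follows. The abstract does not name its four variables, so this sketch assumes a single, hypothetical measure: the percentage of distinct word types two texts share, relative to their combined vocabulary. Taking the highest value observed among independent texts as the threshold is likewise an assumption, not necessarily the study's method, and the function names are illustrative.

```python
import re
from itertools import combinations


def word_types(text: str) -> set[str]:
    """Distinct lower-cased words; a deliberately naive tokenizer.

    In Python 3, \\w matches Unicode letters, so Spanish accented
    characters (á, ñ, ü...) are kept inside words.
    """
    return set(re.findall(r"\w+", text.lower()))


def lexical_similarity(a: str, b: str) -> float:
    """Assumed measure: % of the combined vocabulary shared by both texts."""
    ta, tb = word_types(a), word_types(b)
    union = ta | tb
    if not union:
        return 0.0
    return 100.0 * len(ta & tb) / len(union)


def topic_threshold(independent_texts: list[str]) -> float:
    """Highest similarity observed between any two independent texts on
    one topic (requires at least two texts)."""
    return max(
        lexical_similarity(a, b) for a, b in combinations(independent_texts, 2)
    )


def flag_pairs(case_texts: list[str], threshold: float) -> list[tuple[int, int]]:
    """Index pairs of case texts whose similarity exceeds the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(case_texts), 2)
        if lexical_similarity(a, b) > threshold
    ]


if __name__ == "__main__":
    # Independent texts on one topic: the highest pairwise overlap is 2/6 types.
    study = ["el gato come pescado", "el perro come carne", "la lluvia cae hoy"]
    # Two case texts sharing 4 of 6 word types (about 66.7%).
    case = ["el gato come pescado fresco", "el gato come pescado hoy"]
    print(flag_pairs(case, topic_threshold(study)))  # [(0, 1)]
```

Computing a separate threshold per topic mirrors the abstract's point that texts on the same news event share informative content, so their expected overlap may sit above a generic cut-off.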
License
To support the global exchange of knowledge, the journal Estudios sobre el Mensaje Periodístico provides unrestricted access to its content from the moment it is published in this electronic edition, and is therefore an open-access journal. The original articles published in this journal are the property of the Complutense University of Madrid, and any full or partial reproduction must cite the source. All content is distributed under a Creative Commons Attribution 4.0 (CC BY 4.0) licence for use and distribution. This must be stated expressly in these terms where necessary. You can view the summary and the complete legal text of the licence.