Automatic Analysis of school textbooks’ syntactic complexity
- Romualdo Ibáñez Orellana Pontificia Universidad Católica de Valparaíso https://orcid.org/0000-0001-9298-3806
- Juan Zamora Osorio Pontificia Universidad Católica de Valparaíso https://orcid.org/0000-0003-0003-182X
- Mariela Cisnero Correa Pontificia Universidad Católica de Valparaíso https://orcid.org/0000-0002-5747-5475
- Solange Aguirre Rozas Pontificia Universidad Católica de Valparaíso https://orcid.org/0000-0003-0828-7133
Abstract
The purpose of this study was to automatically compare the syntactic complexity of the texts used to communicate knowledge in school textbooks for three school subjects: Language and Communication; Natural Science; and History, Geography and Social Science. To do so, we collected a corpus of 2,121 instances of the pedagogic genre Content Exposition (Ibáñez, Moncada, Cornejo & Arriaza, 2017) from the school textbooks that the State of Chile provides to sixth-, seventh-, and eighth-grade students attending public schools. The texts were automatically analyzed by an algorithm that identifies the syntactic dependency relations in each sentence and calculates that sentence's Syntactic Dependency Length (SDL). Results showed that the SDL of the analyzed texts was uniformly low across school levels and subjects. In addition, syntactic complexity did not increase from one school level to the next. Finally, although no disciplinary pattern clearly distinguished the school subjects by their syntactic complexity, History, Geography and Social Science tended to be the most syntactically complex.
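The abstract does not spell out how SDL is computed. A common formulation in the dependency-distance literature (e.g., Liu, 2008; Futrell, Mahowald & Gibson, 2015) takes, for each word, the linear distance to its syntactic head. The Python sketch below assumes that formulation and a sentence already parsed into 1-based head indices (CoNLL-style, with 0 marking the root). It is not the authors' algorithm: the `Token` class, the function names, the per-sentence mean, and the text-level aggregate are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    index: int  # 1-based position of the word in the sentence
    form: str   # surface form of the word
    head: int   # 1-based index of its syntactic head; 0 marks the root


def dependency_lengths(sentence: List[Token]) -> List[int]:
    """Linear distance, in words, between each dependent and its head (root excluded)."""
    return [abs(tok.index - tok.head) for tok in sentence if tok.head != 0]


def total_dependency_length(sentence: List[Token]) -> int:
    """Sum of all dependency lengths in one sentence."""
    return sum(dependency_lengths(sentence))


def mean_dependency_length(sentence: List[Token]) -> float:
    """Average dependency length in one sentence (0.0 if it has no dependencies)."""
    lengths = dependency_lengths(sentence)
    return sum(lengths) / len(lengths) if lengths else 0.0


def text_mean_dependency_length(sentences: List[List[Token]]) -> float:
    """Hypothetical text-level score: the mean of the per-sentence means."""
    scores = [mean_dependency_length(s) for s in sentences]
    return sum(scores) / len(scores) if scores else 0.0


# "Los estudiantes leen el texto" ("The students read the text"), hand-annotated:
# Los -> estudiantes, estudiantes -> leen (root), el -> texto, texto -> leen.
EXAMPLE = [
    Token(1, "Los", 2),
    Token(2, "estudiantes", 3),
    Token(3, "leen", 0),
    Token(4, "el", 5),
    Token(5, "texto", 3),
]

if __name__ == "__main__":
    print(dependency_lengths(EXAMPLE))       # [1, 1, 1, 2]
    print(total_dependency_length(EXAMPLE))  # 5
    print(mean_dependency_length(EXAMPLE))   # 1.25
```

In practice, the head indices would come from a dependency parser run over the corpus; the hand-annotated sentence stands in for that step. Whether SDL is reported as a per-sentence sum or a mean matters when comparing texts, because sums grow with sentence length while means do not.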
References
Altamirano, P., Godoy, G., Manghi, D. & Soto, G. (2014). Analizando los textos de Historia, Geografía y Ciencias Sociales: La configuración multimodal de los pueblos originarios. Estudios Pedagógicos, 40(1), 263-280. https://doi.org/10.4067/s0718-07052014000100016
Aravena, S. & Hugo, E. (2016). Desarrollo de la complejidad sintáctica en textos narrativos y explicativos escritos por estudiantes secundarios. Lenguas Modernas (47), 9-40. https://revistas.uchile.cl/index.php/LM/article/view/45181
Arnold, J., Losongco, A., Wasow, T. & Ginstrom, R. (2000). Heaviness vs. Newness: The Effects of Structural Complexity and Discourse Status on Constituent Ordering. Language, 76(1), 28-55. https://doi.org/10.1353/lan.2000.0045
Bailey, A., Butler, F., Stevens, R. & Lord, C. (2007). Further specifying the language demands of school. In A. Bailey (Ed.), The language demands of school: Putting academic English to the test (pp. 103-156). Yale University Press.
Ballesteros, M., Bohnet, B., Mille, S., & Wanner, L. (2016). Data-driven deep-syntactic dependency parsing. Natural Language Engineering, 22(6), 939-974. https://doi.org/10.1017/S1351324915000285
Bañados, E. (2007). Integrando las tecnologías de información y comunicación en el currículum, como recurso pedagógico complementario al texto escolar en la enseñanza-aprendizaje de idiomas extranjeros. In Primer Seminario Internacional de textos escolares SITE 2006. Available at https://bibliotecadigital.mineduc.cl
Bartek, B., Lewis, R., Vasishth, S. & Smith, M. (2011). In Search of On-Line Locality Effects in Sentence Comprehension. Journal of Experimental Psychology: Learning, Memory and Cognition, 37(5), 1178-1198. https://doi.org/10.1037/a0024194
Bartolomé, R. (2021). Estudio comparativo de los índices de madurez sintáctica entre las generaciones pre y post internet. Círculo de Lingüística Aplicada a la Comunicación, (88), 83-106. https://dx.doi.org/10.5209/clac.78299
Carrera, J., Castellón, I., Lloberes, M., Padró, L. & Tinkova, N. (2008). Dependency Grammars in FreeLing. Procesamiento del Lenguaje Natural, 41, 21-28.
Chatzipanagiotidis, S., Giagkou, M., & Meurers, D. (2021). Broad linguistic complexity analysis for Greek readability classification. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 48-58).
Chen, D., & Manning, C. D. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 740-750). https://doi.org/10.3115/v1/D14-1082
Chomsky, N. (1970). Aspectos de la teoría de la sintaxis. Aguilar.
Chomsky, N. (1978). Estructuras sintácticas. Siglo XXI.
Choppin, A. (2000). Pasado y presente de los manuales escolares. En J. Berrio (Ed.), La cultura escolar de Europa. Tendencias históricas emergentes (pp. 107-165). Biblioteca Nueva.
Christie, F. (1998). Science and apprenticeship. The pedagogic discourse. In J. Martin & R. Veel (Eds.), Reading science. Critical and functional perspectives on discourse of science (pp. 152-180). Routledge.
Christie, F. (2002). Classroom discourse analysis: A functional perspective. Continuum.
Crespo, N., Alvarado, C. & Meneses, A. (2013). Desarrollo sintáctico: Una medición a partir de la diversidad clausular. Logos. Revista de Lingüística, Filosofía y Literatura, 23(1), 80-101. https://revistas.userena.cl/index.php/logos/article/view/197
Crespo, Alfaro & Góngora (2011). La medición de la sintaxis. Evolución de un concepto. Onomázein, 24(2), 155-172. http://revistaaisthesis.uc.cl/index.php/onom/article/view/30967
Demberg, V., & Keller, F. (2008). Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109(2), 193-210. https://doi.org/10.1016/j.cognition.2008.07.008
Dyer, C., Ballesteros, M., Ling, W., Matthews, A., & Smith, N. A. (2015). Transition-Based Dependency Parsing with Stack Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 334-343). https://doi.org/10.3115/v1/P15-1033
Falkenjack, J., Mühlenbock, K. H., & Jönsson, A. (2013). Features indicating readability in Swedish text. In Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013) (pp. 27-40).
Falkenjack, J., Santini, M., & Jönsson, A. (2016). An exploratory study on genre classification using readability features. In Proceedings of the Sixth Swedish Language Technology Conference (SLTC 2016), Umeå, Sweden. https://doi.org/10.13140/RG.2.2.33356.21120
Fedorenko, E., Woodbury, R. & Gibson, E. (2013). Direct Evidence of Memory Retrieval as a Source of Difficulty in Non-local Dependencies in Language. Cognitive Science, 37, 378-394. https://doi.org/10.1111/cogs.12021
Ferreira, F. (1991). Effects of Length and Syntactic Complexity on Initiation Times for Prepared Utterances. Journal of Memory and Language, 30(2), 210-233. https://doi.org/10.1016/0749-596X(91)90004-4
Frantz, R., Starr, L. & Bailey, A. (2015). Syntactic Complexity as an Aspect of Text Complexity. Educational Researcher, 44(7), 387-393. https://doi.org/10.3102/0013189X15603980
Futrell, R., Mahowald, K., & Gibson, E. (2015). Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences, 112(33), 10336-10341. https://doi.org/10.1073/pnas.1502134112
Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 32-42). Minneapolis, Minnesota: Association for Computational Linguistics. http://dx.doi.org/10.18653/v1/N19-1004
Futrell, R., Gibson, E. & Levy, R. (2020). Lossy-Context Surprisal: An Information-Theoretic Model of Memory Effects in Sentence Processing. Cognitive Science, 44, 2-54. https://doi.org/10.1111/cogs.12814
Gernsbacher, M. A. (1989). Mechanisms that improve referential access. Cognition, 32(2), 99–156. https://doi.org/10.1016/0010-0277(89)90001-2
Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1), 1-76. https://doi.org/10.1016/S0010-0277(98)00034-1
Gibson, E. (2000). The Dependency Locality Theory: A Distance-Based Theory of Linguistic Complexity. In A. Marantz, Y. Miyashita, & W. O'Neil (Eds.), Image, language, brain: Papers for the first mind articulation project symposium (pp. 95-126). MIT Press.
Gili Gaya, S. (1972). El pretérito de negación implícita. In Studia Hispanica in honorem R. Lapesa (Vol. I, pp. 251-256). Gredos.
Givón, T. (1991). Markedness in grammar: distributional, communicative and cognitive correlates of syntactic structures. Studies in Language, 15(2), 335-370. http://dx.doi.org/10.1075/sl.15.2.05giv
Givón, T. (2009). The Genesis of Syntactic Complexity. John Benjamins Publishing Company. http://dx.doi.org/10.1075/z.146
Goldman, S. (1997). Learning from texts: Reflections on the past and suggestions for the future. Discourse Processes, 23, 357-398. http://dx.doi.org/10.1080/01638539709544997
Graesser, A.C., McNamara, D.S., & Kulikowich, J.M. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40(5), 223–234. https://doi.org/10.3102/0013189X11413260
Graesser, A.C. & McNamara, D.S. (2011). Computational analyses of multilevel discourse comprehension. Topics in Cognitive Science, 3, 371-398. https://doi.org/10.1111/j.1756-8765.2010.01081.x
Grodner, D. & Gibson, E. (2005). Consequences of the Serial Nature of Linguistic Input for Sentencial Complexity. Cognitive Science, 29, 261-290. https://doi.org/10.1207/s15516709cog0000_7
Gysling, J. & Meckes, L. (2011). “Estándares de aprendizaje en Chile: mapas de progreso y logro SIMCE 2002 a 2010”, PREAL Serie Documentos Nº 54. Inter-American Dialogue.
Hall, D., Durrett, G., & Klein, D. (2014). Less grammar, more features. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 228-237). https://doi.org/10.3115/v1/P14-1022
Hawkins, J. (1990). A Parsing Theory of Word Order Universals. Linguistic Inquiry, 21(2), 223-261. http://www.jstor.org/stable/4178670
Hawkins, J. (1994). A Performance Theory of Order and Constituency. Cambridge University Press. https://doi.org/10.1017/CBO9780511554285
Herrera Lima, M.E. (1991). Madurez sintáctica en escolares de Ciudad de México. Análisis preliminar. In H. López Morales (Ed.), La enseñanza del español como lengua materna (pp. 155-169). Universidad de Puerto Rico.
Hunt, K. W. (1965). Grammatical Structures Written at Three Grade Levels. National Council of Teachers of English Research Report No. 3. National Council of Teachers of English, Urbana.
Hunt, K.W. (1970). Syntactic Maturity in Schoolchildren and Adults. Monographs of the Society for Research in Child Development, 35(1), 1-67.
Hyland, K. (2000). Disciplinary Discourses: Social Interactions in Academic Writing. Longman.
Hyland, K. (2004). Disciplinary Discourses: Social Interactions in Academic Writing. University of Michigan Press. https://doi.org/10.3998/mpub.6719
Ibáñez, R., Moncada, F., Cornejo, F., & Arriaza, V. (2017). Los Géneros del Conocimiento en Textos Escolares de Educación Primaria. Calidoscopio, 15(1), 462-476. http://revistas.unisinos.br/index.php/calidoscopio/article/view/cld.2017.153.06
Ibáñez, R., Moncada, F., & Arriaza, V. (2018). Recontextualización del conocimiento en textos escolares chilenos. Revista Signos. Estudios de Lingüística, 51(98), 430-456. http://dx.doi.org/10.4067/S0718-09342018000300430
Ibáñez, R., Moncada, F., & Cárcamo, B. (2019). Coherence Relations in Primary School Textbooks: Variation Across School Subjects. Discourse Processes, 56, 764-785. https://doi.org/10.1080/0163853X.2019.1565278
Kasule, D. (2011). Textbook Readability and ESL Learner. Reading and Writing, 2, 63-76. http://dx.doi.org/10.4102/rw.v2i1.13
King, J. & Just, M. (1991). Individual Differences in Syntactic Processing: The Role of Working Memory. Journal of Memory and Language, 30, 580-602. https://doi.org/10.1016/0749-596X(91)90027-H
Kitaev, N., & Klein, D. (2018). Constituency Parsing with a Self-Attentive Encoder. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2676-2686). http://dx.doi.org/10.18653/v1/P18-1249
Kleijn, S. (2018). Clozing in on readability: How linguistic features affect and predict text comprehension and on-line processing. LOT Publications.
Koo, T., Carreras, X., & Collins, M. (2008). Simple semi-supervised dependency parsing. In Proceedings of ACL-08: HLT (pp. 595-603).
Kübler, S., McDonald, R., & Nivre, J. (2009). Dependency parsing. Synthesis lectures on human language technologies, 1(1), 1-127. https://doi.org/10.2200/S00169ED1V01Y200901HLT002
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet physics doklady, 10(8), 707-710.
Levy, R. & Keller, F. (2013). Expectation and locality effects in German verb-final structures. Journal of Memory and Language, 68(2), 199-222. https://doi.org/10.1016/j.jml.2012.02.005
Lewis, R. & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29, 375-419. https://doi.org/10.1207/s15516709cog0000_25
Li, H., Zhang, Z., Ju, Y., & Zhao, H. (2018). Neural character-level dependency parsing for Chinese. In Thirty-Second AAAI Conference on Artificial Intelligence. Available at https://ojs.aaai.org/index.php/AAAI/article/view/12002
Liu, H. (2008). Dependency distance as a metric of language comprehension difficulty. Journal of Cognitive Science, 9(2), 159-191. http://dx.doi.org/10.17791/jcs.2008.9.2.159
Marcheggiani, D., Frolov, A., & Titov, I. (2017). A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017) (pp. 411-420). http://dx.doi.org/10.18653/v1/K17-1041
Martin, J. & Rose, D. (2008). Genre Relations: Mapping Culture. Equinox.
Martin, J. & Rose, D. (2013). Pedagogic Discourse: Contexts of Schooling. RASK: International Journal of Language and Communication, 38, 219-264.
McNamara, D. S. & Kintsch, W. (1996). Learning from texts: Effects of prior knowledge and text coherence. Discourse Processes, 22(3), 247–288. https://doi.org/10.1080/01638539609544975
McNamara, D. S. (2004). Aprender del texto: Efectos de la estructura textual y las estrategias del lector. Revista Signos. Estudios de Lingüística, 37(55), 19-30. http://dx.doi.org/10.4067/S0718-09342004005500002
McNamara, D. S., Graesser, A. C., McCarthy, P., & Cai, Z. (2014). Automated Evaluation of Text and Discourse with Coh-Metrix. Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511894664
MINEDUC (2015). Bases Curriculares. 7mo básico a 2do medio. Available at https://www.curriculumnacional.cl/portal/Documentos-Curriculares/.
MINEDUC (2018a). Bases Curriculares. Primero a Sexto Básico. Available at https://www.curriculumnacional.cl/portal/Documentos-Curriculares/.
MINEDUC (2018b). ¿Qué debemos saber sobre los textos escolares? Available at https://www.supereduc.cl/contenidos-de-interes/que-debemos-saber-sobre-los textosescolares
Möenne, G. & López, L. (2007). Oportunidades que ofrecen las TICs como apoyo a los textos escolares. In Primer Seminario Internacional de textos escolares SITE 2006. Available at https://bibliotecadigital.mineduc.cl
Nicenboim, B., Vasishth, S., Gattei, C., Sigman, M. & Kliegl, R. (2015). Working memory differences in long-distance dependency resolution. Frontiers in Psychology, 6, 2-16. https://doi.org/10.3389/fpsyg.2015.00312
Nir, B. & Berman, R. (2010). Complex syntax as a window on contrastive rhetoric. Journal of Pragmatics, 42(3), 744-765. https://doi.org/10.1016/j.pragma.2009.07.006
Nivre, J., & McDonald, R. (2008). Integrating graph-based and transition-based dependency parsers. In Proceedings of ACL-08: HLT (pp. 950-958).
Olivares, P. (2007). Concepto de nación e identidad nacional: Una aproximación a través de las políticas educativas y de la enseñanza de la Historia de Chile (Siglos XIX-XX). In MINEDUC (Ed.), Acta del Primer Seminario Internacional de Textos Escolares (pp. 161-165). Mineduc-UNESCO.
Oteiza, T. (2009). Cómo es presentada la historia contemporánea en los libros de textos chilenos para la escuela media. Discurso & Sociedad, 3(1), 150-174.
Oya, M. (2011). Syntactic dependency distance as sentence complexity measure. In Proceedings of the 16th International Conference of Pan-Pacific Association of Applied Linguistics (pp. 313-316).
Peñaloza, C., Araya, C. & Coloma, C.J. (2017). Desarrollo de la complejidad sintáctica en recontados narrativos de niños preescolares y escolares. Logos. Revista de Lingüística, Filosofía y Literatura, 27(2), 333-348. https://doi.org/10.15443/RL2726.
Pilán, I., Volodina, E., & Johansson, R. (2014). Rule-based and machine learning approaches for second language sentence-level readability. In Proceedings of the ninth workshop on innovative use of NLP for building educational applications (pp. 174-184). http://dx.doi.org/10.3115/v1/W14-1821
Poulsen, M. & Gravgaard, A. (2016). Who did what to whom? The relationship between syntactic aspects of sentence comprehension and text comprehension. Scientific Studies of Reading, 20(4), 325-338. http://dx.doi.org/10.1080/10888438.2016.1180695
Ramírez, T. (2002). El Texto Escolar como Objeto de Reflexión e Investigación. Docencia Universitaria, 3(1), 101-124.
Rickford, J., Denton, M., Wasow, T. & Espinoza, J. (1995). Syntactic Variation and Change in Progress: Loss of the Verbal Coda in Topic-Restricting As Far As Constructions. Language, 71(1), 102-131. https://doi.org/10.2307/415964
Rodríguez Fonseca, L. (1991). Índices de madurez sintáctica en escolares puertorriqueños de escuela primaria. In H. López Morales (Ed.), La enseñanza del español como lengua materna (pp. 133-143). Universidad de Puerto Rico.
Rojas, D., Ibáñez, R., Moncada, F., & Santana, A. (2020). Los Géneros del Conocimiento en el texto escolar de Lenguaje y Comunicación: Un análisis semiautomático de su lecturabilidad. RLA. Revista de Lingüística Teórica y Aplicada, 58(2), 41-67. https://doi.org/10.29393/RLA58-14GCDR40014
Rojas, D. (2021). Efecto de la complejidad sintáctica en la comprensión de estudiantes de Octavo Año Básico: una aproximación a la lecturabilidad del Texto Escolar. [Master's thesis, Pontificia Universidad Católica de Valparaíso]. http://repositorio.conicyt.cl/bitstream/handle/10533/249935/Tesis_Rojas_Villarroel.pdf?sequence=1
Rose, D. (2014). Analyzing pedagogic discourse: An Approach from genre and register. Functional Linguistics, 1, 11. https://doi.org/10.1186/s40554-014-0011-4
Schleppegrell, M. J. (2004). The Language of Schooling. Lawrence Erlbaum.
Sánchez, V. & De Mier, V. (2017). Syntactic Complexity in Narratives Written by Spanish Heritage Speakers. Vigo International Journal of Applied Linguistics, (14), 125-148.
Santana, A., Ibáñez, R., Moncada, F. & Zamora, J. (2021). Causal Connective Expressions in Textbooks written in Spanish: a comparative study of four primary school subjects. Journal of Pragmatics, 182, 104-117. https://doi.org/10.1016/j.pragma.2021.06.010
Sedano, M. (2011). Manual de gramática del español, con especial referencia al español de Venezuela. Consejo de Desarrollo Científico y Humanístico, Universidad Central de Venezuela.
Staub, A. (2010). Eye movements and processing difficulty in object relative clauses. Cognition, 116, 71-86. https://doi.org/10.1016/j.cognition.2010.04.002
Tesnière, L. (2015). Elements of structural syntax. John Benjamins Publishing Company. https://doi.org/10.1075/z.185
Vasishth, S. & Drenhaus, H. (2011). Locality in German. Dialogue & Discourse, 2, 59-82.
Vásquez, I. (1991). Índices de madurez sintáctica en estudiantes puertorriqueños de escuela superior. En H. López Morales (Ed.), La enseñanza del español como lengua materna (pp. 145-153). Universidad de Puerto Rico.
Véliz, M. (1988). Evaluación de la madurez sintáctica en el discurso escrito. Revista de Lingüística Teórica y Aplicada, 26, 105-141.
Véliz, M. (1999). Complejidad Sintáctica y modo del discurso. Estudios Filológicos, 34, 181-192. http://dx.doi.org/10.4067/S0071-17131999003400013
Watanabe, T., & Sumita, E. (2015). Transition-based neural constituent parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1169-1179). http://dx.doi.org/10.3115/v1/P15-1113
Wheelahan, L. (2010). Why knowledge matters in curriculum. Routledge. https://doi.org/10.4324/9780203860236
Yamada, H., & Matsumoto, Y. (2003). Statistical dependency analysis with support vector machines. In Proceedings of the eighth international conference on parsing technologies (pp. 195-206).
License
In order to support the global exchange of knowledge, the journal Círculo de Lingüística Aplicada a la Comunicación provides unrestricted access to its content from the moment of publication in this electronic edition and is therefore an open-access journal. The originals published in this journal are the property of the Complutense University of Madrid, and any full or partial reproduction must cite the source. All content is distributed under a Creative Commons Attribution 4.0 use and distribution licence (CC BY 4.0). This circumstance must be expressly stated in these terms where necessary. You can view the summary and the complete legal text of the licence.