TOWARDS THE COMPUTATIONAL IMPLEMENTATION OF ROLE AND REFERENCE GRAMMAR: RULES FOR THE SYNTACTIC PARSING OF RRG PHRASAL CONSTITUENTS

The goal of this work is twofold: on the one hand, it seeks to update the description of phrasal constituents in Role and Reference Grammar; on the other, it aims at providing a computational implementation of such structures within the Grammar Development Environment, a component of ARTEMIS (“Automatically Representing Text Meaning via an Interlingua-Based System”), a Natural Language Processing prototype developed with the aim of binding natural language fragments with their corresponding grammatical and semantic structures. In order to attain both tasks, the analysis focuses specifically on the design of the rules that would account for the linkage between the syntactic and the semantic representations of Referential Phrases, as proposed initially in RRG. This proposal involves a reinterpretation of the constituents and operators in the Layered Structure of the RP, taking into account that in ARTEMIS the assignment of a syntactic-semantic structure to a given sequence is based on the activation of grammatical rules plus a set of Attribute Value Matrixes related to the grammatical features of the constituents of RPs.


Introduction
A growing concern for many grammatical models, especially functionally oriented theories, is the development of the conditions to satisfy what may be called the criterion of computational adequacy.1 In line with this interest, some recent contributions (Guest 2009; Salem, Hensman and Nolan 2008; Diedrichsen 2014) within Role and Reference Grammar (henceforth RRG; Van Valin and LaPolla 1997; Van Valin 2005; Pavey 2010) have been devoted to the development of this issue. However, it is still a challenge to devise a fully-fledged syntax-semantics interface which is computationally implemented.

1 This work has been developed within the framework of the research project "Desarrollo de plantillas léxicas y de construcciones gramaticales en inglés y español. Aplicación en los sistemas de recuperación de la información en entornos multilingües" (FF12011-29798-C02-02), funded by the Spanish Ministry of Science.

clac 65/2016, 75-108
Works like Van Valin and Mairal Usón (2013) or Periñán-Pascual (2013, 2014) draw the guidelines of a system of natural language understanding, ARTEMIS, based on the theoretical tenets of this grammar. ARTEMIS ("Automatically Representing Text Meaning via an Interlingua-Based System") has been implemented as a parsing device within FunGramKB ("Functional Grammar Knowledge Base") for the computational treatment of the syntax and semantics of sentences.
The goal of this work is twofold: on the one hand, it seeks to update the description of phrasal constituents in RRG, taking into account Van Valin's (2008) programmatic proposal, which brings about some significant variations in the interpretation of phrasal constituents as they were originally described in Van Valin and LaPolla (1997) and Van Valin (2005); on the other, it aims at providing a computational implementation of such structures within the Grammar Development Environment (GDE), a component of ARTEMIS whose task is to provide an effective parsing of morphosyntactic structures.
Despite the fact that this NLP prototype is based on the theoretical tenets of RRG (Van Valin and Mairal Usón 2013; Periñán-Pascual 2014; Periñán-Pascual and Arcas-Túnez 2014), a reinterpretation of some of the constituents and operators in the Layered Structure of Referential Phrases (henceforth RPs) is necessary in order to comply with the requirements of the syntactic parser, taking into account that in ARTEMIS the assignment of a syntactic-semantic structure to a given sequence is based on the activation of grammatical rules plus a set of Attribute Value Matrixes (AVMs) related to the grammatical features of the constituents of RPs.
The remainder of the paper is organized as follows: Section 2 offers a brief description of the interrelation among RRG, FunGramKB and ARTEMIS, highlighting both the points of consensus and the adjustments required for an effective integration of RRG functional syntax into the Grammar Development Environment (GDE), which will perform the parsing operations within ARTEMIS. Section 3 updates the syntactic analysis of phrasal constituents in RRG by fully developing the programmatic proposal put forward in Van Valin (2008), which starts from a radical substitution of the analysis based on formal categories such as NP or AdjP for a functionally oriented description in terms of the categories Referential Phrase (RP) and Modifier Phrase (MP). In Section 4 a description of the elements necessary for the computational parsing of these new categories within the GDE is proposed. After a conclusion section and the list of references, the paper is rounded off with an appendix including a list of the abbreviations that have been used in the parsing rules proposed and in the AVMs.

ARTEMIS, RRG and FunGramKB
As stated above, ARTEMIS is an NLP prototype designed primarily for natural language understanding. One crucial differentiating feature with regard to other NLP systems with similar tasks is the fact that ARTEMIS seeks to be linguistically motivated. This involves adopting a linguistic theory, RRG, as a foundational pillar for the construction of the components of the NLP system. Periñán-Pascual (2013: 209) explains some features of RRG that make it suitable for NLP:
(a) RRG is a model where morphosyntactic structures and grammatical rules are explained in relation to their semantic and communicative functions.
(b) RRG is a monostratal theory, where the syntactic and semantic components are directly connected through a bidirectional "linking algorithm".
(c) RRG is a model that makes strong claims to typological adequacy.
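However, given the conditions imposed by NLP environments, a direct computational "translation" of RRG's grammatical structures and rules is impossible, and some fine-tuning becomes necessary. These adjustments involve, on the one hand, resorting to other models that complement RRG in an area where this grammar is still underdeveloped. I am referring to semantic representations; in fact, the contributions within the Lexical Constructional Model (LCM; Ruíz de Mendoza and Mairal Usón 2008; Mairal Usón and Ruíz de Mendoza 2008, 2009) have become significantly relevant to develop a fully-fledged system of semantic representations. On the other hand, ARTEMIS deploys the FunGramKB knowledge base (Periñán-Pascual and Arcas-Túnez 2004, 2005, 2007, 2008; Periñán-Pascual and Mairal Usón 2009, 2010, 2011), which in turn also exploits constructional schemata as a system for meaning representation.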
FunGramKB includes the following interrelated modules:
(a) The lexical component, which is language-specific and consists of two submodules: the lexicon (which includes, in the format of entries, all the linguistic information related to the lexical units) and the morphicon (which deals with all inflectional processes of a language).
(b) The grammatical level, also language-dependent, where the constructional schemata of a given language are stored. Such schemata are organized according to the four-layer classification proposed within the LCM.
(c) The conceptual component, which is language-independent and stores all deep semantic units and structures in different submodules: the ontology (a hierarchical storehouse for concepts in a human mind), the cognicon (or repository of procedural conceptual schemas or scripts to encode stereotypical actions) and the onomasticon (for real world entities and events).
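One pivotal element that binds FunGramKB with RRG and the LCM is the integration of the LCM's constructional templates and RRG's lexical representations (logical structures) into the knowledge base's language-independent formalism for text meaning representation, namely Conceptual Logical Structures (CLS). CLS takes as a backbone for semantic representation the Aktionsart characterization of lexical units as encoded in the Logical Structures of RRG. To show the differences between both types of constructs, Periñán-Pascual (2013: 218) offers the semantic representation of the sentence "The juice froze in the refrigerator", first in its RRG version:

(1) <IF DEC <TNS PAST <be-in' (refrigerator, [[do' (juice, [freeze' (juice)])] CAUSE [BECOME black' (juice)]])>>>

and next, the corresponding FunGramKB CLS:

[...] changes between (1) and (2) (Periñán-Pascual 2013: 218-219), among which the most relevant is the substitution of predicates by concepts from FunGramKB's ontology (such as +JUICE_00 and +BLACK_00 in the above example).

Although the CLS brings a heavier conceptual load into semantic representations, it still needs some refining from a computational perspective. In fact, for reasoning purposes it is necessary to model CLS representations into COREL structures.2 Thus the CLS in (2) is modeled into a COREL scheme of the following type:

In simple terms, the process involved in understanding a stretch of natural language with the tools that have been described is summarized in the following figure:

The main goal of ARTEMIS is to deal with all the phases of this process; consequently, it consists of the following modules: the Grammar Development Environment (GDE), the CLS Constructor and the COREL-Scheme Builder. As can be inferred from the previous description of examples (1-3), the last two modules are designed to transfer the shallow semantic representations of sentences into conceptually deeper structures amenable to processing in different NLP tasks. However, prior to this it is necessary to make an effective computational parsing of the morphosyntactic structure underlying sentences based on the principles of RRG for grammatical descriptions; this is the goal of the GDE.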
The analysis carried out in this paper seeks to enrich the information that is required for an effective parsing of the so-called RPs in RRG, and which would become a part of the GDE. So far, the GDE consists of two basic types of theoretical constructs: a set of rules that account for syntactic structures and a library of Attribute-Value Matrixes (AVMs) for grammatical units. The inclusion of unification devices in the prototype also leads to a fundamental adjustment of the original RRG syntactic analysis: the so-called operator projection (operators are the grammatical categories such as tense, mood, aspect, etc. which modify the layers of the clause) is overridden mainly by the AVMs, where the grammatical features that modify the different categories are lodged. Thus, the original syntactic pattern (including constituent and operator projections) for simple clauses in RRG in (5) will look different once the operator projection is substituted by feature-bearing nodes in the constituent projection, as partially reflected in the tree diagram in (6).4

3 AVMs are encoded in XML format, similar to that of other platforms for the analysis of human language data, such as NLTK (Natural Language Toolkit; Bird, Loper and Klein 2009). See the next section for a detailed explanation of the information encoded in this AVM.

4 The diagram is obviously incomplete, since not all the nodes are endowed with their corresponding AVMs, nor do the AVMs represented include all their possible attributes; they just represent some of the 'prototype syntactic rules' for clauses proposed as preliminary within ARTEMIS (the so-called 'version 1.0' rules). Further research on sentences and other syntactic structures is necessary to fully develop the set of necessary rules within the GDE.
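The idea that XML-encoded AVMs and unification take over the work of the operator projection can be illustrated with a minimal Python sketch. It is not the ARTEMIS implementation: the XML element names (avm, attr), the attribute names (Illoc, Tense), the use of "?" for an unspecified value and the helper functions parse_avm and unify are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of an AVM; the real ARTEMIS schema may differ.
CL_AVM_XML = """
<avm node="CL">
  <attr name="Illoc" value="?"/>
  <attr name="Tense" value="past"/>
</avm>
"""

def parse_avm(xml_text):
    """Read an XML AVM into a dict of attribute -> value ('?' = unspecified)."""
    root = ET.fromstring(xml_text)
    return {a.get("name"): a.get("value") for a in root.findall("attr")}

def unify(avm_a, avm_b):
    """Return the unification of two flat AVMs, or None on a value clash."""
    result = dict(avm_a)
    for name, value in avm_b.items():
        current = result.get(name, "?")
        if current == "?":
            result[name] = value      # fill in an unspecified value
        elif value != "?" and value != current:
            return None               # incompatible values: unification fails
    return result

clause = parse_avm(CL_AVM_XML)
print(unify(clause, {"Illoc": "decl"}))   # compatible: Illoc gets specified
print(unify(clause, {"Tense": "pres"}))   # clash with Tense=past
```

In this toy setting, a grammatical feature such as tense is simply a value lodged in the AVM of a node, and a rule applies only if the AVMs of the nodes involved unify.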
3. The scope of analysis: description of grammatical processes and their treatment in RRG
Van Valin (2008) introduces some important variations in the interpretation of RRG syntactic categories as originally described in both Van Valin and LaPolla (1997) and Van Valin (2005). Such changes are based on the assumption that the most significant syntactic categories are not projections of a lexical head, but projections of the functional status of constituents within the clause. This is already captured in the grammatical components of clauses in RRG, as evidenced by the existence of categories such as Nucleus (NUC), CORE and Periphery (PER). However, at the level of phrasal categories there was still a certain inconsistency, as labels such as NP or AdjP are not functionally motivated. Hence two types of constituents are proposed to account for a functional and typologically valid treatment of such lexical projections:5 Referential Phrases (RPs) and Modifier Phrases (MPs). From this it follows that the description provided in Van Valin and LaPolla (1997: 52-67) and Van Valin (2005: 21-30) for NPs must now be adapted to RPs, and that the layered structure assigned to NPs is now attributable to RPs with the necessary adaptations.
The remainder of this section will then offer a description of what we may call the Layered Structure of Referential Phrases (LSRP), and which will replace Van Valin and LaPolla's (1997) and Van Valin's (2005) Layered Structure of Noun Phrases.
The fact that RPs are described in terms of a layered structure in RRG is due to their remarkable structural equivalence with clauses. The most striking similarity is that nouns (and adjectives) can take arguments, as verbs in clauses do. This is especially relevant in the case of derived nominals, as in the destruction of the city by the earthquake, and of relational nouns like friend or relative, commonly followed by PPs introduced by of which can be considered their direct arguments, following Nunes (1993). It is important to highlight that, in accordance with Van Valin (2008: 167), the parallelism between RPs and clauses increases because the former need not be endocentric categories; i.e. there is no restriction for RPs to be headed exclusively by any specific lexical category. Just as there is a strong tendency, but not an absolute correlation, for verbs to be the nuclei of clauses, there is a strong tendency for nouns to be the nuclei of RPs, but it is not always the case that there is a nominal nucleus. Van Valin (2008: 163-164) [...] Or even Spanish:

(9) Los ricos también lloran
The.M.Pl rich.M.Pl also cry
'The rich also cry'

A preliminary general scheme of the LSRP is given in the following figure:6

Figure 1 shows that each of the layers in RPs can be modified by different operators.

5 With regard to the typological aspects, Van Valin (2008) also shows that there are no universally valid typological categories beyond the fundamental categories of noun and verb. Furthermore, not all languages consistently align morpholexical categories with functional distinctions. For example, in Nootka (cf. [...]) it is feasible to align nouns with a predicational function and verbs with a referring function, as happens in the Nootka sentence in example (7).
The whole set of operators in the LSRP is given in Table 1. As in the case of clauses, the ordering of operators is constrained and responds iconically to their respective scopes (cf. Rijkhoff 1990, 1992); there is also a periphery for each layer, as happens in clauses as well. The nuclear periphery would host restrictive MPs and restrictive relative clauses:7

(12) My dear old wood hammer that never lets me down

The core periphery would include setting PPs and MPs, as in New York City in example (11) above, and non-restrictive modifiers (relative clauses and appositional phrases) will occupy the RP periphery, as does the RP a cupcake expert in Rebeca, a cupcake expert.
Example (11) also reveals that PPs can occupy two different positions within the LSRP: they can function within the CORE as a kind of exocentric RP, behaving as an argument of the NUC, or they can be treated as peripheral elements, thus having a predicative setting function.
7 AdjPs are probably the category that has undergone the most significant re-analysis in RRG; initially they were considered nuclear operators, then in Van Valin (2005) they were integrated in the constituent projection of NPs as peripheral modifiers in order to provide them with a status similar to the rest of lexical categories, and finally Van Valin (2008) aligned them as MPs together with adverbial phrases.

With regard to Modifier Phrases (MPs), they also have an internal layered structure. Van Valin (2008: 172) gives two reasons to support this. Firstly, there are languages with MPs that have as a nucleus an Adjective that takes a core argument. Van Valin (2008: 172) mentions German as one of these languages (e.g. der auf seinen Sohn stolze Vater 'the of his son proud father'), and states that English does not allow such structures (*the proud of his son father); however, this impossibility holds only for attributive adjectival MPs, since it is possible to have MPs with a core argument if they appear within an RP in a post-nuclear position, for example: a speech redolent of old-aged morality. Secondly, the modifiers in MPs can themselves be modified; this implies that the MP must make room for a periphery to house the modifying MP, e.g. a very brightly debated proposal.
There is an especially relevant slot in RPs, the so-called 'RP Initial Position' (RPIP), which exhibits features similar to those of both the detached positions (e.g. it allows Genitive MPs as in Today's class) and the precore slot positions of clauses (wh-words can also occupy this position, as in which evening dress).8 The RPIP plays a role both in the constituent and [...]

8 There is also an RP Final Position (RPFP), which is relevant for other languages; for example, in the Lakhota RP Wówapi ki lé 'book the this' the demonstrative is in RPFP. Something similar occurs with the article in Mparntwe Arrernte, as in kngwelye nhenhe re 'dog this 3SGDEF'.

9 However, the same does not hold either for the structures above the clause and below the sentence (note that there is no rule to account for precore positions or detached constituents).

AUX in initial position helps to explain the structure of yes/no questions and imperative clauses (Do come here! Don't talk!).11 This is the reason why the value 'decl' for the Illocutionary force attribute is not available in the AVM of the node AUX.
Another interesting variation affects the Attributes associated with the different nodes involved in the previous rules. This must be done in order to capture the whole range of operators that affect the different layers in syntactic representations. For example, the CL node will now have the following AVM:

11 For a detailed description of the role of AUX items in the GDE, cf. Díaz Galán and Fumero Pérez (2015), which also offers a first description of some rules and AVMs for the syntactic parsing within ARTEMIS of clause-level grammatical operations involving do-auxiliary insertion.

The next significant change to be introduced in the ARTEMIS repository of syntactic rules would be the substitution of the node tags for the categorial phrases NP, AdvP and AdjP with the corresponding RP and MP tags; this would account for the 'external syntax' of RPs and MPs. These changes affect the so-called "argumental", "nuclear" and "peripheral" levels (levels 4, 3 and 2 in version 1.0). Strictly speaking, the nodes ARG and ADJUNCT are no longer used in the latest version of RRG, but the analysis of their usefulness in ARTEMIS lies beyond the scope of this paper. Suffice it for now to consider that their preservation helps to keep the AVMs for these constituents and their corresponding attributes as members of the clause (i.e. the kind of thematic role they fulfill) separate from those pertaining to their internal configuration at phrase level (for example, the definiteness of RPs). In keeping with this line of reasoning, the attribute of number has been erased from the AVMs of ARGs and ADJUNCTs.

The PP in the museum has a predicative nucleus, the preposition in, which licenses its object. In these cases, the PP has a layered structure. For example, the PP right behind the house is analyzed in the following way:

Compare it with the non-predicative PP to my friend:

(28)

The first subset of rules above captures both types of PPs, especially the complex layering of predicative PPs.
The second subset of rules in the layer of phrasal constituents is devoted to spelling out the internal configuration of RPs in English:

The first of these rules establishes a distinction between two types of RPs: those that lack an internal layering, as happens with Pronouns when they appear alone and with Proper nouns, and the more complex ones, in which it is necessary to distinguish three types of daughter nodes: the CORE-RP, the RPIP and the PER-RP. As expressed in the rule, only the CORE-RP is not optional.

It has already been described in the previous section that the LSRP is similar to the LSC in allowing a periphery for each of its layers. The uppermost PER node (PER-RP) can be occupied by non-restrictive modifiers, such as appositional phrases and some relative clauses [...]

The RPIP marks the definiteness of the RP; therefore it hosts the central determiners, including the articles, demonstratives and possessives.13 Central determiners may in turn be modified by a group of partitive predeterminers like all, both, double, half, twice, many (a), such (a), what (a). RPIP is also the position for genitive RPs and MPs (Yesterday's news), which are also intrinsically definite. Note that if a possessed RP is indefinite, it will appear as a peripheral post-nuclear PP within the Core MP (as in a son of Peter's).
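The top-level distinction just described can be sketched as a small well-formedness check in Python. This is only an illustration under stated assumptions, not the rule as formulated in the GDE: the labels PRO and PROPN (for pronouns and proper nouns), the linear order RPIP, CORE-RP, PER-RP and the function name is_valid_rp are hypothetical.

```python
import re

# Hypothetical labels: PRO = pronoun, PROPN = proper noun.
# Assumed linear order: optional RPIP, obligatory CORE-RP, optional PER-RP.
RP_PATTERN = re.compile(r"^(PRO|PROPN|(RPIP )?CORE-RP( PER-RP)?)$")

def is_valid_rp(daughters):
    """Check whether a sequence of daughter labels is a licit expansion of RP."""
    return RP_PATTERN.match(" ".join(daughters)) is not None

print(is_valid_rp(["PRO"]))                          # unlayered RP
print(is_valid_rp(["RPIP", "CORE-RP", "PER-RP"]))    # fully layered RP
print(is_valid_rp(["RPIP", "PER-RP"]))               # CORE-RP is missing
```

The check encodes the two points made in the text: pronouns and proper nouns form RPs without internal layering, and in layered RPs only the CORE-RP is obligatory.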
The following rule in this subset explains the internal structure of CORE-RP:

Apart from the Nucleus and the peripheral elements, it includes the possibility of having one, two or three arguments. This seems to be the maximum number of arguments allowed, and it occurs with some derived nominals as NUC-RP:

(39) The donation of 100.000 dollars to the asylum by an anonymous benefactor

ARGs in RPs can be PP, as in the example above, or CL:

(42) The convention of one Philippine tribe that no man can keep a secret (example from Downing and Locke 1992: 463).

13 The difference between possessives and demonstratives that can occur in RPIPs and those that are RPs [...]
The proposal for new syntactic categories in Van Valin (2008) [...] in My little brother is afraid of the dark; on the contrary, attributive adjectives cannot take arguments in English (*My afraid of the dark little brother). In fact, in Van Valin and LaPolla (1997: 69) this was taken as an argument to defend that they did not have a layered structure. Van Valin (2008: 172), based on Matasović (2001), corrects this view, as there are languages which do have attributive adjectives taking an argument. This has led us to introduce a disjunctive option in the rule for CORE-MP, even though there are no core arguments for the attributive NUC-MP in English.14
14 Since ARTEMIS is primarily a left-to-right, bottom-up parser with top-down prediction, there is no danger of a wrong assignment of attributive NUC + argument MP.
Note that the rules as stated above are not complete, since the nodes do not have their corresponding AVMs. This has been done to ease the explanation of the syntactic configuration of RPs, MPs and PPs. Let us turn now to consider the AVMs for the relevant constituents within such rules; some of them were already included in the original AVMs list within ARTEMIS, and have only been subject to certain [...]

The only new Attributes that have been introduced in the above AVMs are "Pol", which refers to the 'Polarity' of some quantifiers like any, some, no and their compound derivatives; "Quant", which stands for "quantification" and is also an attribute of quantifier words; and "Case", which is a cancellable attribute for RPs and percolates up from nominal heads. Quantification can be relative (few, many, little) or absolute (all, no), and positive (many) or negative (few) in terms of quantity. Quantifiers interact in very interesting ways with sentential negation, and no has an especially relevant role as the exponent of RP negation in English (no books). The corresponding AVMs for these Attributes will be:

The same must be done with the rest of the rules for XPs proposed in this paper.
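The quantification values mentioned above can be sketched as a small Python lookup. The word classifications come from the examples in the text, but the encoding is an assumption: the keys type and quantity, the output attribute name Quantity, the "?" placeholder for unspecified values and the function quant_avm are hypothetical and do not reproduce the AVMs stored in ARTEMIS.

```python
# Values taken from the examples in the text; the attribute layout
# ("type", "quantity") is a hypothetical encoding, not the ARTEMIS one.
QUANT = {
    "few":    {"type": "relative", "quantity": "negative"},
    "many":   {"type": "relative", "quantity": "positive"},
    "little": {"type": "relative"},
    "all":    {"type": "absolute"},
    "no":     {"type": "absolute"},
}

def quant_avm(word):
    """Return a flat AVM for a quantifier, with '?' for unspecified values."""
    entry = QUANT.get(word.lower(), {})
    return {
        "Quant": entry.get("type", "?"),
        "Quantity": entry.get("quantity", "?"),
    }

print(quant_avm("few"))
print(quant_avm("No"))
```

Values that the text does not state, such as the quantity of all or the polarity of any and some, are deliberately left unspecified rather than guessed.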

Conclusion
As stated in the introduction, the analysis in this paper aimed at contributing to the development of the GDE within a functionally based parsing prototype, ARTEMIS. In order to do so, it has been necessary first to offer an updated description of phrasal units in Role and Reference Grammar. Consequently, this study also contributes to the computational implementation of this linguistic theory. There are, however, several pending issues to achieve these goals, among which the following must be considered: (i) the revision of the lexical rules associated with the functional units that are encoded as POS tags in the repository for lexical items within ARTEMIS (several of these tags are used in the rules proposed for XPs in this paper); (ii) AVMs need significant revision, subject to the development of further analysis both from within and without phrasal constituents; and (iii) the prototype rules for the different layers that make up the so-called LSC in RRG need thorough refinement.
underdeveloped. I am referring to semantic representations; in fact, the contributions within the Lexical Constructional Model (LCM; Ruíz de Mendoza and Mairal Usón 2008; Mairal Usón and Ruíz de Mendoza 2008, 2009) have become significantly relevant to develop a fully-fledged system of semantic representations. On the other hand, ARTEMIS deploys FunGramKB knowledge base (Periñán-Pascual

representation, namely Conceptual Logical Structures (CLS). CLS takes as a backbone for semantic representation the Aktionsart characterization of lexical units as encoded in the Logical Structures of RRG. To show the differences between both types of constructs, Periñán-Pascual (2013: 218) offers the semantic representation of the sentence "The juice froze in the refrigerator", first in its RRG version:

(1) <IF DEC <TNS PAST <be-in' (refrigerator, [[do' (juice, [freeze' (juice)])] CAUSE [BECOME black' (juice)]])>>>

and next, the corresponding FunGramKB CLS:

changes between (1) and (2) (Periñán-Pascual 2013: 218-219), among which the most relevant is the substitution of predicates by concepts from FunGramKB's ontology (such as +JUICE_00 and +BLACK_00 in the above example).

Table 1 - Operators in the LSRP

Nunes (1993) shows that NPs have one direct core argument, which is marked by the preposition of. Therefore, of is non-predicative in these structures and it is semantically empty; this is proved by the fact that it occurs with a whole range of semantic functions,

aligned them as MPs together with adverbial phrases.