PERCEPTION OF VOICING IN ENGLISH FRICATIVES BY SPANISH LISTENERS

Speech contrasts which are not functionally distinctive in the mother tongue are difficult for non-native listeners to differentiate perceptually. A two-fold experiment was carried out to investigate the L2 perception of English by 26 Spanish listeners, with an English group as a control. In the present paper, listeners were exposed to the English contrast /s/ vs. /z/, which is not phonemic in Spanish, in two synthesized words, Sue and zoo. The results showed that some listeners had already acquired the contrast, although there were language-specific differences in the use of the acoustic cues that signalled it. Furthermore, there was no correlation between the results obtained and the variables age of L2 learning, length of L2 learning, or listener’s age. Final conclusions are drawn from this research and the previous work devoted to place of articulation differences.


Introduction
This paper presents the second part of a two-fold investigation on Spanish perception of non-native contrasts, in particular, how Spanish speakers perceive English contrasts that are not functionally distinct in the mother tongue, or which are phonemic in the native language but differ in their phonetic realisation in the L2.
In the previous part of this research (Barreiro, 2015), an experiment was carried out to analyse the perception of fricative place of articulation using /s/ vs. /ʃ/, a sibilant contrast that is phonemic in English but not in Spanish, since the palato-alveolar is not present in the Spanish phonemic system. The results showed that perceptual cue weighting was language-dependent. In particular, the spectral properties of the frication noise were more powerful acoustic cues than the information provided by the formant transitions, especially for the Spanish group. Furthermore, there was great variation among individual Spanish listeners, which was not related to length of L2 learning, age of L2 learning or age of the listener, some of the personal factors that seem to interact with the difficulty of perceiving non-native contrasts (Purcell & Suter, 1980; Bohn, 1995; MacKain et al., 1981; Bohn & Flege, 1990; Polka, 1991; Mayo et al., 1997; Flege, 1998; Bradlow & Pisoni, 1999; Escudero & Boersma, 2004; Hall et al., 2004; Cebrian, 2009; Derwing & Munro, 2013; to mention a few).
The research presented here examines voicing distinctions in sibilant fricatives. It focuses on the acoustic cues contained in the signal which listeners can use to identify a phoneme category as well as to apprehend the existence of a phonological contrast (Polka, 1991; Wright, 2001 & 2004; Kabak & Maniwa, 2007; McMurray & Jongman, 2011). The presence of simultaneous cues (or cue multiplicity) in the signal implies redundant information, which is very useful when the signal is degraded by environmental noise or, even more relevantly, in L2 speech perception. Furthermore, cue multiplicity in L2 speech perception is closely related to the issue of the perceptual weighting given to the information provided by the acoustic cues when identifying a phoneme category. It seems clear from previous research that L2 learners weigh the information available in the signal differently from native listeners when making a linguistic distinction (e.g. Strange & Jenkins, 1978; Underbakke et al., 1988; Yamada et al., 1992; Flege, 1984; Hazan & Boulakia, 1993; Bohn, 1995; or, for Spanish studies, Williams, 1977; McCasland, 1983; Flege & Eefting, 1987; Broersma, 2005; Barreiro, 2014). This is an issue that we will also be tackling here.
In terms of voicing, English fricatives are classified as voiceless or voiced (/f, θ, s, ʃ, h/ vs. /v, ð, z, ʒ/) according to the presence of phonation, determined by the glottal pulsing (or excitation) that produces energy primarily at low frequencies (below 750 Hz). This acoustic parameter has traditionally been considered the most salient cue for voicing perception. Studies focusing on the role played by glottal vibration in voicing distinctions between fricatives have shown that glottal pulsing extends over a longer period for voiced than for voiceless fricatives (Hughes & Halle, 1956; Ingerman, 1960; Glicksman & Stevens, 1988; Stevens et al., 1992; Pirello et al., 1997; Maniwa et al., 2008, to mention a few).
Nevertheless, it is well known that voiced fricatives are not fully voiced in either initial or final position, especially in the latter (Stevens et al., 1992; Smith, 1997; Jesus & Shadle, 2002 & 2003; Dmitrieva, 2014). Moreover, as the fricative place of articulation moves further back, the amount of devoicing increases proportionally (Jesus & Shadle, 2003): alveolar fricatives appear to be voiced throughout less often than labiodental fricatives (Pirello et al., 1997), which have been reported to remain fully voiced more often (Stevens et al., 1992). Therefore, if voicing in fricatives cannot be characterized solely by the presence or absence of glottal excitation, a complementary way of establishing differences between the so-called voiced fricatives and their voiceless counterparts needs to be identified. Numerous previous studies have shown that this acoustic property tends to overlap with many other amplitudinal and, especially, temporal cues (Maniwa et al., 2008; Raphael et al., 2011). Let us analyse this in detail.
clac 69/2017, 33-66 barreiro: fricatives 38
The differences in overall frication duration (also called the normalized duration) between voiceless (or unvoiced) and voiced fricatives have been thoroughly studied in previous work. Although Crystal & House (1988a) found no correlation between voicing and duration in fricatives, voiceless fricatives at a given place of articulation have been reported to be significantly longer than their voiced counterparts (Klatt, 1976; Manrique & Massone, 1981; Álvarez González, 1982; Docherty, 1992; Stevens et al., 1992; Pirello et al., 1997; Smith, 1997; Jesus & Shadle, 2003), as "the maintenance of voicing during the articulation of a fricative constriction results in later onset and earlier offset" (Silbert & de Jong, 2008: 2770). In fact, when the overall frication duration is shortened, there is a clear perceptual shift from voiceless to voiced (Flege & Hillenbrand, 1986; Jongman et al., 2000), which appears to be more pronounced for non-sibilants than for sibilants (Harris, 1958; Jongman et al., 2000). This will also be analysed here.
With regard to the interaction of overall frication duration with context variables, it has been pointed out that the position of the fricative within the word can also affect fricative perception, both in isolated syllables and in running speech (Crystal & House, 1988a). According to some researchers, frication duration is particularly important for the voicing distinction in syllable-initial position, with voiceless fricatives having longer noise durations than voiced fricatives (Álvarez González, 1982; Behrens & Blumstein, 1988; Baum & Blumstein, 1987; [...] 2009). Other works, however, claim that syllable affiliation (onset versus coda position) does not correspond to reliable frication duration differences (Gordon, 1989; Pirello et al., 1997; Silbert & de Jong, 2008). The not uncommon overlap of mean overall frication duration values between voiced and voiceless fricatives has therefore, in many cases, led researchers to analyse vowel duration and the relative duration of the vocalic and fricative segments as further temporal cues to the listener's ability to make voicing contrasts in fricatives (Denes, 1955; Harris, 1958; House, 1961; Raphael, 1972; Cole & Cooper, 1975; Soli, 1981; Hogan & Rozsypal, 1980; Baum & Blumstein, 1987; Crystal & House, 1988a & 1988b; Glicksman & Stevens, 1988; Jongman, 1985 & 1989; Gordon, 1989; Broersma, 2010; Raphael et al., 2011; McMurray et al., 2011).
According to these studies, when the adjacent vowel is relatively long the consonant is usually perceived as voiced, whereas when the vowel is relatively short it is perceived as voiceless. Nevertheless, although voiced responses seem to increase as fricative duration decreases relative to the adjacent vowel, the weighting of this cue (the duration of the adjacent vowel) depends on the context (Wilde, 1993 & 1995; [...] 2009). In particular, vowel duration has a strong effect on fricative voicing perception in syllable-final position (Denes, 1955; Álvarez González, 1980; Raphael, 1972 & 1981; Maxwell & Weismer, 1982; Smith, 1997; Broersma, 2010) and a weak effect in syllable-initial position (Cole & Cooper, 1975; Baum & Blumstein, 1987).
It is worth mentioning that, according to a few studies (Hughes & Halle, 1956; Raphael et al., 2011), the relative durations of vowel and frication are an even more relevant cue for identifying the voicing class of a fricative than the frication duration itself.
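The relative-duration measures discussed above reduce to simple arithmetic on two segment durations. The sketch below is illustrative only: the function name and the two particular ratios are our own choices rather than measures defined by the studies cited, and the durations in the usage lines are hypothetical.

```python
def relative_durations(vowel_ms: float, fricative_ms: float) -> dict:
    """Two simple relative-duration measures for a vowel + fricative sequence.

    fric_to_vowel: frication duration divided by vowel duration.
    fric_share: frication duration as a proportion of the whole V + C interval.
    Lower values mean a relatively long vowel, the configuration that the
    studies above associate with 'voiced' responses.
    """
    if vowel_ms <= 0 or fricative_ms <= 0:
        raise ValueError("durations must be positive")
    return {
        "fric_to_vowel": fricative_ms / vowel_ms,
        "fric_share": fricative_ms / (vowel_ms + fricative_ms),
    }

# Hypothetical values: short vowel + long frication vs. long vowel + short frication.
print(relative_durations(150, 200))  # fric_to_vowel ~1.33, fric_share ~0.57
print(relative_durations(250, 125))  # fric_to_vowel 0.5, fric_share ~0.33
```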
In short, the most salient temporal properties related to voicing in fricatives, and which have a greater effect on voicing than on place of articulation, seem to be the overall frication duration, the previous vowel duration, and the relative duration between vowel and fricative segments. Works by different researchers have had varying results and attributed different roles to these.
With regard to the amplitudinal cues, overall frication amplitude is one of the properties differentiating voiced/voiceless fricative pairs that has been studied in depth. It is related to the degree of muscular effort and breath force involved in articulation: the fortis /f, θ, s, ʃ/ are usually pronounced with relatively more muscular effort and breath force than their lenis counterparts /v, ð, z, ʒ/, and /h/ is usually included in the former group. In general, voiced fricatives show lower overall frication amplitude than their voiceless equivalents (Ingerman, 1960; Balise & Diehl, 1995; Silbert & de Jong, 2008). This amplitudinal feature appears to be more useful for non-sibilants than for sibilants, since the difference between voiceless and voiced fricatives is greater for non-sibilants than for sibilants (Jongman et al., 2000). Consistent with the observations of overall frication amplitude, voiceless fricatives seem to have significantly greater amplitude relative to the vowel than their voiced counterparts (Ingerman, 1960). Furthermore, relative amplitude interacts with place of articulation: the voicing difference in relative amplitude is significant for labiodentals and palato-alveolars, but of little significance for dentals and alveolars (Jongman et al., 2000). A number of other studies have concluded that this amplitudinal property is not a relevant acoustic cue for sibilants (Stevens, 1985; Behrens & Blumstein, 1988; Hedrick & Ohde, 1993). As far as the effects of speaker and other context variables are concerned, relative amplitude is moderately affected by all of these variables (McMurray & Jongman, 2011); even speaker gender affects voicing distinctions in relative amplitude, with greater values for males than for females (Jongman et al., 2000).
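Relative amplitude, as described above, compares the level of the frication with that of the adjacent vowel. The sketch below assumes a deliberately simplified version of the measure: overall RMS amplitude of each segment, expressed as a dB difference. Some published measures instead compare amplitudes in a particular frequency region, so this is not a reimplementation of any cited study; the function names and signals are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def relative_amplitude_db(fricative, vowel):
    """Fricative amplitude relative to the vowel, in dB.

    Negative values mean the fricative is weaker than the vowel. On the
    findings summarised above, voiced fricatives would be expected to give
    more negative values than their voiceless counterparts.
    """
    return 20.0 * math.log10(rms(fricative) / rms(vowel))

# Hypothetical square-wave stand-ins for the two segments.
vowel = [1.0, -1.0] * 50
fricative = [0.1, -0.1] * 50
print(round(relative_amplitude_db(fricative, vowel), 2))  # -20.0
```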
Other amplitude-related measures that have recently been associated with voicing in fricatives include the harmonics-to-noise ratio (Maniwa et al., 2008), dynamic amplitude (Jesus & Shadle, 2002), low-frequency energy (McMurray & Jongman, 2011), and the comparison of acoustic power at higher frequencies ('noise power', above 750 Hz) and lower frequencies ('voice power', below 750 Hz), taken at different intervals for each fricative (Silbert & de Jong, 2008). In relation to this last measure, the authors concluded that phonologically voiceless fricatives have higher noise power than phonologically voiced fricatives, especially in the intervals near the vowel, with a greater difference in onset than in coda position. There was also a significant interaction with gender: the difference between voiceless and voiced fricatives in the low-frequency power measurements was greater for the male speakers than for the female speakers.
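The voice-power vs. noise-power comparison described above amounts to splitting a frame's power spectrum at 750 Hz. The sketch below is a minimal illustration under stated assumptions: a single unwindowed frame, a naive DFT (slow but dependency-free), and a synthetic test signal. It does not reproduce the windowing or interval scheme of Silbert & de Jong (2008); `band_powers` is a hypothetical helper.

```python
import cmath
import math

def band_powers(frame, sample_rate, split_hz=750.0):
    """Split the power spectrum of one frame at split_hz.

    Returns (voice_power, noise_power): summed squared DFT magnitudes for
    positive-frequency bins below split_hz and at or above it (DC excluded).
    Uses a naive O(N^2) DFT, adequate only for short illustrative frames.
    """
    n = len(frame)
    voice = noise = 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        if k * sample_rate / n < split_hz:
            voice += power
        else:
            noise += power
    return voice, noise

# Synthetic frame: a 200 Hz 'voicing' component plus a weaker 3000 Hz component.
sr, n = 8000, 400  # 50 ms frame; DFT bins are 20 Hz apart
frame = [math.sin(2 * math.pi * 200 * t / sr) + 0.5 * math.sin(2 * math.pi * 3000 * t / sr)
         for t in range(n)]
voice, noise = band_powers(frame, sr)
print(round(voice / noise, 3))  # 4.0: twice the amplitude gives four times the power
```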
Briefly, then, the most salient amplitudinal properties related to voicing in fricatives seem to be both the overall frication amplitude and the relative amplitude between vowel and fricative segments, although previous studies differ in their findings on the degree of perceptual influence.
No definite results have been reported regarding the information provided by other cues such as formant transitions, which have been considered a secondary or additional spectral cue for fricative voicing perception (McMurray & Jongman, 2011). In particular, it seems that the transitions of the first formant (F1) are wider when the vowel is adjacent to voiced fricatives than when it is adjacent to their voiceless cognates, at least in intervocalic position (Stevens et al., 1992). As far as the other two transitions are concerned, it has been pointed out that the F2 onset is significantly higher for voiced palato-alveolars than for their voiceless counterparts (Jongman et al., 2000), with no voicing difference at the other places of articulation. Furthermore, the F2 onset frequencies of voiceless and voiced fricatives are more dependent on vowel context than the F3 onset frequencies (Wilde, 1993).
Other recent studies have analysed the spectral properties of frication noise in relation to fricative voicing, showing a mild but significant effect of this acoustic cue, with a different distribution of energy across the spectrum for voiceless and voiced fricatives. In this respect, voiceless fricatives are characterized by higher values for spectral mean, skewness and kurtosis than voiced fricatives and, in addition, they have significantly less variance than voiced ones (Jongman et al., 2000; Jesus & Shadle, 2002; Silbert & de Jong, 2008; Maniwa et al., 2009). Therefore, in comparison with voiced fricatives, the spectra of voiceless fricatives have a concentration of energy towards slightly lower frequencies and slightly better defined peaks at a significantly higher frequency (Jongman et al., 2000). There are also significant interactions between voicing and place, and between voicing and prosodic context. Voiced labiodental fricatives in particular show higher spectral variance (standard deviations) than their voiceless counterparts, an effect not found in coronal alveolars (Silbert & de Jong, 2008). Also, voiceless onset fricatives have higher spectral means, with greater differences in variance (standard deviations) between these two places of articulation in onset position than in coda position.
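The four spectral moments mentioned above (mean, variance, skewness and kurtosis) are usually obtained by treating the normalised power spectrum as a probability distribution over frequency. The sketch below follows that common approach; it assumes the spectrum has already been computed, reports kurtosis as excess kurtosis (0 for a normal distribution), a convention that varies between studies, and uses toy spectra rather than measured data.

```python
import math

def spectral_moments(freqs, powers):
    """Spectral mean, variance, skewness and excess kurtosis.

    The power values are normalised to sum to 1 and treated as a probability
    distribution over the corresponding frequencies.
    """
    total = sum(powers)
    p = [w / total for w in powers]
    mean = sum(f * w for f, w in zip(freqs, p))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs, p))
    sd = math.sqrt(var)
    # Positive skewness: energy concentrated low, with a tail towards high frequencies.
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs, p)) / sd ** 3
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs, p)) / var ** 2 - 3.0
    return mean, var, skew, kurt

# Toy symmetric spectrum: three components at 1, 2 and 3 kHz.
print(spectral_moments([1000, 2000, 3000], [1, 2, 1]))  # (2000.0, 500000.0, 0.0, -1.0)
```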
Finally, results from other techniques, such as the spectral slopes of two regression lines (Jesus & Shadle, 2002), reveal that the average regression slopes always overlap between voiced and voiceless fricatives, supporting the idea that the inherent spectral features of the frication noise are a secondary or additional cue to voicing perception.
In sum, the analysis of prior literature (which is less extensive than the literature on fricative place of articulation) has shown that, in general, both amplitudinal and temporal properties can be relevant acoustic cues for fricative voicing perception, whereas the spectral properties tend to provide secondary or additional information. In other words, the presence of phonation, the frication duration, the relative durations of vowel and fricative segments, and the frication amplitude are considered more salient acoustic cues for fricative voicing perception than the spectral properties of frication noise and formant transitions.
Bearing in mind the results of the previous paper (Barreiro, 2015) and the results from prior literature, an experiment was carried out with the following aims:
(a) To test Spanish listeners' categorisation ability for an English phonemic contrast that does not occur in Spanish (/s/ vs. /z/); in Spanish, the voiced sibilant is an allophone of /s/ that occurs only before a voiced consonant.
(b) To assess the stage of speech-perceptual development that listeners have reached, if any.
(c) To find out (i) whether voicing distinctions in fricatives can be perceived by using multiple cues, and (ii) which cue(s) have the greatest perceptual weighting for this L2 population.
(d) To examine subject variables, in particular (i) inter-speaker variation in the use of acoustic cues in the identification of non-native contrasts, (ii) the effects of intra-speaker variables (namely, length of L2 learning, age of L2 learning and age of the listener) on the results, and (iii) language differences (Spanish vs. English) in the acoustic information used for English sibilant identification.

Method
A procedure similar to that described in the previous research (Barreiro, 2015) was used. Synthetic speech was used to control what was in the signal, allowing the manipulation of the acoustic cues and the contrast of different conditions. Consequently, it was possible to ascertain which cues had the greatest perceptual importance for a given listener or group of listeners. The synthetic speech patterns used for the experiment were provided by SPA (Speech Pattern Audiometer). The minimal pair Sue/zoo had been produced by very high-quality computer-generated synthesis. This enabled us to know which speech pattern elements (or acoustic cues) were being used to establish a contrast. Furthermore, it allowed us to assess the stage of speech-perceptual development that the listeners had reached (Hazan et al., 1995).
Two acoustic cues were analysed, one amplitudinal and one temporal, namely, the presence of phonation and the duration of the fricative noise, respectively. The former cue, which will be called voice bar from now on, refers to the relative amplitude (or power) between the voice bar (energy at low frequencies) and the frication noise, whereas the latter cue, frication duration, comprises a temporal property of the frication noise. Both cues are present in the fricatives themselves, not in the surrounding sounds.
The acoustic cues had been carefully and individually manipulated. A continuum of six steps was created from the minimal pair, in which one or two cues had been changed in small equal steps. The acoustic cues that were varied in order to signal the contrast were the frication duration and the voice bar.
Two different test conditions were then constructed, in which the cues were varied as follows:
(a) Contrast signalled by changes in both cues. The frication duration was decreased in steps of 20 ms across the six stimuli, going from /s/ to /z/, as can be seen in table 1.

Table 1. Parameter values at each step of the continuum in condition (a)
Step   Frication duration (ms)   Voice bar amplitude (dB)   Frication amplitude (dB)
1      225                       30                         60
2      205                       34                         58
3      185                       38                         56
4      165                       42                         54
5      145                       46                         52
6      125                       50                         50

The frication duration started at 225 ms in the first step (heard as /s/) and finished at 125 ms in the sixth and last step (heard as /z/).
As far as the voice bar was concerned, its amplitude was increased in steps of 4 dB, from 30 dB to 50 dB, while the amplitude of the frication was gradually decreased in steps of 2 dB, from 60 dB to 50 dB (see table 1).
(b) Contrast signalled only by the changes in frication duration described above, with the voice bar amplitude held constant at 30 dB and the frication amplitude held constant at 50 dB.
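Because every parameter in table 1 changes in equal steps, both test conditions can be regenerated from a start value and a step size. The sketch below only reproduces the parameter values stated in the text; it does not model the SPA synthesis itself, and the function and field names are hypothetical.

```python
def continuum(n_steps=6, dur_start=225, dur_step=-20,
              vb_start=30, vb_step=4, fr_start=60, fr_step=-2):
    """Parameter values at each step of a linear six-step continuum.

    Durations are in ms and amplitudes in dB, as in table 1.
    """
    return [
        {"step": i + 1,
         "frication_ms": dur_start + i * dur_step,
         "voice_bar_db": vb_start + i * vb_step,
         "frication_db": fr_start + i * fr_step}
        for i in range(n_steps)
    ]

both_cues = continuum()  # condition (a): all three parameters vary
# Condition (b): only duration varies; voice bar fixed at 30 dB, frication at 50 dB.
duration_only = continuum(vb_step=0, fr_start=50, fr_step=0)

for p in both_cues:
    print(p["step"], p["frication_ms"], p["voice_bar_db"], p["frication_db"])
# first line: 1 225 30 60; last line: 6 125 50 50
```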
Random number tables were used to randomise the order of presentation of the material from the SPA system. The material was then recorded onto DAT (Digital Audio Tape), with a recording level of around -10 dB in both channels (right and left).
The identification test was designed as a forced-choice task: there was a fixed number of presentations (in this case, 10) of the randomised stimuli, and the listeners were instructed to choose from a restricted set of responses (Sue or zoo).

Listeners
Twenty-six Spanish listeners took part in this experiment, the same group as in the previous study (Barreiro, 2015). They all had Spanish (Castilian) as their L1. Listeners had ten blocks of twelve Sue-zoo pairs on their response sheet and had to underline the word they identified in each pair. Each block was presented with an interval of 8 seconds.
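The randomised presentation described above can be sketched as ten blocks, each containing every stimulus once in a freshly shuffled order. Two assumptions are labelled here: the study used random number tables, whereas this sketch swaps in Python's pseudo-random generator; and the twelve items per block are assumed to be the six continuum steps in each of the two test conditions, which the text does not state explicitly. `presentation_order` is a hypothetical name.

```python
import random

def presentation_order(n_blocks=10, seed=None):
    """Return n_blocks lists, each a shuffled copy of all stimuli.

    Stimuli are (condition, step) pairs: conditions 'a' and 'b' x steps 1-6,
    i.e. 12 items per block (an assumption about the block composition).
    """
    rng = random.Random(seed)  # seeded for a reproducible order
    stimuli = [(cond, step) for cond in ("a", "b") for step in range(1, 7)]
    blocks = []
    for _ in range(n_blocks):
        block = stimuli[:]
        rng.shuffle(block)  # independent shuffle for every block
        blocks.append(block)
    return blocks

blocks = presentation_order(seed=1)
print(blocks[0][:3])  # first three items of block 1
```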

Test Analysis
The analysis of the data from the two acoustic cues under investigation (frication duration and voice bar) was carried out with the program BASIC, an identification test scoring program. The outcome was the number of Sue responses at each step.
The assessment of the mean identification functions (or labelling curves) was based on the overall shape of the curve, the values obtained at the extremes of the range, the phoneme boundaries (defined as the point on the continuum where each category is heard equally often), and the gradient or slope of the curve (Hazan & Fourcin, 1985; Hazan et al., 1995). The labelling curves plotted the average percentage of /s/ responses across the range of stimuli in each of the two test conditions.
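With ten presentations per step, the number of Sue responses converts directly into the percentage of /s/ responses plotted in a labelling curve, and the phoneme boundary defined above is where that curve crosses 50%. The sketch below locates the crossing by linear interpolation between adjacent steps. This is a simpler technique than the MLE used in the study, shown only to make the definition concrete; the counts and function names are hypothetical.

```python
def labelling_curve(sue_counts, presentations=10):
    """Percentage of /s/ (Sue) responses at each continuum step."""
    return [100.0 * c / presentations for c in sue_counts]

def boundary_by_interpolation(percent_s):
    """Continuum position (steps numbered from 1) where /s/ responses cross 50%.

    Interpolates linearly between the first pair of adjacent steps that
    straddle 50%; returns None if the curve never crosses 50%.
    """
    for i in range(len(percent_s) - 1):
        y0, y1 = percent_s[i], percent_s[i + 1]
        if (y0 - 50.0) * (y1 - 50.0) <= 0 and y0 != y1:
            return (i + 1) + (y0 - 50.0) / (y0 - y1)
    return None

pct = labelling_curve([10, 9, 7, 3, 1, 0])  # hypothetical listener
print(pct)                             # [100.0, 90.0, 70.0, 30.0, 10.0, 0.0]
print(boundary_by_interpolation(pct))  # 3.5, i.e. between steps 3 and 4
```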
Finally, the MLE (Maximum Likelihood Estimate) was used to obtain a quantitative measure of phoneme boundaries and gradients.
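The text does not specify the model fitted by the MLE program. A common choice for identification data is a logistic function of continuum step, fitted by maximum likelihood to the binomial response counts; the boundary is then the step at which the fitted probability of /s/ is 0.5, and the slope gives the gradient. The sketch below assumes that model and uses Newton-Raphson; it is not the program used in the study, and its gradient units (log-odds per step) may differ from those reported in the appendices.

```python
import math

def fit_logistic(steps, sue_counts, presentations=10, iters=50, tol=1e-10):
    """Maximum-likelihood fit of P(/s/) = 1 / (1 + exp(-(a + b * step))).

    Newton-Raphson on the binomial log-likelihood. Returns (boundary, slope):
    boundary = -a / b is the step where P(/s/) = 0.5, slope = b in log-odds
    per step. The estimate diverges if the two categories are perfectly
    separated (no overlapping responses between any steps).
    """
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for x, y in zip(steps, sue_counts):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r = y - presentations * p          # observed minus expected count
            w = presentations * p * (1.0 - p)  # binomial variance
            g_a += r
            g_b += r * x
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        det = i_aa * i_bb - i_ab * i_ab        # determinant of the information matrix
        d_a = (i_bb * g_a - i_ab * g_b) / det
        d_b = (i_aa * g_b - i_ab * g_a) / det
        a, b = a + d_a, b + d_b
        if max(abs(d_a), abs(d_b)) < tol:
            break
    return -a / b, b

# Hypothetical counts of Sue responses out of 10 at steps 1-6.
boundary, slope = fit_logistic([1, 2, 3, 4, 5, 6], [10, 9, 7, 3, 1, 0])
print(round(boundary, 3), round(slope, 3))  # boundary ~3.5; slope negative (/s/ falls off)
```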

Results
Both the data of each member of the group of Spanish listeners (from BASIC) and the mean values of the gradients for the English and the Spanish groups (from MLE) for each of the test conditions can be seen in the appendices (appendices 1 and 2, respectively). The mean identification function obtained could be described as progressive, following Hazan & Fourcin's terminology (1985: 328).

Spanish Results
The Spanish listeners as a group identified the first two steps of the continuum as /s/ almost 90% of the time, and steps 5 and 6 were reported to be /z/ more than 90% of the time. There was a gradual perceptual shift from the voiceless to the voiced sibilant between steps 3 and 4.
[Figure: identification functions; x-axis: continuum step (Step 1 to Step 6); y-axis: % /s/ responses]
In the first place, the labelling functions of 14 subjects showed, in Hazan & Fourcin's words, "confident responses at the extremes of the range and consistent categorization of intermediate stimuli" (Hazan & Fourcin, 1985: 328), that is, categorical identification functions, in which listeners clearly divide the stimulus continuum into two categories. Furthermore, 5 subjects obtained identification functions that could be described as progressive, leaving the other 7 with random labelling curves.
The random category comprises many different types of labelling curve, including those with a flat response around 100% /s/ identification, those that are completely random around the phoneme boundary, and those with values scattered across the continuum.
In the second place, variability also affected the phoneme boundary, which ranged from 2.226 to 4.302. For the majority of the Spanish listeners with progressive or categorical labelling functions (13 subjects), the perceptual shift from the voiceless to the voiced fricative occurred between steps 3 and 4. The phoneme boundary had shifted to the left (between steps 2 and 3) for 5 subjects, and was situated further to the right (between steps 4 and 5) for 1 other subject.
Correlation analyses were carried out across the 26 listeners included in the experiment, relating the intra-speaker variables (i.e., length of L2 learning, age of L2 learning and age of the listener) to the type of labelling curve obtained. [...] Finally, the number of categorical identification functions was greater among the listeners with an advanced level of English than among those with an intermediate level (57% vs. 43%), although the difference was not statistically significant (r = 0.326, p > .05).

Frication-Duration Cue Condition
When the voice bar was removed and the contrast was only cued by the duration of the frication, the Spanish listeners were unable to perceive voicing. The mean labelling functions obtained from the averaged Spanish listeners could be described as random.
Furthermore, 7 listeners did not perceive any voicing at all, identifying the stimuli as /s/ over 90% of the time along the continuum.
Regarding inter-speaker differences, there was a much lower level of variability among the listeners' scores (figure 3) compared to that observed in the previous test condition.
In fact, all 26 subjects showed random identification functions.
[Figure: identification functions in the frication-duration condition; x-axis: continuum step (Step 1 to Step 6); y-axis: % /s/ responses]
As in the previous test condition, no significant correlations were found between the intra-speaker variables and the type of labelling curve identified (r = 0.146, r = 0.029 and r = 0.376, respectively; p > .05). Furthermore, there were more listeners who showed [...] Finally, no significant correlation could be found between any of the intra-speaker variables and the results obtained in the present study.
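To see why correlations of this size are not significant with 26 listeners, the usual t test for a correlation coefficient can be applied. The sketch assumes Pearson's r and a two-tailed test at alpha = .05, which the text does not state explicitly; the critical value t(24) = 2.064 comes from standard t tables. With n = 26, any |r| below roughly 0.39 falls short of significance, so even r = 0.376 does not reach it.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Two-tailed critical t for df = 24 at alpha = .05 (standard t tables).
T_CRIT_DF24 = 2.064

for r in (0.146, 0.029, 0.376):  # values reported for this condition
    t = t_statistic(r, 26)
    print(f"r = {r:.3f}  t(24) = {t:.3f}  significant: {abs(t) > T_CRIT_DF24}")
```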

Comparison with English Results
The mean labelling functions obtained from the data for the English group are shown in [...] were not reached in the Spanish labelling curve was due to the individual variability among the listeners' results. It was also found that both groups perceived the shift from the voiceless alveolar sibilant to its voiced counterpart in the same region, that is, between steps 3 and 4.
In the second test condition, when there was no voice bar and the contrast was cued only by frication duration, despite the great variability among listeners, the random mean labelling curves revealed that neither the English group nor the Spanish group could make a distinction based only on this cue. It can therefore be concluded that the information provided by this acoustic feature of the speech signal is not enough to perceive a voicing distinction between alveolar sibilants, regardless of the listener's mother tongue. However, the Spanish labelling curve showed a flatter response, with a progressively higher percentage of /s/ responses from step 2 onwards, showing that this cue provided even less information for the Spanish group than for the English group. It was also found that the endpoints were labelled better by the English group: at the shortest duration, the English group gave 60% /z/ responses, compared with 20% for the Spanish group.

Discussion
The general aim of this study was to achieve a better understanding of Spanish perception of the voicing of the English contrast /s/-/z/, which is not phonemic in the Spanish language, through the comparison with native English speakers.
Regarding the Spanish listeners' ability to categorise a non-native contrast, the results of the experiments clearly indicate that the ability to distinguish the sibilant /z/ from /s/ had already been acquired by the listeners. The mean labelling functions for the averaged Spanish listeners when both cues were present revealed that the group was able to label the extremes of the range confidently (around 90% or above).
(1) Most listeners, irrespective of their L1, were able to establish the voicing contrast when both cues were present in the signal.
(2) Both groups of listeners were almost completely reliant on the voice bar (phonation) cue to establish the voicing contrast. Completely different results were obtained when the contrast was cued only by the duration of the frication noise. This temporal cue in isolation did not provide sufficient perceptual information for either group of listeners, native or non-native, to distinguish between the two sibilant consonants. This result supports the idea that, although shortening the overall frication duration produces a perceptual shift, the shift is neither pronounced nor significant for sibilants (Harris, 1958; Jongman et al., 2000).
(3) The information provided by the overall frication duration cue was not significant for either group, although it was a more salient cue for the English listeners than for the Spanish listeners. This slightly different, language-specific use of the frication-duration cue indicates that the English listeners were better able to rely on acoustic cues other than the voice bar for a voicing contrast between phonemes with the same place of articulation, namely the voiceless alveolar /s/ and the voiced alveolar /z/.
These findings could also indicate that the Spanish group might use an acoustic cue differently depending on the language. For instance, in Spanish the voice bar has not been considered a robust cue for voicing distinctions in stops (Martínez Celdrán, 1984). Nevertheless, these listeners used it for the L2 fricatives, at least in initial position. It is interesting to mention that, along the same lines, some previous studies have claimed that bilinguals seem to have a separate coding system for each language and switch from one to the other depending on the language in use (Macnamara et al., 1968).
In relation to the effects of the intra-speaker variables (namely, length of L2 learning, age of L2 learning and age of the listener) that may influence the perception of non-native contrasts, our results should make us sceptical about the importance of such variables. No correlation was found between these factors and the Spanish listeners' stage of speech-perceptual development. As suggested in the previous study (Barreiro, 2015), some listeners who rated their English as intermediate and had never been to an English-speaking country (or only for short periods of two or three months) seemed to perceive the contrast better than others who had studied longer, had spent longer periods abroad, or had started to study English earlier.

Final conclusions
The arguments presented in this paper clarify and extend the analysis of a previous study (Barreiro, 2015) by shedding more light on the issue of Spanish perception of non-native contrasts, in particular, English sibilants. Therefore, our final conclusions will include references to both pieces of research.
Based on our preceding work and on theoretical models, certain predictions could be made about Spanish listeners' perceptual behaviour when exposed to non-native contrasts with a different status in the L1 (/s/ and /ʃ/, on the one hand, and /s/ and /z/, on the other). The voiced alveolar sibilant is an allophonic realisation of /s/ in Spanish, occurring only before a voiced consonant (mainly a plosive), whereas the palato-alveolar sound /ʃ/ does not exist in Spanish.
The fact that the two languages (Spanish and English) have different phonological systems meant that our Spanish listeners might encounter certain perceptual problems.
Language-specific perceptual differences were to be expected, due to the nature of the L1 and the relationship between L1 and L2, as well as other subject factors such as years of experience with the L2, age of the subject and age of learning.
The initial hypothesis regarding the Spanish listeners' categorisation ability was that they would perceive the place of articulation contrast (/s/ vs. /ʃ/) better than the voicing contrast (/s/ vs. /z/), as /ʃ/ would have been acquired as a new category by native Spanish speakers of English (according to Flege's Speech Learning Model, 1995), or, even if a new category had not been created, listeners would generally discriminate well between sounds that had been assimilated to two different English categories (following Best & Strange's Perceptual Assimilation Model (PAM), 1992, or the more recent PAM-L2 of Best & Tyler, 2007). Our results seem to confirm the research hypothesis: the Spanish listeners were able to label the palato-alveolar sibilant /ʃ/ more confidently than the voiced sibilant /z/, owing to the different status of these sounds in their L1 (non-existence vs. allophone). The lack of L1 experience with voicing contrasts in fricatives, such as that between /s/ and /z/, implied greater perceptual difficulty in differentiating L2 phonemic contrasts based on this difference (Lasky et al., 1975; Werker et al., 1985; Mann, 1992; Polka, 1995; Flege, 1998; Best et al., 2001; Johnson, 2004; Kabak & Maniwa, 2007; Boomershine et al., 2008, to mention a few).
As a group, the Spanish listeners were at a progressive stage of speech-perceptual development for both contrasts. Nevertheless, a careful analysis of the evident inter-listener variation in the use of acoustic cues showed that most Spanish listeners seemed to have reached a more advanced stage in the /s/ vs. /ʃ/ contrast than in the /s/ vs. /z/ contrast: more listeners obtained categorical identification functions for the place of articulation contrast than for the voicing contrast (19 vs. 14).
There were also fewer random curves for the place of articulation contrast than for the voicing contrast (2 vs. 7), a finding that also supports the initial hypothesis. However, discrimination tests would have allowed a more valid statement about the listeners' perception by establishing whether they had developed the ability to perceive the contrasts categorically.
The results of both experiments not only indicated which cues had the greatest perceptual importance in the non-native contrasts for the Spanish group, but also highlighted similarities between the Spanish and English groups: (1) Most listeners, regardless of their first language, perceived a clear shift from one category to the other in both English phonemic contrasts, although the English listeners identified the phonemes with a slightly higher degree of confidence.
(2) There was great inter-listener variability in the results of both language groups, not only in the stage of speech-perceptual development but also in the location of the phoneme boundaries separating the categories of each contrast. (3) The location of the phoneme boundaries along the test continua did not seem to be language-specific in either contrast; the slight variations in location across experimental conditions were not statistically significant in either language group.
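A common way to locate a phoneme boundary along an identification continuum is the 50% crossover point of the identification function. The Python sketch below illustrates this under stated assumptions: a six-step continuum, invented proportions of /z/ responses, and a hypothetical helper `crossover_boundary`. Linear interpolation between adjacent steps is a simple stand-in for the probit or logistic fits often used for this purpose and is not necessarily the method used in this study.

```python
from typing import Optional, Sequence


def crossover_boundary(steps: Sequence[float], p_voiced: Sequence[float],
                       criterion: float = 0.5) -> Optional[float]:
    """Estimate a phoneme boundary as the first point where the identification
    function crosses `criterion`, interpolating linearly between adjacent steps.

    `p_voiced[i]` is the proportion of voiced (/z/) responses at `steps[i]`.
    Returns None if the function never crosses the criterion.
    """
    if len(steps) != len(p_voiced):
        raise ValueError("steps and p_voiced must have the same length")
    for i in range(len(steps) - 1):
        y0, y1 = p_voiced[i], p_voiced[i + 1]
        # The criterion lies between two adjacent proportions (and they differ).
        if (y0 - criterion) * (y1 - criterion) <= 0 and y0 != y1:
            x0, x1 = steps[i], steps[i + 1]
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical listener on a six-step Sue-zoo continuum (invented proportions).
steps = [1, 2, 3, 4, 5, 6]
p_z = [0.0, 0.05, 0.2, 0.8, 0.95, 1.0]
print(crossover_boundary(steps, p_z))
```

For these invented proportions the boundary falls at about step 3.5. A curve that never crosses 50% returns None, one simple way of flagging listeners for whom no boundary can be located. A non-monotonic curve may cross several times, and the sketch reports only the first crossing.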
(4) Most listeners, regardless of their first language, were able to distinguish both English contrasts when both cues were present in the signal.
All the listeners had very similar results when the place of articulation contrast was cued only by the spectral characteristics of the frication noise. F2 transitions, however, did not yield positive results: in isolation, this cue did not provide enough information to distinguish between the two consonants for either group, especially the Spanish group, and it appeared to be an additional or secondary cue.
The primary acoustic cue for both groups in making voicing distinctions between alveolar sibilants was the presence of phonation (the voice bar). The temporal property of the frication noise (its duration) was an additional or secondary cue in that syllable position. Since frication duration does not seem to be a significant cue in initial position, as observed in previous literature (Cole & Cooper, 1975; Baum & Blumstein, 1987), it would be worth analysing whether it carries more perceptual weight in VC than in CV position when identifying the voicing of alveolar sibilants, especially non-native ones. The role of other additional cues, such as the duration of the adjacent vowel, could also be analysed, following previous work (Denes, 1955; Harris, 1958; Raphael, 1972; Cole & Cooper, 1975; Soli, 1981; Baum & Blumstein, 1987; Crystal & House, 1988a, 1988b; Glicksman & Stevens, 1988; Jongman, 1989; Gordon, 1989; Raphael et al., 2011; McMurray et al., 2011).
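Statements about primary and secondary cues are often quantified in perception studies by fitting a logistic regression of listeners' binary responses on the manipulated cues and comparing the coefficients. The Python sketch below is one such illustration, not the analysis reported here: `fit_cue_weights` is a hypothetical helper, the cue coding (voice bar as 0/1, frication duration rescaled to -1..1) is an assumption, the responses are invented, and plain batch gradient ascent stands in for a statistics package.

```python
import math
from typing import List, Sequence


def fit_cue_weights(cues: Sequence[Sequence[float]], responses: Sequence[int],
                    lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit P(voiced response) = sigmoid(b0 + sum(w_k * cue_k)) by batch
    gradient ascent on the log-likelihood.

    Returns [b0, w_1, ..., w_K]. On comparably scaled cues, a larger |w_k|
    suggests heavier perceptual weighting of cue k.
    """
    n_cues = len(cues[0])
    w = [0.0] * (n_cues + 1)  # w[0] is the intercept
    n = len(responses)
    for _ in range(epochs):
        grad = [0.0] * (n_cues + 1)
        for x, y in zip(cues, responses):
            z = w[0] + sum(wk * xk for wk, xk in zip(w[1:], x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p  # residual between observed response and prediction
            grad[0] += err
            for k, xk in enumerate(x):
                grad[k + 1] += err * xk
        # Average the gradient over trials and take one ascent step.
        w = [wk + lr * g / n for wk, g in zip(w, grad)]
    return w


# Hypothetical stimuli: (voice_bar, scaled_frication_duration).
stimuli = [(vb, d) for vb in (0, 1) for d in (-1.0, 0.0, 1.0)]
# Invented responses (1 = "zoo", 0 = "Sue") from a listener relying only on the voice bar.
responses = [vb for vb, _ in stimuli]
weights = fit_cue_weights(stimuli, responses)
print(weights)
```

In this invented case the voice-bar weight grows large while the duration weight stays at zero. Because the responses are perfectly predictable from the voice bar, the fitted coefficient keeps growing with more iterations, so only the relative magnitudes of the weights are meaningful here.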
These findings might also reveal certain cross-linguistic differences. The way the L2 learners seemed to weight the information available in the signal when making a linguistic decision differed from that of the L1 speakers. In particular, the Spanish group may have used a given acoustic cue differently depending on the contrast and the language. For instance, although the voice bar is not a strong cue for voicing distinctions in Spanish stops, these listeners seemed to use it for the L2 fricatives. Furthermore, the Spanish listeners did not use F2 transitions as cues for the English place of articulation contrast, whereas these acoustic features seem to play a relevant role in identifying the place of articulation of Spanish consonants, as claimed by Moreno Llaneza (1990). It would also be interesting to investigate the effects of F3 transitions, a cue that was not manipulated in this study.
Finally, the analysis of the self-assessment showed no correlation between subject variables (namely, length of L2 learning, age of L2 learning or age of the listener) and the listeners' stage of speech-perceptual development in either of the two English contrasts. Some listeners seemed to perceive the contrasts better than others who had studied longer, spent longer periods abroad and started studying English earlier. As mentioned in the previous paper (Barreiro, 2015), the Spanish listeners' immersion in the L2 may not have been sufficient to cause significant differences, as they all lived in Spain and their contact with the L2 was limited to relatively short periods abroad. Moreover, they all began learning English after the age of 5; that is, they were monolingual and described themselves as such.
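Because the stage of speech-perceptual development is an ordinal variable, a rank correlation is one natural way to check for a relationship with subject variables. The Python sketch below computes Spearman's rho with average ranks for ties. The paper does not state which test was used, so this is not a reconstruction of its analysis; the helper names, the numeric coding of stages (e.g. 0 = random, 1 = progressive, 2 = categorical) and the data are all hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical listeners: years of L2 learning and coded perceptual stage (invented).
years = [4, 6, 8, 10, 12, 15]
stage = [2, 0, 1, 2, 0, 1]
print(spearman_rho(years, stage))
```

Averaging tied ranks matters here because a three-level stage coding produces many ties. A significance test (for example, a permutation test on rho) would be needed before claiming the absence of a correlation.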
As suggested in the first part of this two-fold investigation (Barreiro, 2015), it would be interesting to analyse a more heterogeneous group with objective differences in their level of English (as opposed to the self-assessment criteria used here) in order to reveal