Contribution of free satellite time-series images to mapping plant communities in the Mediterranean Natura 2000 site: the example of Biguglia Pond in Corse (France)

Mapping plant communities, which is essential to assess the conservation status of natural habitats, is currently based mainly on time-consuming field surveys without the use of satellite data. However, free image time-series with high spatial and temporal resolution have been available since 2015. This study assessed the contribution of Sentinel-2 time-series images to mapping the spatial distribution of 18 plant communities within a Natura 2000 site (1978 ha) located on the Mediterranean biogeographical region (Corsica, France). The method was based on random forest modeling of six Sentinel-2 images acquired from 26 February to 24 October 2017, which were calibrated and validated using a field vegetation map. The results showed that the 18 plant communities were modeled correctly, with 72% overall accuracy. The uncertainty map associated with the model indicated areas that required additional field observations.


Introduction
Mapping plant communities is a major challenge when monitoring biodiversity, especially in the framework of the European Union (EU) Habitats Directive (Attorre et al., 2018). Since plant communities indicate the conservation status of natural habitats (Rodwell et al., 2018), mapping and monitoring them is essential to assess impacts of management practices on plant biodiversity (Silva et al., 2019). Maps of plant communities are usually produced from time-consuming field observations supplemented by visual analysis of aerial photographs (Díez-Garretas et al., 2019). These maps are flawed, however, because they (i) do not cover all landscapes but rather tend to focus on heritage sites (Campagnaro et al., 2019), (ii) are outdated because they are not updated frequently (EEA, 2015), and (iii) include delimitation or characterization bias related to the map producer's interpretations (Ullerud et al., 2018).
Recent remote sensing data with very high spectral, spatial, or temporal resolution offer a promising alternative for mapping plant communities. Hyperspectral and superspectral (i.e. more than 10 bands) data have been commonly used to accurately map plant communities (Wang & Gamon, 2019). For example, 19 grassland plant communities in eastern Europe were spatially discriminated with 81% accuracy using an airborne image acquired in the visible and near-infrared region, with 128 spectral bands at 5 m spatial resolution (Burai et al., 2015). Fifteen salt grassland plant communities on the Atlantic coast were modeled with 95% accuracy from a WorldView-3 satellite data, with 16 spectral bands and a spatial resolution of 0.3 m (Collin et al., 2018). Unmanned aerial vehicles have also been used to acquire VEGETATION DIVERSITY AND GLOBAL CHANGE multispectral or hyperspectral data that successfully discriminated plant communities (Aasen et al., 2018). For example, nine plant communities from grasslands and moors in Portugal were mapped with 89% accuracy from RGB drone data at a resolution of 0.1 m (Gonçalves et al., 2016). In addition, several studies have shown that annual satellite time-series can be used to model plant communities accurately. For example, seven wet grassland plant communities in Germany were modeled with 85% accuracy using an annual time-series of either 13 TerraSAR-X SAR satellite data at a spatial resolution of three m or 6 RapidEye optical satellite data at a spatial resolution of 6 m (Schuster et al., 2015). However, the use of these remote sensing data by botanists raises two related issues: (i) the high cost of acquiring the data, especially using airborne platforms (Turner et al., 2015) and (ii) the high degree of expertise required to analyze the data -especially hyperspectral, SAR, and drone data-and to pre-process them before integrating them in a GIS (Borre et al., 2011).
Among the remote sensing data available, Sentinel-2 satellite time-series are free of charge (e.g. from the Copernicus platform of the European Space Agency) and in a ready-to-use GIS format (i.e. geometrically and atmospherically corrected) (Hagolle et al., 2015). Sentinel-2 sensors provide 13 spectral bands in the visible, near-, and mid-infrared spectra, with a spatial resolution of up to 10 m. Although these characteristics are useful for monitoring the phenology of natural vegetation (Vrieling et al., 2018) or characterizing vegetation functional traits (Schauman et al., 2018), the contribution of Sentinel-2 data to mapping plant communities remains poorly explored. However, a recent study revealed that seven wet grassland plant communities derived from unsupervised classification of vegetation plots were modeled with 78% accuracy using an annual Sentinel-2 time-series (Rapinel et al., 2019). These initial results must be confirmed with further studies in other regions with other types of plant communities defined within the Prodromus framework (Reymann et al., 2017).
The objective of this study was to evaluate the contribution of Sentinel-2 time-series images to mapping grassland, shrubland, tree communities and related directive habitats, in a Natura 2000 site in the Mediterranean biogeographical region.

Study site
The Biguglia pond is a site of Community Importance (FR9410101 SCI) and covers 1978 ha along the eastern coast of Corsica, France ( Figure 1). The topography is flat with elevation ranging between 0 and 3 meters above sea level. The site has a Mediterranean climate (mean annual precipitation 835 mm, mean annual temperature 16°C), with summer droughts from June to September (Delbosc, 2015). The site consists of a coastal lagoon enclosed by salt marshes, sand dunes, grasslands, thickets, and wooded groves. It is designated a Natura 2000 site because it contains several habitats of EU community importance, such as 1150 "Coastal lagoons", 1310 "Salicornia and other annuals colonizing mud and sand", 1410 "Mediterranean salt meadows (Juncetalia maritimi)", 2260 "Cisto-Lavanduletalia dune sclerophyllous scrubs", 6420 "Mediterranean tall humid grasslands of the Molinio-Holoschoenion", and 92D0 "Southern riparian galleries and thickets (Nerio-Tamaricetea and Securinegion tinctoriae)". A management plan has been developed to conciliate conservation, agricultural practices (grazing, mowing), and recreation (educational activities). A field vegetation map was produced at 1:2500 scale ( Figure 1) for reporting purposes for the Habitats Directive (Delbosc et al., 2020). To this end, 118 vegetation plots were sampled from May to June 2014 according to the Braun-Blanquet method (Delbosc, 2015). Species taxonomy and nomenclature followed the "Flora Corsica" (Jeanmonod & Gamisans, 2013). Each vegetation plot was assigned to a plant community, following the Corsican Vegetation Prodromus (Reymann et al., 2017). The extent of each plant community was mapped by interpreting aerial photographs acquired in 2013. Table 1 lists the plant communities identified and represented in the vegetation map with a spatial extent sufficiently large to be detected in satellite data with a spatial resolution of 10 m, along with their corresponding habitat code (Annex 1, Habitats Directive).

Spectral analyses
Sentinel-2 satellite data were selected from the Theia catalogue (theia.land.fr) among cloudless images from 2017 ( Figure 1). Images were already orthorectified and atmospherically corrected using the MACCS-ATCOR Joint Algorithm (MAJA) (Hagolle et al., 2015). Six images acquired from 26 February to 24 October 2017 were retained ( Figure 2). For each image we used the 4 bands at 10 m spatial resolution acquired in the blue (b2: 497 nm), green (b3: 560 nm), red (b4: 664 nm), and near-infrared (b8: 835 nm) spectra, as well as the 6 bands at 20 m spatial resolution acquired in the red-edge (b5: 704 nm, b6: 740 nm, b7: 782 nm), near-infrared (b8a: 865 nm), and short-wave infrared (b11: 1614 nm, b12: 2202 nm) spectra. The three atmospheric bands (b1, b9 and b10) were discarded. The six spectral bands at 20 m spatial resolution were resampled to 10 m. As a result, the satellite data used as input for modeling plant communities included 60 variables (10 spectral bands × six dates) at 10 m spatial resolution. The pairwise spectral distance between plant communities was calculated using the Jeffries-Matusita (JM) distance. The JM distance is commonly used in remote sensing science to measure the average distance between two classes (Richards, 2013), where the lowest value (0) indicates identical classes that are impossible to separate and the highest value (1.41) reflects two classes that are perfectly separable. A dimension reduction was performed using a principal component analysis to reduce the collinearity between the 60 spectral variables (10 bands × 6 dates). The first five components of the PCA (90.2% of the variance) were used to calculate the JM distance.
Plant communities were mapped using random forest (RF) modeling (Breiman, 2001). The RF model is widely used in remote sensing science because of its ability to manage a large number of variables, its low sensitivity to mislabeling errors, and its ability to map model uncertainty (Belgiu & Drăguţ, 2016). Reference plots were selected from the vector field vegetation map at 1:2500 scale to calibrate and validate the RF model. To this end, an expert botanist manually identified the Sentinel-2 pixels in GIS by overlapping a multi-temporal near-infrared Sentinel-2 color composite (26 February, 17 May, and 24 October) and Google Earth images with the contours of the vegetation map. The botanist selected or excluded pixels from the sample based on uncertainties in (i) the spatial extent of polygons in the vegetation map, (ii) changes in the vegetation between the date of the field map and the year of Sentinel-2 image acquisition, and (iii) the spatial intermingling of plant communities not specified within a polygon in the vegetation map.
To increase the independence of the observations, sub-sampling was performed at a minimum distance of 20 m between two reference plots of the same plant community. A total of 1224 reference plots were collected for the 18 plant communities. The model was calibrated and validated using a 10-repeated 5-fold approach (Kuhn & Johnson, 2013). To assess model accuracy, we calculated the overall accuracy index, the Kappa index, and the F1 score per plant community (Congalton & Green, 2008). The F1 score is an accuracy metric specific to each class (i.e. plant community) that reflects a weighted average between precision (user's accuracy) and recall (producer's accuracy) calculated as follows: Analyses were performed using R software (R Core Team, 2015) with the raster (Hijmans, 2015), rgdal (Bivand et al., 2015), and caret (Kuhn, 2008) packages.

Results
The pairwise spectral separability between plant communities based on the JM distance is shown in Table  2. Overall, plant communities are spectrally well separated (median JM distance value = 1.37). Consistently, the spectral separability decreases with the increase of the similarity in floristic composition. For example, the spectral response of salt marshes plant community 2 (Suaedo maritimae-Salicornietum patulae) is very distinct (JM distance = 1.41) from that of grasslands plant community 5 (Trifolio fragiferi-Cynodontion dactylonis) but closer (JD distance = 0.90) to that of salt marshes plant community 3 (Arthrocnemo glauci-Salicornietum emerici). Nevertheless, this relationship between spectral separability and floristic composition is not systematically confirmed: for example, spectral separability between salt marshes and reed beds plant communities is moderate (JM distances 0.89−1.25), possibly due to the frequent occurrence of open water. According to the confusion matrix between the 18 plant communities modeled from the Sentinel-2 time series and the reference plots (Table 3), modeling accuracy was satisfactory (overall accuracy 72%, Kappa index 0.70), but differed significantly among classes:  Comparing maps of the plant community and RF model certainty to the very high spatial resolution Google Earth image (Figure 3, top) showed that the model reflected the fine-grained pattern of plant communities correctly, especially for grassland areas. In addition, scrub or forest plant communities were well identified, including those structured as hedges. Analysis of the certainty map indicated large disparities between areas of high certainty (≥ 0.8) in the middle of large patches of forest or grassland plant communities, and areas of low certainty (≤ 0.3) that corresponded in particular to transitions or a continuum between plant communities.

Discussion
Plant communities in Natura 2000 sites located within the Mediterranean biogeographical region, which includes a wide diversity of ecological units, can be accurately modeled (overall accuracy 72%) using a random forest classification model and Sentinel-2 timeseries. This overall accuracy was slightly lower than that obtained by Rapinel et al. (2019) for semi-natural Atlantic grasslands (78%), but the modeling process was more challenging due to the larger number of plant communities (18 vs. 7, respectively) and consequently more intermingled vegetation types and complex patterns. Although the high accuracy of modeling the amphibious Ruppietum cirrhosae plant community (F1 score 99) was not remarkable (since vegetation in coastal lagoons has a more distinct spectral signature than that in terrestrial plant communities) (MacKay et al., 2009), some plant communities in the same ecological unit were discriminated well. For example, the Trifolio fragiferi-Cynodontion dactylonis and Lino biennis-Festucetum arundinaceae communities were modeled correctly (F1 score 87 and 81, respectively), even though both contained common species such as Cynodon dactylon and Ranunculus sardous and were associated with the same habitat (6420 "Mediterranean tall humid grasslands of the Molinio-Holoschoenion"). The strong discrimination between these 2 plant communities can be explained by phenological differences observed among the 6 satellite data acquired in all 4 seasons of the same year ( Figure  2). Recent studies indicate the potential of Sentinel-2 time series to discriminate fine-scale vegetation patterns based on phenological profiles (Vrieling et al., 2018). Conversely, the model confused plant communities patterned by a common and covering species that strongly influenced the spectral signature, such as for the 2 forest plant communities (Inulo crithmoidis-Tamaricetum africanae and Althaeo officinalis-Tamaricetum africanae), which were physiognomically and structurally patterned by Tamarix africana, and the 4 reed bed plant communities (Inulo crithmoidis-Phragmitetum australis, Kosteletzkyo pentacarpos-Phragmitetum australis, Phragmitetum australis, and Phragmitetum australis calystegietosum sepii) with high coverage of Phragmites australis. Remote sensing data with higher spatial and spectral resolution are required to discriminate these plant communities more accurately (Corbane et al., 2015).
This study aimed to evaluate the contribution of an annual Sentinel-2 satellite time series to model 18 plant communities. However, some plant communities could not be properly discriminated based only on the use of spectral variables. The accuracy of the model could be increased by combining several types of variables (Fois et al., 2018) providing information on vegetation structure, micro-topography, micro-climate or soil characteristics. In this case, the potential envelope of each plant community could first be modelled based on environmental variables and then the realized plant community area could be predicted based on spectral variables derived from remote sensing data. As an example, this methodology was successfully applied by Álvarez-Martínez et al. (2018) to model 24 Natura 2000 habitats on a 120,000-ha site by combining variables derived from remote sensing data (1 Landsat 8 OLI image + 1 canopy height model derived from LiDAR data) with topographic and climatic variables.
Although botanists do not yet use remote sensing data widely (Borre et al., 2017), this study illustrates that the data and method used to model plant communities and related habitats do not present any major technical challenges. First, the pre-processed Sentinel-2 satellite data (Level 2A) can be downloaded free of charge from the Theia (theia.land.fr) or Copernicus hub (https:// scihub.copernicus.eu) websites in a format that is readyto-use in GIS. Second, satellite data can be analyzed using R software, which is commonly used in vegetation science (Dixon & Palmer, 2003;Zelený & Tichý, 2009). This method models plant communities with < 75% accuracy, which is insufficient to be considered a baseline vegetation map. However, combining it with the RF model uncertainty map provides a valuable tool for applying preferential sampling (De Cáceres et al., 2015) to new areas to prospect during future field missions (Zlinszky et al., 2014) and thus improves vegetation typology and mapping accuracy (Fanelli et al., 2005).
This mapping approach for plant communities has several advantages compared to approaches based on expert decision rules of vegetation physiognomy and environmental criteria : (i) it highlights floristic variations within a habitat (Feilhauer et al., 2014); (ii) allows the conservation status of a habitat to be assessed (Angelini et al., 2018); and (iii) avoids one-to-many correspondences between a vegetation physiognomy and habitats . The disadvantage of this approach, however, is the need for reference data at the syntaxonomic level, which unfortunately remains rare (Campagnaro et al., 2019).
The quality of the reference data is an important issue when modeling with satellite data (Maxwell et al., 2018). Ideally, the reference plots are selected from phytosociological records collected in the same year as the satellite data acquisitions (Rapinel et al., 2019). This was not the case in the current study because the archive vegetation plots (2 × 2 m) were smaller than the resolution of the Sentinel-2 satellite data (10 m). As a default, reference points were collected within the polygons of the field vegetation map, and although we adopted precautions, it inevitably resulted in biases related to (i) the 3-year difference between producing the field map (2014) and acquiring the Sentinel-2 data (2017) and (ii) subjective delineation of plant community patches by the field map producer (Ullerud et al., 2018). In this study, the reference plots were manually selected, which was time consuming. While "pure" pixels can be identified and selected automatically, the detection of outliers due to geometric errors made in the field vegetation map remains challenging. Nevertheless, recent studies have shown that outliers could be detected based on their spectral characteristics (Raab et al., 2018;Halladin-Dąbrowska et al., 2020), which would significantly reduce the efforts required to select reference pixels. This suggests that the limitation for modeling plant communities is no longer access to satellite data but rather access to field reference data (Borre et al., 2017). This highlights the crucial importance for botanists to georeference vegetation plots accurately and, above all, to structure and disseminate them in databases such as the European Vegetation Archive (Chytrý et al., 2016) or VegFrance (Bonis and Bouzillé, 2012) so that they can be used to calibrate and validate models based on satellite data.
Although the results of modeling plant communities revealed heterogeneous patterns (Figure 3), the spatial resolution of Sentinel-2 satellite data (10 m) obviously remains too coarse to detect small vegetation patches (Rapinel et al., 2019). Consequently, we modeled only 18 plant communities. Thus, many plant communities that occur only occasionally but may nonetheless characterize habitats of community importance were excluded. Using satellite data with a higher spatial resolution could resolve this limitation. SPOT-6 images, which are available at low cost and with a spatial resolution of 1.5 m, offer an interesting alternative to Sentinel-2 images for sites where mapping fine-grained vegetation patterns is critical.
The map of plant communities derived from Sentinel-2 images can be used as a decision support tool for the conservation status assessment of natural habitats. For example, the presence of the plant community Arthrocnemo glauci-Salicornietum emerici -typical of Mediterranean salt meadows-in the Natura 2000 site of Biguglia is remarkable given its large extent (61 ha) and indicates a good conservation status of the habitat HD 1310 (Salicornia and other annuals colonising mud and sand) (Bensettiti et al., 2004). In addition, the uncertainty map provides information that can be used for implementing different management plans, since it highlights transition areas -or continua-between plant communities, which are rarely provided by field vegetation maps (Schmidtlein et al., 2007). These continua can be considered as biodiversity hot spots that have a high probability of change (Álvarez-Martínez et al., 2011). In the case of the Biguglia site that has a very low and flat micro topography, the continua are mainly structured by variations in soil moisture and salinity in relation to climatic hazards (i.e. fluvial inflow during precipitation and seawater inflow during storms) (Delbosc et al., 2020). Thus, these continua -highlighted by Sentinel-2 images-reflect the effective functioning and self-regulating of the lagoon ecosystem.
Remote sensing data are a tool with standardized and spatialized measurements over long time scales suitable for monitoring changes in biodiversity (Schrodt et al., 2019), such as the Habitats Directive reporting. In our study, we mapped plant communities using one-year satellite time-series. In the perspective of longer-term, interannual monitoring, the major factors in the dynamics of natural vegetation that could be easily considered in the site of Biguglia due to the very high accuracy of annual maps derived from remote sensing data such Sentinel-2 images are fire hazard (García-Llamas et al., 2019 and land use and cover changes (e.g. Álvarez-Martínez et al., 2010;Mallinis et al., 2011;Kallimanis et al., 2015;Amici et al., 2017). Conversely, changes in plant communities, for example in response to a decrease in grazing pressure, remain challenging to quantify not only because of the lower accuracy of annual maps (e.g. 72% in this study)the changes observed being related both to real vegetation dynamics and to cumulative modelling errors (Álvarez-Martínez et al., 2011)-but also because of the limited access to georeferenced historical field vegetation data that are needed to calibrate and validate the remote sensingbased models (Anderson, 2018). To overcome these methodological issues, several solutions can be considered, such as: i) the "time-first, space-latter" modelling approach (i.e. modelling of inter-annual time profiles rather than independent modelling by year) (Picoli et al., 2018); ii) the consideration of uncertainty in the change detection (Álvarez-Martínez et al., 2010); iii) the exploitation of vegetation archives (e.g. Chytrý et al., 2016).

Conclusions
This study demonstrates that plant communities in Natura 2000 sites located within the Mediterranean biogeographical region, which includes a wide diversity of ecological units, were modeled accurately using a random forest classification model and Sentinel-2 satellite data time series. The approach, based on the use of free, ready-to-use images processed using R software, can be used by botanists. Although the plant community map derived from satellite data is not sufficiently accurate to be considered a reference document, combining it with a model certainty map is a useful tool for targeting areas that require additional field sampling. In perspective, SPOT-6 satellite time series analysis with very high spatial resolution is a promising approach for mapping fine-scale vegetation patterns.