-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
340 lines (229 loc) · 19.1 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
<<<<<<< Updated upstream
# Distribution and protection of commonly observed earthworm species across Europe based on SDMs
=======
# Soil Biodiversity across Europe based on SDMs
>>>>>>> Stashed changes
The following code used occurrence data and environmental variables to predict Species Distribution Models (SDMs) for earthworms (Crassiclitellata) across Europe.
## Description
Belowground biodiversity does not necessarily follow aboveground biodiversity patterns, but maps of soil biodiversity remain scarce because of limited data availability. Earthworms belong to the most thoroughly studied soil organisms, and - in their role as ecosystem engineers - have a significant impact on ecosystem functioning. Here, we address the questions where and what is needed to protect European earthworm species. We focus on highly recorded species only, that is on species with broad geographic ranges and therefore an appropriate number of occurrence records in Europe (min. 10 records). First, we map the spatial distribution of common earthworm species and the respective total species richness. We use multiple environmental predictors spanning climatic, edaphic, and topological factors. In accordance with the literature, we predicted potential species distributions by well-performing models (i.e., MaxEnt and Ensemble modeling). Second, we investigate how well the different types of protected areas cover mean and individual species ranges by estimating the respective proportion of range area. Based on the protection coverage, we are able to identify vulnerable species, that is species whose ranges only sparsely overlap with current protected areas.
## Getting Started
### Dependencies
* Windows 10
* R version 4.1.2
* RStudio version 1.2.5033
* sessionInfo()
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 14393)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] exactextractr_0.8.2 sf_1.0-7 rgdal_1.5-31 colorRamps_2.3.1 gridExtra_2.3
[6] ggpubr_0.4.0 doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2 rJava_1.0-6
[11] dismo_1.3-5 biomod2_3.5-1 CoordinateCleaner_2.0-20 rgbif_3.7.2 caret_6.0-92
[16] lattice_0.20-45 usdm_1.1-18 here_1.0.1 raster_3.5-15 sp_1.4-7
[21] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2
[26] tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1 terra_1.5-21
loaded via a namespace (and not attached):
[1] spam_2.8-0 readxl_1.4.0 uuid_1.1-0 backports_1.4.1 plyr_1.8.7 lazyeval_0.2.2
[7] splines_4.2.0 listenv_0.8.0 TH.data_1.1-1 digest_0.6.29 viridis_0.6.2 earth_5.3.1
[13] fansi_1.0.3 magrittr_2.0.3 tzdb_0.3.0 recipes_0.2.0 globals_0.14.0 modelr_0.1.8
[19] gower_1.0.0 matrixStats_0.62.0 sandwich_3.0-1 hardhat_0.2.0 colorspace_2.0-3 rvest_1.0.2
[25] haven_2.5.0 crayon_1.5.1 jsonlite_1.8.0 zoo_1.8-10 survival_3.3-1 glue_1.6.2
[31] PresenceAbsence_1.1.10 gtable_0.3.0 ipred_0.9-12 emmeans_1.8.1-1 car_3.0-13 future.apply_1.9.0
[37] biscale_1.0.0 maps_3.4.0 abind_1.4-5 scales_1.2.0 oai_0.3.2 mvtnorm_1.1-3
[43] DBI_1.1.2 rstatix_0.7.0 Rcpp_1.0.8.3 plotrix_3.8-2 viridisLite_0.4.0 xtable_1.8-4
[49] units_0.8-0 rworldmap_1.3-6 foreign_0.8-82 proxy_0.4-26 dotCall64_1.0-1 Formula_1.2-4
[55] stats4_4.2.0 lava_1.6.10 prodlim_2019.11.13 httr_1.4.3 geosphere_1.5-14 ellipsis_0.3.2
[61] pkgconfig_2.0.3 reshape_0.8.9 nnet_7.3-17 dbplyr_2.1.1 utf8_1.2.2 conditionz_0.1.0
[67] tidyselect_1.1.2 rlang_1.0.2 reshape2_1.4.4 munsell_0.5.0 TeachingDemos_2.12 cellranger_1.1.0
[73] tools_4.2.0 cli_3.3.0 generics_0.1.2 broom_0.8.0 maxnet_0.1.4 ModelMetrics_1.2.2.2
[79] fs_1.5.2 precrec_0.12.9 randomForest_4.7-1 future_1.25.0 nlme_3.1-157 whisker_0.4
[85] xml2_1.3.3 compiler_4.2.0 rstudioapi_0.13 ggsignif_0.6.3 e1071_1.7-9 reprex_2.0.1
[91] stringi_1.7.6 plotmo_3.6.1 fields_13.3 rgeos_0.5-9 Matrix_1.4-1 classInt_0.4-3
[97] gbm_2.1.8 vctrs_0.4.1 pillar_1.7.0 lifecycle_1.0.1 estimability_1.4.1 cowplot_1.1.1
[103] maptools_1.1-4 data.table_1.14.2 R6_2.5.1 KernSmooth_2.23-20 parallelly_1.31.1 codetools_0.2-18
[109] MASS_7.3-56 assertthat_0.2.1 rprojroot_2.0.3 withr_2.5.0 rnaturalearth_0.1.0 multcomp_1.4-19
[115] hms_1.1.1 grid_4.2.0 rpart_4.1.16 timeDate_3043.102 coda_0.19-4 class_7.3-20
[121] carData_3.0-5 mda_0.5-2 pROC_1.18.0 lubridate_1.8.0
### Installing
* Code is available at GitHub (<5MB).
* Data can be downloaded according to Zeiss et al. (2023, Conservation Biology).
Folder structure should be maintained. Additionally, please download maxent.jar according to description from dismo package. Working directory during analysis should be the main folder. This is necessary to access all data files that will be called with `here::here()`.
### Executing program
* Open the R scripts following the numbering.
* To perform Species Distribution Models (SDMs), you need to create the raster stack(s) of environmental variables (*EnvPredictor_2km...*) in the *results* subfolder. This is done in "01b_Combine_predictors.R".
Note: The R script "04b_Identify_topPredictors.R" might only run in R but not in RStudio.
## Help
You may need to create subfolders (results, docs, figures) to run each line of code and save results and figures.
## Authors
Contributors names and contact info
Romy Zeiss
[@RomyZeiss](https://twitter.com/romyzeiss)
Maria J. I. Briones
Jérome Mathieu
Angela Lomba
Jessica Dahlke
Laura-Fiona Heptner
Gabriel Salako
Nico Eisenhauer
Carlos A. Guerra
## License
This project is licensed under the MIT License - see the LICENSE.md file for details.
## Acknowledgments
R.Z. acknowledge support of iDiv funded by the German Research Foundation (DFG FZT 118, 202548816), and L. Figueiredo from the iDiv Data & Code Unit for assistance with curation and archiving of the dataset. R.Z. was funded by the German Federal Environmental Foundation (DBU, 20021/752).
## References
Zeiss, R., Briones, M. J., Mathieu, J., Lomba, A., Dahlke, J., Heptner, L. F., ... & Guerra, C. A. (2023). Effects of climate on the distribution and conservation of commonly observed European earthworms. Conservation Biology.
Zeiss, R., Guerra, C. A., Briones, M. J. I., Mathieu, J., Russel, D., & Cano-Díaz, C. (2023). Earthworm distribution and conservation in Europe (Version 1.0) [Dataset]. iDiv Data Repository. (iDiv) Halle-Jena-Leipzig. https://doi.org/10.25829/idiv.3524-gqvs4z
## Folder structure
#### data_environment
Environmental data to predict species dsitributions.
* METADATA_Predictors.csv
* Information about predictors such as source, resolution and processing.
#### data_raw
Species occurrence data
*Note* Species occurrence data downloaded from diverse sources. GBIF data with search term "Crassiclitellata" is available at [https://doi.org/10.15468/dl.ghtngj]. Date 2022-03-05. GBIF Download content: metadata, and occurrence records (occurrence.txt). Data compiled by Jérome Mathieu is available upon request and will be published separately. Version date: 2022-05-20.
* Edaphobase_download_24-Feb-2021_Lumbricidae_Europe.csv
* Species occurrence data downloaded from Edaphobase, search term "Lumbricidae". Date 2022-01-25. After download but before data processing, we manually changed ö/ä/ü/ß into oe/ae/ue/ss.
* SiteData_sWorm_v2.csv
* Coordinates associated with sites in species occurrence data from global sWorm project. Download: 2021-06-14. Publication: Phillips et al. (2021). Global data on earthworm abundance, biomass, diversity and corresponding environmental properties. Scientific Data, 8(1), 1-12.
* SoilReCon_earthworms_clean.csv
* Species occurrence data from SoilReCon project in Portugal. Version date: 2022-05-20. Data creator: Carlos A. Guerra and Concha Cano Diaz.
* Species_list_Crassiclitellata.csv
* Table with raw species names, their accepted taxon names, ecological grouping, and abbreviations.
* Species_list_Crassiclitellata_short.csv
* Table with accepted species names and abbreviations of species with >= 10 records only.
* SppOccData_sWorm_v2.csv
* Species occurrence data from global sWorm project with site names. Download: 2021-06-14. Publication: Phillips et al. (2021). Global data on earthworm abundance, biomass, diversity and corresponding environmental properties. Scientific Data, 8(1), 1-12.
#### doc
Documentation: taxonomic backbone used in this study, and ODMAP protocol on SDM procedure.
* Add_Bottinelli_ecogroups.R
* R script to identify accepted taxonomic names (based on taxize package & ITIS database) and ecological groups (based on Bottinelli and Bouché).
* Bottinelli_2020_Earthworm_classification.csv
* Supplementary data from Bottinelli 2020 containing species names and their ecological grouping.
* ODMAP_Zeiss_et_al_2022-09-15.csv
* ODMAP protocol summarizing settings and input data of Species Distribution Modeling. Version: 2022-09-15.
#### intermediates
* Earthworm_occurrence_GBIF-sWorm-Edapho-SoilReCon.csv
* Species occurrence data cleaned and compiled from single datasets. Please note that the dataset compiled by Jérome Mathieu (JM) had to be excluded. The full processed dataset is avaiable upon request.
#### results
* ~/Maps
* Folder with raster stacks containing species distributions (.RData files) under current and future climatic conditions.
* ~/SDMs
* Folder containing species-specific Species Distribution Model output (.RData files) from BIOMOD analysis.
* ~/_Sensitivity
* Analysis of sensitivity against number of occurrences (numeric). Run with 5, 10, 20, and 50 occurrences per focal species (n>=100 records).
* ~/_Sensitivity_2
* Analysis of sensitivity against number of occurrences (percent). Run with 50%, 75%, and 90% of all occurrences per focal species (n>=100 records).
* ~/_TopPredictor
* Folder containing the MaxEnt analysis to identify the top 10 important predictors based on species with more than or equal to 10 presence records
* ~/biomod_files
* Folder containing preliminary and the BIOMOD output for each species (i.e., one folder per species).
* ~/maxent_files
* Folder containing MaxEnt output with one folder per species.
* corMatPearson_predictors.csv
* Correlation matrix showing Pearson correlations among the predictors (i.e., environmental variables) at 2km resolution.
* corMatSpearman_predictors.csv
* Correlation matrix showing Spearman correlations among the predictors (i.e., environmental variables) at 2km resolution.
* FlaggedRecords_Crassiclitellata.csv
* Table containing flagged records based on issues in coordinates as identified by CoordinateCleaner package.
* GBIF_issues_Crassiclitellata.txt
* List of unique issues (column "issue") in raw GBIF occurrence data.
* NoRecords_cleaning_Crassiclitellata.csv
* Table with number of records after each cleaning step for GBIF dataset and total occurrence dataset.
* NoRecords_perSpecies_full_Crassiclitellata.csv
* Table with number of records per species and datasource before (raw) and after data cleaning (clean).
* NumSpecies_perOcc.csv
* Table containing the number of species per number of occurrence (i.e., number of 2km² cells, records, or records used in BIOMOD).
* Occurrence_rasterized_2km_Crassiclitellata.csv
* Species occurrence records summarized for each 2km² grid cell and species based on the respective grid .tif file.
* Occurrence_rasterized_2km_BIOMOD_Crassiclitellata.csv
* Species occurrence records summarized for each 2km² grid cell and species based on the respective grid .tif file and subsetted to the minimum extent of the envrionmental variables (i.e., excluding grid cells with NAs in 1+ layers). In addition, this file contains 10,000 pseudo-absences per species created with biomod2::BIOMOD_FormattingData() function.
* Species_list_Crassiclitellata.csv
* Table with raw species names, their accepted taxon names, ecological grouping, abbreviations, and summary statistics (i.e., Number of records, presence cells at 2km and presences in BIOMOD data).
* Variable_importance_MaxEnt_Crassiclitellata.csv
* Permuttation importance values for all 29 predictor variables from species-specific MaxEnt analysis.
* VIF_predictors.csv
* Variable inflation factor before (column VIF_raw) and after removing variables with VIF>10 (VIF), and Pearson correlations calculated by usdm::vifcor() function.
#### src
* 01a_Prepare_2km_predictors_terra.R
* R code to prepare predictors at 2km resolution using terra package.
* 01a_Prepare_5km_predictors_terra.R
* R code to prepare predictors at 5km resolution using terra package.
* 01b_Combine_predictors.R
* R code to combine predictors into one raster stack and to normalize all variables (scale with mean of 0).
* 01c_Colinearity_predictors.R
* R code to identify multicolinearity between predictors. Predictors are only excluded during the next step.
* 01d_Clip_predictors.R
* R code to clip spatial extent of environmental stack. Grid cells with NA values in 1+ layers are removed.
* 02a_OPTIONAL_Download_GBIF_occurrences.R
* R code to download GBIF in R using rgif package. Output is saved with different name than manually (DOI-linked) dataset.
* 02b_Clean_GBIF_occurrences.R
* R code to clean GBIF occurrence records based on GBIF metadata.
* 03a_Prepare_biodiversity_data.R
* R code to harmonize, merge and clean occurrence data from different sources using CoordinateCleaner package among others.
* 03b_Occurrences_to_10km_grid.R
* OPTIONAL R code to rasterize occurrence point records into German 10km² grid system from project Faktencheck.
* 03b_Occurrences_to_2km_grid.R
* R code to rasterize occurrence point records into European 2km² grid system.
* 03b_Occurrences_to_5km_grid.R
* OPTIONAL R code to rasterize occurrence point records into European 5km² grid system. Only used during preanalysis.
* 04a_Prepare_input_data.R
* R code to prepare input using biomod2 package. Output is used for both, MaxEnt and BIOMOD analysis.
* 04b_Identify_topPredictors.R
* R code to select top 10 important predictors by running MaxEnt analysis and identified variables with highest median permuttation importance.
* 05_Build_SDMs.R
* R code to build Species Distribution Models (SDMs) using BIOMOD. Output is saved in folders per species.
* 06a_Predict_SDM_current.R
* R code to predict species' presence into whole environment under current climate conditions.
* 06b_Predict_SDM_future.R
* R code to predict species' presence into whole environment under 15 future climate conditions.
* 07_Variable_importance.R
* R code to extract and combine variable importances from BIOMOD output.
* 08_Uncertainty_analysis.R
* R code to extract uncertainty (coefficient of variance) from BIOMOD output and create dataframe containing proper spatial extent for mapping (uncertainty<0.1).
* 09_Model_performance.R
* R code to extract model performance measures (AUC(ROC), TSS and Kappa) from BIOMOD output.
* 10a_Calculate_richness_current.R
* R code to calculate species richness under current climatic conditions (i.e., using projections with current climate). Includes prelimiary beta diversity and cluster analysis.
* 10b_Calculate_richness_future.R
* R code to calculate species richness under future climatic conditions (i.e., using projections with 15 future climate scenarios).
* 11_Range_sizes.R
* R code to estimate range sizes of earthworm species under current and future climate conditions.
* 12a_Prepare_protection.R
* R code to extract protection status of each 5km² grid cell based on WDPA and WDOEC networks in Europe. Protection status is estimated as percentage of cover of each protected area falling into one of the IUCN categories.
* 12b_Estimate_conservation_status.R
* R code to calculate protection status of each earthworm species by counting presence grid cells covered by certain protected area categories.
* 13_Climate_impact_analysis.R
* R code to test the effect of mean annual temperature and precipitation seasonality on future prediction by using both, only temperature, or only precipitation predicted into the future. Other variables are kept as they were under current climate condictions.
* 14_Plotting_and_results.R
* R code to create and save figures, and extract some values.
* 15a_Sensitivity_analysis_number.R
* R code to test the effect of the number of occurrences per species on species richness. Number of occurrences were subsetted to 5, 10, 20, and 50 presence grid cells for each of the focal species. Corresponding folder: ~/results/_Sensitivity
* 15b_Sensitivity_analysis_percent.R
* R code to test the effect of the number of occurrences per species on species richness. Number of occurrences were subsetted to 50, 75, and 95% of the original number of presence grid cells for each of the focal species. Corresponding folder: ~/results/_Sensitivity_2
* 16_Results_for_Faktencheck.R
* R code to extract results used for the project Faktencheck.
Remaining scripts are outdates but contain helpful lines of code to run diverse algorithms separately, prepare predictors at different resoltions and using raster instead of terra package, and unfinished scripts to estimate landscape fragmentation layer, and build Ensembles of Small Models.
* OLD_00_SoilBiodiversity_SDMs.Rmd
* OLD_01_Prepare_1km_predictors.R
* OLD_01a_Prepare_2km_predictors.R
* OLD_01a_Prepare_5km_predictors.R
* OLD_03b_Occurrences_to_1km_grid.R
* OLD_04_Create_backgroundData.R
* OLD_05_Prepare_input.R
* OLD_06_SDM_Valavi.R
* OLD_06_SDM_Valavi_runParallel.R
* OLD_07_Model_performance.R
* OLD_08_Predict_SDMs.R
* OLD_09_Variable_importance.R
* OLD_10_Estimate_richness.R
* OLD_Background_drivers.Rmd
* OLD_Calculate_LandscapeFragmentation.Rmd
* OLD_ecospat_function_fixed.R
* OLD_SDM_ensm_ESM_runParallel.R
* OLD_SDM_ensm_runParallel.R