GitHub - JeMaNd-r/SoilBiodiversity: Get distrubution of soil biodiversity in Germany

JeMaNd-r / SoilBiodiversity Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

Get distrubution of soil biodiversity in Germany

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 600 Commits
data_environment		data_environment
data_raw		data_raw
doc		doc
intermediates		intermediates
results		results
src		src
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
METADATA_files.txt		METADATA_files.txt
METADATA_files_all.txt		METADATA_files_all.txt
METADATA_variables.txt		METADATA_variables.txt
README.Rmd		README.Rmd

Repository files navigation

<<<<<<< Updated upstream

# Distribution and protection of commonly observed earthworm species across Europe based on SDMs
=======

# Soil Biodiversity across Europe based on SDMs
>>>>>>> Stashed changes

The following code used occurrence data and environmental variables to predict Species Distribution Models (SDMs) for earthworms (Crassiclitellata) across Europe.

## Description

Belowground biodiversity does not necessarily follow aboveground biodiversity patterns, but maps of soil biodiversity remain scarce because of limited data availability. Earthworms belong to the most thoroughly studied soil organisms, and - in their role as ecosystem engineers - have a significant impact on ecosystem functioning. Here, we address the questions where and what is needed to protect European earthworm species. We focus on highly recorded species only, that is on species with broad geographic ranges and therefore an appropriate number of occurrence records in Europe (min. 10 records). First, we map the spatial distribution of common earthworm species and the respective total species richness. We use multiple environmental predictors spanning climatic, edaphic, and topological factors. In accordance with the literature, we predicted potential species distributions by well-performing models (i.e., MaxEnt and Ensemble modeling). Second, we investigate how well the different types of protected areas cover mean and individual species ranges by estimating the respective proportion of range area. Based on the protection coverage, we are able to identify vulnerable species, that is species whose ranges only sparsely overlap with current protected areas.

## Getting Started

### Dependencies

* Windows 10
* R version 4.1.2
* RStudio version 1.2.5033

* sessionInfo()
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] exactextractr_0.8.2 sf_1.0-7 rgdal_1.5-31 colorRamps_2.3.1 gridExtra_2.3
[6] ggpubr_0.4.0 doParallel_1.0.17 iterators_1.0.14 foreach_1.5.2 rJava_1.0-6
[11] dismo_1.3-5 biomod2_3.5-1 CoordinateCleaner_2.0-20 rgbif_3.7.2 caret_6.0-92
[16] lattice_0.20-45 usdm_1.1-18 here_1.0.1 raster_3.5-15 sp_1.4-7
[21] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2
[26] tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1 terra_1.5-21

loaded via a namespace (and not attached):
[1] spam_2.8-0 readxl_1.4.0 uuid_1.1-0 backports_1.4.1 plyr_1.8.7 lazyeval_0.2.2
[7] splines_4.2.0 listenv_0.8.0 TH.data_1.1-1 digest_0.6.29 viridis_0.6.2 earth_5.3.1
[13] fansi_1.0.3 magrittr_2.0.3 tzdb_0.3.0 recipes_0.2.0 globals_0.14.0 modelr_0.1.8
[19] gower_1.0.0 matrixStats_0.62.0 sandwich_3.0-1 hardhat_0.2.0 colorspace_2.0-3 rvest_1.0.2
[25] haven_2.5.0 crayon_1.5.1 jsonlite_1.8.0 zoo_1.8-10 survival_3.3-1 glue_1.6.2
[31] PresenceAbsence_1.1.10 gtable_0.3.0 ipred_0.9-12 emmeans_1.8.1-1 car_3.0-13 future.apply_1.9.0
[37] biscale_1.0.0 maps_3.4.0 abind_1.4-5 scales_1.2.0 oai_0.3.2 mvtnorm_1.1-3
[43] DBI_1.1.2 rstatix_0.7.0 Rcpp_1.0.8.3 plotrix_3.8-2 viridisLite_0.4.0 xtable_1.8-4
[49] units_0.8-0 rworldmap_1.3-6 foreign_0.8-82 proxy_0.4-26 dotCall64_1.0-1 Formula_1.2-4
[55] stats4_4.2.0 lava_1.6.10 prodlim_2019.11.13 httr_1.4.3 geosphere_1.5-14 ellipsis_0.3.2
[61] pkgconfig_2.0.3 reshape_0.8.9 nnet_7.3-17 dbplyr_2.1.1 utf8_1.2.2 conditionz_0.1.0
[67] tidyselect_1.1.2 rlang_1.0.2 reshape2_1.4.4 munsell_0.5.0 TeachingDemos_2.12 cellranger_1.1.0
[73] tools_4.2.0 cli_3.3.0 generics_0.1.2 broom_0.8.0 maxnet_0.1.4 ModelMetrics_1.2.2.2
[79] fs_1.5.2 precrec_0.12.9 randomForest_4.7-1 future_1.25.0 nlme_3.1-157 whisker_0.4
[85] xml2_1.3.3 compiler_4.2.0 rstudioapi_0.13 ggsignif_0.6.3 e1071_1.7-9 reprex_2.0.1
[91] stringi_1.7.6 plotmo_3.6.1 fields_13.3 rgeos_0.5-9 Matrix_1.4-1 classInt_0.4-3
[97] gbm_2.1.8 vctrs_0.4.1 pillar_1.7.0 lifecycle_1.0.1 estimability_1.4.1 cowplot_1.1.1
[103] maptools_1.1-4 data.table_1.14.2 R6_2.5.1 KernSmooth_2.23-20 parallelly_1.31.1 codetools_0.2-18
[109] MASS_7.3-56 assertthat_0.2.1 rprojroot_2.0.3 withr_2.5.0 rnaturalearth_0.1.0 multcomp_1.4-19
[115] hms_1.1.1 grid_4.2.0 rpart_4.1.16 timeDate_3043.102 coda_0.19-4 class_7.3-20
[121] carData_3.0-5 mda_0.5-2 pROC_1.18.0 lubridate_1.8.0

### Installing

* Code is available at GitHub (<5MB).
* Data can be downloaded according to Zeiss et al. (2023, Conservation Biology).

Folder structure should be maintained. Additionally, please download maxent.jar according to description from dismo package. Working directory during analysis should be the main folder. This is necessary to access all data files that will be called with `here::here()`.

### Executing program

* Open the R scripts following the numbering.
* To perform Species Distribution Models (SDMs), you need to create the raster stack(s) of environmental variables (*EnvPredictor_2km...*) in the *results* subfolder. This is done in "01b_Combine_predictors.R".

Note: The R script "04b_Identify_topPredictors.R" might only run in R but not in RStudio.

## Help

You may need to create subfolders (results, docs, figures) to run each line of code and save results and figures.

## Authors

Contributors names and contact info

Romy Zeiss
[@RomyZeiss](https://twitter.com/romyzeiss)

Maria J. I. Briones
Jérome Mathieu
Angela Lomba
Jessica Dahlke
Laura-Fiona Heptner
Gabriel Salako
Nico Eisenhauer
Carlos A. Guerra

## License

This project is licensed under the MIT License - see the LICENSE.md file for details.

## Acknowledgments

R.Z. acknowledge support of iDiv funded by the German Research Foundation (DFG FZT 118, 202548816), and L. Figueiredo from the iDiv Data & Code Unit for assistance with curation and archiving of the dataset. R.Z. was funded by the German Federal Environmental Foundation (DBU, 20021/752).

## References

Zeiss, R., Briones, M. J., Mathieu, J., Lomba, A., Dahlke, J., Heptner, L. F., ... & Guerra, C. A. (2023). Effects of climate on the distribution and conservation of commonly observed European earthworms. Conservation Biology.
Zeiss, R., Guerra, C. A., Briones, M. J. I., Mathieu, J., Russel, D., & Cano-Díaz, C. (2023). Earthworm distribution and conservation in Europe (Version 1.0) [Dataset]. iDiv Data Repository. (iDiv) Halle-Jena-Leipzig. https://doi.org/10.25829/idiv.3524-gqvs4z

## Folder structure

#### data_environment
Environmental data to predict species dsitributions.

* METADATA_Predictors.csv
* Information about predictors such as source, resolution and processing.

#### data_raw
Species occurrence data

*Note* Species occurrence data downloaded from diverse sources. GBIF data with search term "Crassiclitellata" is available at [https://doi.org/10.15468/dl.ghtngj]. Date 2022-03-05. GBIF Download content: metadata, and occurrence records (occurrence.txt). Data compiled by Jérome Mathieu is available upon request and will be published separately. Version date: 2022-05-20.

* Edaphobase_download_24-Feb-2021_Lumbricidae_Europe.csv
* Species occurrence data downloaded from Edaphobase, search term "Lumbricidae". Date 2022-01-25. After download but before data processing, we manually changed ö/ä/ü/ß into oe/ae/ue/ss.

* SiteData_sWorm_v2.csv
* Coordinates associated with sites in species occurrence data from global sWorm project. Download: 2021-06-14. Publication: Phillips et al. (2021). Global data on earthworm abundance, biomass, diversity and corresponding environmental properties. Scientific Data, 8(1), 1-12.

* SoilReCon_earthworms_clean.csv
* Species occurrence data from SoilReCon project in Portugal. Version date: 2022-05-20. Data creator: Carlos A. Guerra and Concha Cano Diaz.

* Species_list_Crassiclitellata.csv
* Table with raw species names, their accepted taxon names, ecological grouping, and abbreviations.

* Species_list_Crassiclitellata_short.csv
* Table with accepted species names and abbreviations of species with >= 10 records only.

* SppOccData_sWorm_v2.csv
* Species occurrence data from global sWorm project with site names. Download: 2021-06-14. Publication: Phillips et al. (2021). Global data on earthworm abundance, biomass, diversity and corresponding environmental properties. Scientific Data, 8(1), 1-12.

#### doc
Documentation: taxonomic backbone used in this study, and ODMAP protocol on SDM procedure.
* Add_Bottinelli_ecogroups.R
* R script to identify accepted taxonomic names (based on taxize package & ITIS database) and ecological groups (based on Bottinelli and Bouché).

* Bottinelli_2020_Earthworm_classification.csv
* Supplementary data from Bottinelli 2020 containing species names and their ecological grouping.

* ODMAP_Zeiss_et_al_2022-09-15.csv
* ODMAP protocol summarizing settings and input data of Species Distribution Modeling. Version: 2022-09-15.

#### intermediates

* Earthworm_occurrence_GBIF-sWorm-Edapho-SoilReCon.csv
* Species occurrence data cleaned and compiled from single datasets. Please note that the dataset compiled by Jérome Mathieu (JM) had to be excluded. The full processed dataset is avaiable upon request.

#### results

* ~/Maps
* Folder with raster stacks containing species distributions (.RData files) under current and future climatic conditions.

* ~/SDMs
* Folder containing species-specific Species Distribution Model output (.RData files) from BIOMOD analysis.

* ~/_Sensitivity
* Analysis of sensitivity against number of occurrences (numeric). Run with 5, 10, 20, and 50 occurrences per focal species (n>=100 records).

* ~/_Sensitivity_2
* Analysis of sensitivity against number of occurrences (percent). Run with 50%, 75%, and 90% of all occurrences per focal species (n>=100 records).

* ~/_TopPredictor
* Folder containing the MaxEnt analysis to identify the top 10 important predictors based on species with more than or equal to 10 presence records

* ~/biomod_files
* Folder containing preliminary and the BIOMOD output for each species (i.e., one folder per species).

* ~/maxent_files
* Folder containing MaxEnt output with one folder per species.

* corMatPearson_predictors.csv
* Correlation matrix showing Pearson correlations among the predictors (i.e., environmental variables) at 2km resolution.

* corMatSpearman_predictors.csv
* Correlation matrix showing Spearman correlations among the predictors (i.e., environmental variables) at 2km resolution.

* FlaggedRecords_Crassiclitellata.csv
* Table containing flagged records based on issues in coordinates as identified by CoordinateCleaner package.

* GBIF_issues_Crassiclitellata.txt
* List of unique issues (column "issue") in raw GBIF occurrence data.

* NoRecords_cleaning_Crassiclitellata.csv
* Table with number of records after each cleaning step for GBIF dataset and total occurrence dataset.

* NoRecords_perSpecies_full_Crassiclitellata.csv
* Table with number of records per species and datasource before (raw) and after data cleaning (clean).

* NumSpecies_perOcc.csv
* Table containing the number of species per number of occurrence (i.e., number of 2km² cells, records, or records used in BIOMOD).

* Occurrence_rasterized_2km_Crassiclitellata.csv
* Species occurrence records summarized for each 2km² grid cell and species based on the respective grid .tif file.

* Occurrence_rasterized_2km_BIOMOD_Crassiclitellata.csv
* Species occurrence records summarized for each 2km² grid cell and species based on the respective grid .tif file and subsetted to the minimum extent of the envrionmental variables (i.e., excluding grid cells with NAs in 1+ layers). In addition, this file contains 10,000 pseudo-absences per species created with biomod2::BIOMOD_FormattingData() function.

* Species_list_Crassiclitellata.csv
* Table with raw species names, their accepted taxon names, ecological grouping, abbreviations, and summary statistics (i.e., Number of records, presence cells at 2km and presences in BIOMOD data).

* Variable_importance_MaxEnt_Crassiclitellata.csv
* Permuttation importance values for all 29 predictor variables from species-specific MaxEnt analysis.

* VIF_predictors.csv
* Variable inflation factor before (column VIF_raw) and after removing variables with VIF>10 (VIF), and Pearson correlations calculated by usdm::vifcor() function.

#### src

* 01a_Prepare_2km_predictors_terra.R
* R code to prepare predictors at 2km resolution using terra package.

* 01a_Prepare_5km_predictors_terra.R
* R code to prepare predictors at 5km resolution using terra package.

* 01b_Combine_predictors.R
* R code to combine predictors into one raster stack and to normalize all variables (scale with mean of 0).

* 01c_Colinearity_predictors.R
* R code to identify multicolinearity between predictors. Predictors are only excluded during the next step.

* 01d_Clip_predictors.R
* R code to clip spatial extent of environmental stack. Grid cells with NA values in 1+ layers are removed.

* 02a_OPTIONAL_Download_GBIF_occurrences.R
* R code to download GBIF in R using rgif package. Output is saved with different name than manually (DOI-linked) dataset.

* 02b_Clean_GBIF_occurrences.R
* R code to clean GBIF occurrence records based on GBIF metadata.

* 03a_Prepare_biodiversity_data.R
* R code to harmonize, merge and clean occurrence data from different sources using CoordinateCleaner package among others.

* 03b_Occurrences_to_10km_grid.R
* OPTIONAL R code to rasterize occurrence point records into German 10km² grid system from project Faktencheck.

* 03b_Occurrences_to_2km_grid.R
* R code to rasterize occurrence point records into European 2km² grid system.

* 03b_Occurrences_to_5km_grid.R
* OPTIONAL R code to rasterize occurrence point records into European 5km² grid system. Only used during preanalysis.

* 04a_Prepare_input_data.R
* R code to prepare input using biomod2 package. Output is used for both, MaxEnt and BIOMOD analysis.

* 04b_Identify_topPredictors.R
* R code to select top 10 important predictors by running MaxEnt analysis and identified variables with highest median permuttation importance.

* 05_Build_SDMs.R
* R code to build Species Distribution Models (SDMs) using BIOMOD. Output is saved in folders per species.

* 06a_Predict_SDM_current.R
* R code to predict species' presence into whole environment under current climate conditions.

* 06b_Predict_SDM_future.R
* R code to predict species' presence into whole environment under 15 future climate conditions.

* 07_Variable_importance.R
* R code to extract and combine variable importances from BIOMOD output.

* 08_Uncertainty_analysis.R
* R code to extract uncertainty (coefficient of variance) from BIOMOD output and create dataframe containing proper spatial extent for mapping (uncertainty<0.1).

* 09_Model_performance.R
* R code to extract model performance measures (AUC(ROC), TSS and Kappa) from BIOMOD output.

* 10a_Calculate_richness_current.R
* R code to calculate species richness under current climatic conditions (i.e., using projections with current climate). Includes prelimiary beta diversity and cluster analysis.

* 10b_Calculate_richness_future.R
* R code to calculate species richness under future climatic conditions (i.e., using projections with 15 future climate scenarios).

* 11_Range_sizes.R
* R code to estimate range sizes of earthworm species under current and future climate conditions.

* 12a_Prepare_protection.R
* R code to extract protection status of each 5km² grid cell based on WDPA and WDOEC networks in Europe. Protection status is estimated as percentage of cover of each protected area falling into one of the IUCN categories.

* 12b_Estimate_conservation_status.R
* R code to calculate protection status of each earthworm species by counting presence grid cells covered by certain protected area categories.

* 13_Climate_impact_analysis.R
* R code to test the effect of mean annual temperature and precipitation seasonality on future prediction by using both, only temperature, or only precipitation predicted into the future. Other variables are kept as they were under current climate condictions.

* 14_Plotting_and_results.R
* R code to create and save figures, and extract some values.

* 15a_Sensitivity_analysis_number.R
* R code to test the effect of the number of occurrences per species on species richness. Number of occurrences were subsetted to 5, 10, 20, and 50 presence grid cells for each of the focal species. Corresponding folder: ~/results/_Sensitivity

* 15b_Sensitivity_analysis_percent.R
* R code to test the effect of the number of occurrences per species on species richness. Number of occurrences were subsetted to 50, 75, and 95% of the original number of presence grid cells for each of the focal species. Corresponding folder: ~/results/_Sensitivity_2

* 16_Results_for_Faktencheck.R
* R code to extract results used for the project Faktencheck.

Remaining scripts are outdates but contain helpful lines of code to run diverse algorithms separately, prepare predictors at different resoltions and using raster instead of terra package, and unfinished scripts to estimate landscape fragmentation layer, and build Ensembles of Small Models.

* OLD_00_SoilBiodiversity_SDMs.Rmd
* OLD_01_Prepare_1km_predictors.R
* OLD_01a_Prepare_2km_predictors.R
* OLD_01a_Prepare_5km_predictors.R
* OLD_03b_Occurrences_to_1km_grid.R
* OLD_04_Create_backgroundData.R
* OLD_05_Prepare_input.R
* OLD_06_SDM_Valavi.R
* OLD_06_SDM_Valavi_runParallel.R
* OLD_07_Model_performance.R
* OLD_08_Predict_SDMs.R
* OLD_09_Variable_importance.R
* OLD_10_Estimate_richness.R
* OLD_Background_drivers.Rmd
* OLD_Calculate_LandscapeFragmentation.Rmd
* OLD_ecospat_function_fixed.R
* OLD_SDM_ensm_ESM_runParallel.R
* OLD_SDM_ensm_runParallel.R