The estimation of reliable indices of abundance for sedentary stocks requires the incorporation of the underlying spatial population structure, including issues arising from the sampling design and zero inflation. We applied seven spatial interpolation techniques [ordinary kriging (OK), kriging with external drift (KED), a negative binomial generalized additive model (NBGAM), NBGAM plus OK (NBGAM+OK), a general additive mixed model (GAMM), GAMM plus OK (GAMM+OK) and a zero-inflated negative binomial model (ZINB) ] to three survey datasets to estimate biomass for the gastropod
Las estimaciones de índices de abundancia relativa para evaluar poblaciones de especies sedentarias requieren tener en cuenta su estructura espacial, el diseño del muestreo y el alto número de ceros registrados a la hora del muestreo. Para obtener un índice confiable de abundancia para el gasterópodo
Sedentary species aggregate over space to carry out biological and social functions resulting in a highly complex heterogeneous distribution (
Geostatistical models are a group of spatial analysis and interpolation methods based on the theory of the regionalized variable (
In this study, we conducted a comprehensive comparative spatial analysis using seven models: ordinary kriging (OK), kriging with external drift (KED), negative binomial GAM (NBGAM), a general additive mixed model (GAMM), a zero-inflated negative binomial model (ZINB) and two hybrid regression kriging models, NBGAM+OK and GAMM+OK. These particular models were chosen because they can deal with highly skewed and zero-inflated data. They were applied to three surveys of the gastropod queen conch,
The queen conch is a highly exploited species on the Pedro Bank and throughout its wider Caribbean range, particularly over the last forty years (
We aimed firstly to assess the performance of alternative spatial models in estimating biomass given the structure of the data. Secondly, we determined the implications of not accounting for the spatial population structure by comparing the spatial model estimations with two design methods, one with a stratified and the other with a non-stratified sampling design. The performance of the spatial interpolation models was evaluated using a ten-fold cross-validation residual analysis and visual examination of the resultant species distribution maps. The results of the analyses are discussed in terms of the main data issues presented by each survey data set, including sample design, zero-inflation and over-dispersion of variances, and how they affect the reliability of the estimations. The implications for spatial management of the species in light of its ecology are also discussed. All modelling procedures and the output thereof, including maps, were carried out and produced using packages available in the R program language (
The Pedro Bank is a shallow offshore bank located in the west-central Caribbean approximately 80 km south of the main island of Jamaica (
Distribution data were compiled from underwater visual surveys conducted by the National Fisheries Authority of Jamaica (NFA) for the years 2007, 2015 and 2018. The NFA has conducted such surveys at three to five-year intervals since 1994, but only after 2007 did sampling target the entire extent of the bank. The years 2007, 2015 and 2018 were chosen to contrast sample design, sample sizes and spatial population structures such as the size and location of high-density patches. The 2007 survey consisted of 60 sites clustered in the centre of the study area, while the 2015 and 2018 surveys included 80 and 81 sites, respectively, systematically distributed down to the 30-metre contour (
We collated broad-scale spatial environmental layers for geographic location (latitude and longitude), habitat type and depth (bathymetry) of the bank to be used as covariates in the interpolation models. Geographic location was selected as a proxy for interactions between queen conch location and ecological processes (
We created a 100 m² gridded interpolation surface down to the 30 m depth contour consisting of 606000 pixels for a total area of 6060 km². Because the sampling area for 2007 was smaller, the grid was adjusted to 204000 pixels or approximately 2040 km² to avoid extrapolation for that dataset. Each environmental layer was converted to the same 100 m² resolution so that the interpolation outputs were comparable. For the interpolation process, each pixel was considered a discrete unit, so the interpolated value in each pixel was multiplied by the ratio of the total area of one pixel (1 km²) and the mean site area for each survey year to compensate for the difference between the unit used for the interpolation surface (pixel) and the area of the sampled sites (1 m²). Total biomass within a pixel was calculated by transforming the estimated abundances using queen conch weight conversion factors corresponding to the 50% clean processing level (shell, viscera and opercula removed) for adults (
We calculated two design-based (Cochran 1977) biomass estimations for comparison with the spatial interpolation methods: the first was a non-stratified method and the second one was stratified. The non-stratified method consisted of an estimate based on the product of the mean abundance and the total area and the weight conversion factors for adults and juveniles. Design method estimations can be significantly improved by stratification (
Ordinary kriging and KED were applied to the datasets using functions and procedures in the R gstat package (
where
Ordinary kriging (OK) assumes a stationary local mean around each sample and uses an unbiased linear combination of the surrounding weighted values produced by the variogram to estimate values at unsampled sites (
where
KED assumes that the local trend or drift is a linear function of a secondary variable (
where
A spatial version of GAM (
where
The GAMMs included optimal variance and correlation structures to model the random error term , heterogeneity of the variance and residual correlation in the data. Optimal variance and correlation structures were selected for the GAMM by minimizing the AIC among competing structures (
Two hybrid regression kriging (RK) models (
The spatial negative binominal zero-inflated model (ZINB) directly models the two sources of excessive zeros which can cause bias in spatially sampled data: sampling error and ecological processes (
Statistical evaluation and comparison of spatial model performance were done using ten-fold cross-validation. Two diagnostic statistics were calculated, the mean absolute error (MAE), a measure of prediction bias, and the root mean squared error (RMSE), which reflects prediction precision or prediction error (
where
Structural data trends from the surveys included a high frequency of sites where zeros were recorded relative to sites with positive counts (
2007 | 2015 | 2018 | ||||
---|---|---|---|---|---|---|
Adult | Juvenile | Adult | Juvenile | Adult | Juvenile | |
Sample size | 60 | 60 | 80 | 80 | 81 | 81 |
Minimum | 0 | 0 | 0 | 0 | 0 | 0 |
Maximum | 350 | 95 | 165 | 85 | 250 | 181 |
Mean | 17 | 7 | 34 | 11 | 12 | 12 |
Median | 1 | 1 | 17 | 3 | 2 | 2 |
Variance | 2458 | 315 | 1387 | 290 | 958 | 795 |
Standard deviation | 49.6 | 17.8 | 37.3 | 17.0 | 31.0 | 28.2 |
Coefficient of variation | 3.0 | 2.6 | 1.1 | 1.6 | 2.7 | 2.4 |
Kurtosis | 32.1 | 11.7 | 0.9 | 5.9 | 41.0 | 19.9 |
Skewness | 5.4 | 3.4 | 1.2 | 2.4 | 5.9 | 4.2 |
Percentage of zeros | 33 | 48 | 14 | 24 | 36 | 36 |
The performance of the spatial models reflected their underlying interpolation assumptions and the structural characteristics of the survey datasets. There was no clear correspondence between bias measured by the MAE and the prediction precision based on the RMSE. In other words, a model returning the lowest MAE did not necessarily return the lowest RMSE (
Juvenile | Adult | ||||||
---|---|---|---|---|---|---|---|
Year | Method | Biomass (t) | MAE | RMSE | Biomass (t) | MAE | RMSE |
2007 | |||||||
OK | 306 | 0.34 | 17.70 | 2171 | 0.57 | 56.68 | |
KED | 347 | 0.31 | 18.32 | 2484 | 2.78 | 61.38 | |
NBGAM | 63 | 0.45 | 15.89 | 1173 | 0.22† | 20.52 | |
NBGAM+OK | 47 | 0.92 | 15.70 | 1104 | 0.72 | 20.40 | |
GAMM | 242 | 0.25 | 14.87 | 1783 | 0.57 | 20.00* | |
GAMM+OK | 320 | 1.27 | 14.46* | 3535 | 88.08 | 208.72 | |
ZINB | 143 | 0.13† | 14.47 | 2645 | 5.02 | 48.77 | |
2015 | |||||||
OK | 2060 | 0.58 | 16.82 | 16775 | 0.04^{†} | 33.60* | |
KED | 1856 | 0.09 | 16.64 | 14928 | 0.20 | 35.83 | |
NBGAM | 1801 | 0.61 | 17.69 | 18184 | 0.81 | 38.38 | |
NBGAM+OK | 1783 | 0.16 | 17.64 | 17135 | 0.44 | 38.37 | |
GAMM | 2536 | 0.06† | 16.94 | 23515 | 0.06 | 38.11 | |
GAMM+OK | 2173 | 1.06 | 16.04 | 18751 | 0.19 | 38.12 | |
ZINB | 1447 | 0.29 | 15.17* | 19811 | 0.25 | 36.40 | |
2018 | |||||||
OK | 2194 | 0.61 | 27.00 | 4258 | 0.47 | 33.38 | |
KED | 1857 | 0.52 | 29.53 | 4158 | 0.37 | 34.24 | |
NBGAM | 1416 | 1.47 | 23.98 | 5606 | 0.15† | 27.52 | |
NBGAM+OK | 1277 | 1.90 | 23.96 | 5372 | 0.31 | 27.47 | |
GAMM | 1848 | 1.56 | 23.50* | 5584 | 1.82 | 22.59* | |
GAMM+OK | 1693 | 4.54 | 31.76 | 7615 | 2.36 | 23.87 | |
ZINB | 1883 | 0.27† | 25.53 | 7394 | 0.17 | 23.00 |
Interpolation model: OK = ordinary kriging; KED = kriging with external drift; NBGAM = negative binomial generalized additive model; NBGAM+OK = NBGAM plus OK; GAMM = general additive mixed model; GAMM+OK = GAMM plus OK; ZINB = zero-inflated negative binomial model.
Model evaluation and comparative statistics: MAE = mean absolute error; RMSE = root mean square error.
#Least biased model.
Most precise model.#
Adult biomass estimated from spatial models varied approximately in a range equal to the minimum estimate for each year, with a CoV of 0.40 in 2007, 0.15 in 2015 and 0.24 in 2018. For the juvenile biomass, the CoV was 0.60 in 2007, and 0.18 in 2015 and 2018. This indicates that there were no great differences between the results of the models in terms of biomass estimation.
The design-based estimations tended to be close to the biomass estimated by spatial models (
Biomass estimation method and difference (%) | |||||
---|---|---|---|---|---|
Survey | Spatial* (t) | Stratified (t) | % diff. | Non-stratified (t) | % diff. |
Adults | |||||
2007 | 1783 | 1691 | -5 | 2555 | +43 |
2015 | 16775 | 15934 | -5 | 21144 | +26 |
2018 | 5584 | 4784 | -14 | 5981 | +7 |
Juveniles | |||||
2007 | 320 | 183 | -43 | 339 | +6 |
2015 | 1447 | 1919 | +33 | 2182 | +51 |
2018 | 1848 | 1911 | +3 | 1980 | +7 |
model returning the lowest RMSE.
The spatial distribution maps of the model predictions (
OK, KED, NBGAM, NBGAM+OK, GAMM, GAMM+OK and ZINB
OK, KED, NBGAM, NBGAM+OK, GAMM, GAMM+OK and ZINB
Given the socio-economic importance of sedentary fisheries and their high vulnerability to overexploitation because of their slow-moving habit, abundance indices must be well estimated and understood in order to conserve and manage stocks. The set of spatial analysis techniques now available has allowed for more reliable estimations of abundance indices and spatial population structure while reducing bias and increasing estimation accuracy to reduce the risk of errors, particularly overestimations. Errors in estimations can be substantial, as our results have shown, if the spatial component of the species distribution is not suitably accounted for. Failing this, there is an increased risk of fishery collapse or a decrease in the efficiency of the management and utilization of the resource as a result of poor decision-making (
Modelling the spatial component of exploited sedentary stocks may be critical to precise biomass estimations, maintaining reproductively viable population structures and avoiding the pitfalls of recruitment overfishing. In this study, the models that assumed a priori probability distribution or used covariates to model the spatial structure (NBGAM, NBGAM+OK, GAMM, GAMM+OK and ZINB) generally performed better than those that did not, such as OK. This contrast clearly shows the importance of closely considering the spatial component of the abundance index being estimated related to the species biology and exploitation. Queen conch, like many sedentary species, aggregate in specific habitats to carry out biological activities and are subject to high levels of exploitation which constantly alter their population structure (
From our results, the design methods returned biomass estimations that were sometimes very close to the best interpolation models (
In this study, the best models were generally the GAMM and the ZINB based on the two evaluation statistics, MAE and RMSE. It was, however, clear that model performances were strongly influenced by the structure of each dataset, including its sampling design, sample size, coefficient of variation and the quality of the available covariates used. As a result, the best model was not consistent for either juveniles or adults across the three surveys. This is consistent with theory and studies that suggest that spatial interpolation model performance is often data- and case-specific (
The inclusion of globally inputted environmental layers as covariates served to improve model performance but also likely added bias to the analysis. The inclusion of depth, habitat, and geographic location layers in the spatial analysis was justified based on the biology and ecology of the species, but these layers were also included because they were easy to obtain. The resolution of these layers, some of which were created for different purposes, did not necessarily reflect the scale at which ecological processes that affect queen conch distribution occur. As a result, undue bias may have been introduced, possibly affecting local autocorrelation and resulting in the false regression significance of some variables (
In the future, estimations could be further improved by obtaining geospatial layers of variables that reflect the resolution and scale of the biological processes of the species. Incorrect model calibration, unmodelled interactions and missing variables are also important sources of bias in spatial modelling (
Despite the improved access and availability of spatial interpolation techniques, there are fisheries such as that of the queen conch in which design methods of abundance indices estimation are widely used. Queen conch is a CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora) Appendix II listed species, a conservation status which requires careful management if the fishery is to be sustained (
An ecosystem approach to fisheries (
Abundance isolines units are in number of conches per pixel. cr, coral reef; dc, deep coral; do, deep ocean; la, land; ma, macroalgae hardground; sa, sand and sediment; sg, seagrass.
The estimation of reliable indices of abundance for sedentary stocks such as queen conch requires that their underlying spatial population structure be accounted to include issues arising from the sampling design, excessive zeros and over-dispersion of variances. The multi-model comparative approach applied here to estimate queen conch biomass was shown to be robust but was dependent on the model specification to the specific case and dataset. The traditional use of design biomass estimation should be discontinued for queen conch, as there is the risk of stock collapse owing to poor management decisions based on estimation errors. Our results indicated differences in design method estimation as high as +51% of the biomass estimated by the best spatial interpolation model. Of the seven interpolation models, GAMM and ZINB were among the best-performing models for both juveniles and adults. Aside from the inherent assumptions of the interpolation models, model performance was largely influenced by the sampling design, the sample size, the coefficient of variation of the sample and the quality of the available set of covariates. Model performance could be further improved by including covariates that better represent the scale and resolution of the processes affecting the species distribution, including the effect of fishing. It is therefore clear that to ensure the reliability of abundance index estimation in an ecosystem management context, strong knowledge of the biological, environmental and exploitation-related factors must be understood and incorporated to address specific species and datasets.
This study was supported by data from the National Fisheries Authority of Jamaica (NFA), the National Environment and Planning Agency (NEPA), and The Nature Conservancy (TNC). Special thanks are due to Stephen Smikle and the survey staff from the National Fisheries Authority of Jamaica. The first author thanks the National Council of Science and Technology, Mexico (CONACYT) for the scholarship at the Universidad Marista de Mérida for a PhD degree. The authors would like to acknowledge the useful feedback and comments on the manuscript provided by two anonymous reviewers.