### INTRODUCTION

⌅ Sedentary species aggregate over space to carry out biological and
social functions resulting in a highly complex heterogeneous
distribution (Stoner and Appeldoorn 2021Stoner A, Appeldoorn R. 2021. Synthesis of Research on the Reproductive Biology of Queen Conch *(Aliger gigas)*: Toward the Goals of Sustainable Fisheries and Species Conservation. Rev. Fish. Sci. Aquac. 1-45. https://doi.org/10.1080/23308249.2021.1968789 ). This spatial structure of populations becomes even
more complex when these species are subject to exploitation. The
estimation of precise indices of abundance critical for stock assessment
must therefore consider this spatial structure (Rivoirard et al. 2000Rivoirard
J., Simmonds J., Foote K., Fernandes P., Bez N. 2000. Geostatistics for
Estimating Fish Abundance. Blackwell Science Ltd., Oxford, 216 pp. https://doi.org/10.1002/9780470757123 ). Previous authors have addressed the spatial
structure of sedentary species using methods based on assumptions
suitable for determining data sets (Surette et al. 2007Surette T., Marcotte D., Wade E. 2007. Predicting snow crab *(Chionoecetes opilio)* abundance using kriging with external drift with depth as a covariate. Can Tech Rep Fish Aquat Sci. 2763: 1488-5379, Drexler and Ainsworth 2013Drexler
M., Ainsworth C. 2013. Generalized Additive Models Used to Predict
Species Abundance in the Gulf of Mexico: An Ecosystem Modeling Tool.
PLoS ONE, 8: 5. https://doi.org/10.1371/journal.pone.0064458 , Lyashevska et al. 2016Lyashevska
O., Brus D., van der Meer L. 2016. Mapping species abundance by a
spatial zero-inflated Poisson model: a case study in the Wadden Sea, the
Netherlands. Ecol. Evol. 6: 532-543. https://doi.org/10.1002/ece3.1880 ). Three of the most commonly applied spatial methods are geostatistical models (Matheron 1963Matheron G. 1963. Principles of geostatistics. Econ. Geol. 58: 1246-1266. https://doi.org/10.2113/gsecongeo.58.8.1246 ), regression models (Lambert 1992Lambert D. 1992. Zero-inflated Poisson regression with an application to defects in manufacturing. Technometrics 34: 1-14. https://doi.org/10.2307/1269547 , Hall 2000Hall D. 2000. Zero-inflated Poisson binomial regression with random effects: a case study. Biometrics, 56: 1030-1039. https://doi.org/10.1111/j.0006-341X.2000.01030.x ) and hybrid models (Hengl et al. 2007Hengl
T., Heuvelink G., Rossiter D. 2007. About regression-kriging: From
equations to case studies. Comput. Geosci. 33: 1301-1315. https://doi.org/10.1016/j.cageo.2007.05.001 ). The multi-model spatial analysis approach is often
applied and has proven robust for modelling spatial distribution and
improving the estimation of indices of abundance (Li and Heap 2011Li
J., Heap A. 2011. A review of comparative studies of spatial
interpolation methods in environmental sciences: Performance and impact
factors. Ecol. Inform. 6: 228-241. https://doi.org/10.1016/j.ecoinf.2010.12.003 ). This is because specific species distribution
characteristics can be modelled and evaluated using various criteria to
determine the most reliable one for a particular dataset (Rufino et al. 2021Rufino
M., Albouy C., Brind’Amour A. 2021. Which spatial interpolators I
should use? A case study applying to marine species. Ecol. Model. 449:
109501. https://doi.org/10.1016/j.ecolmodel.2021.109501 ).

Geostatistical models are a group of spatial
analysis and interpolation methods based on the theory of the
regionalized variable (Matheron 1963Matheron G. 1963. Principles of geostatistics. Econ. Geol. 58: 1246-1266. https://doi.org/10.2113/gsecongeo.58.8.1246 ), among which the most common is kriging (Krige 1951Krige
D. 1951. A statistical approach to some basic mine valuation problems
on the Witwatersrand. Journal of the Chemical, J. South. Afr. Inst. Min.
Metall. 52: 119-139.). These have been applied to
estimate the abundance and biomass of various sedentary fisheries
species, including the snow crab *Chionoecetes opilio* (Surette et al. 2007Surette T., Marcotte D., Wade E. 2007. Predicting snow crab *(Chionoecetes opilio)* abundance using kriging with external drift with depth as a covariate. Can Tech Rep Fish Aquat Sci. 2763: 1488-5379), the sea cucumber *Isostichopus badionotus* (Hernández-Flores et al. 2015) and the red octopus *Octopus maya* (Avendaño et al. 2019). Generalized additive models (GAM) are
regression methods that can model non-linear response-explanatory
relationships without a priori assumptions about the probability
distribution of the data (Hastie and Tibshirani 1990Hastie T., Tibshirani R. 1990. Generalized Additive Models. Chapman and Hall, Washington D.C., 352 pp.). These methods have also been applied in fisheries spatial analysis assuming Poisson probability distribution (Lyashevska et al. 2016Lyashevska
O., Brus D., van der Meer L. 2016. Mapping species abundance by a
spatial zero-inflated Poisson model: a case study in the Wadden Sea, the
Netherlands. Ecol. Evol. 6: 532-543. https://doi.org/10.1002/ece3.1880 ), but for sedentary species the negative binomial probability may be more appropriate (Anderson and Seijo 2010Anderson L., Seijo J. 2010. Bioeconomics of Fisheries Management. Ames: Wiley-Blackwell. Oxford, UK. 305 pp.).
Zero-inflated models are another set of regression techniques with the
advantage of directly modelling the over-dispersion and excessive zeros
or zero inflation common to environmental data (Potts and Elith 2006Potts J., Elith J. 2006. Comparing species abundance models. Ecol. Model. 199: 153-163. https://doi.org/10.1016/j.ecolmodel.2006.05.025 ). These techniques have been used to estimate abundance indices in fisheries stock assessment (Potts and Rose 2018Potts
S., Rose K. 2018. Evaluation of GLM and GAM for estimating population
indices from fishery independent surveys. Fish. Res. 208: 167-178. https://doi.org/10.1016/j.fishres.2018.07.016 ). Each method can maintain the original spatial
structure of sample datasets without the need to transform or remove
outliers (Hastie and Tibshirani 1990, Martin et al. 2005Martin
T., Wintle B., Rhodes J., et al. 2005. Zero tolerance ecology:
improving ecological inference by modelling the source of zero
observations. Ecol. Lett. 8: 1235-1246. https://doi.org/10.1111/j.1461-0248.2005.00826.x , Webster and Oliver 2007Webster R., Oliver M. 2007. Geostatistics for Environmental Scientists, 2nd Edn. John Wiley and Sons, Ltd. Chichester, 336 pp. https://doi.org/10.1002/9780470517277 ). Transformation and removal of outliers of
environmental data may risk the loss of valuable ecological information
about the species (Arab et al. 2008Arab
A., Wildhaber M., Wikle C., Gent C. 2008. Zero-inflated modeling of
fish catch per unit area resulting from multiple gears: Application to
channel catfish and shovelnose sturgeon in the Missouri River. North.
Am. J. Fish. Manage. 28: 1044-1058. https://doi.org/10.1577/M06-250.1 ), leading to model miscalibrations and incorrect interpretations (Martin et al. 2005Martin
T., Wintle B., Rhodes J., et al. 2005. Zero tolerance ecology:
improving ecological inference by modelling the source of zero
observations. Ecol. Lett. 8: 1235-1246. https://doi.org/10.1111/j.1461-0248.2005.00826.x ).

In this study, we conducted a comprehensive
comparative spatial analysis using seven models: ordinary kriging (OK),
kriging with external drift (KED), negative binomial GAM (NBGAM), a
general additive mixed model (GAMM), a zero-inflated negative binomial
model (ZINB) and two hybrid regression kriging models, NBGAM+OK and
GAMM+OK. These particular models were chosen because they can deal with
highly skewed and zero-inflated data. They were applied to three surveys
of the gastropod queen conch, *Aliger gigas* (Linnaeus 1758), conducted in 2007, 2015 and 2018 on the Pedro Bank off the southern coast of Jamaica.

The queen conch is a highly exploited species on the Pedro Bank and
throughout its wider Caribbean range, particularly over the last forty
years (Prada et al. 2017Prada
M., Appeldoorn R., Van Eijs S., Pérez M. 2017. Conch Fisheries
Management and Conservation Plan. FAO Fisheries and Aquaculture
Tehcniacal Paper T610, Rome, 72 pp.). However, most queen
conch assessments apply design-based methods (Cochran 1977) which do not
consider the heterogeneous spatial structure of the species to
determine abundance indices (Ehrhardt and Valle-Esquivel 2008Ehrhardt N., Valle-Esquivel M. 2008. Conch *(Strombus gigas)* stock assessment manual. San Juan (PR): Caribbean Fisheries Management Council. 128 pp., Baker et al. 2016Baker
N., Appeldoorn R., Torres-Saavedra P. 2016. Fishery-independent surveys
of the queen conch stock in Western Puerto Rico, with an assessment of
historical trends and management effectiveness. Mar. Coast. Fish. 8:
567-579. https://doi.org/10.1080/19425120.2016.1223232 ). The fundamental issue with the design methods is
that its assumptions of spatial homogeneity, perfect mixing of ages and
sizes, and instantaneous redistribution after changes in abundance are
invalid for sedentary species (Anderson and Seijo 2010Anderson L., Seijo J. 2010. Bioeconomics of Fisheries Management. Ames: Wiley-Blackwell. Oxford, UK. 305 pp.). The failure of the design methods to account for spatial heterogeneity increases the risk of bias and estimation errors.

We aimed firstly to assess the performance of alternative spatial models in estimating biomass given the structure of the data. Secondly, we determined the implications of not accounting for the spatial population structure by comparing the spatial model estimations with two design methods, one with a stratified and the other with a non-stratified sampling design. The performance of the spatial interpolation models was evaluated using a ten-fold cross-validation residual analysis and visual examination of the resultant species distribution maps. The results of the analyses are discussed in terms of the main data issues presented by each survey data set, including sample design, zero-inflation and over-dispersion of variances, and how they affect the reliability of the estimations. The implications for spatial management of the species in light of its ecology are also discussed. All modelling procedures and the output thereof, including maps, were carried out and produced using packages available in the R program language (R Core Team 2021R Core Team. 2021. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/ ).

### METHOD

⌅#### Study site

⌅ The Pedro Bank is a shallow offshore bank located in the west-central
Caribbean approximately 80 km south of the main island of Jamaica (Fig. 1).
The bank has a total area of 8040 km² with maximum dimensions of 180 km
west to east and 79 km north to south down to the 30 m depth contour (Morris 2016Morris R. 2016. Distribution of Queen conch *(Strombus gigas)* on the Pedro Bank, Jamaica: descriptive and predictive distribution models. MS thesis, University of Iceland, 67 pp.).
The average depth is 24 m and the bottom consists predominantly of a
coralline sand substrate which makes up approximately two-thirds of the
bank. Infused throughout this primary substrate are patch coral reefs,
coral pavement (weathered coral reefs), various species of macroalgae
and gorgonians (Baldwin 2015Baldwin
K. 2015. Marine spatial planning for the Pedro Bank, Jamaica. Final
Report. For the Nature Conservancy and NEPA, Government of Jamaica.). This habitat complex accommodates a marine ecosystem that supports important fisheries for various finfish, spiny lobster (*Panulirus argus*) and queen conch (Baldwin 2015Baldwin
K. 2015. Marine spatial planning for the Pedro Bank, Jamaica. Final
Report. For the Nature Conservancy and NEPA, Government of Jamaica.). The central and southern areas of the bank are known to contain the highest densities of both juvenile and adult conch (Morris 2016Morris R. 2016. Distribution of Queen conch *(Strombus gigas)* on the Pedro Bank, Jamaica: descriptive and predictive distribution models. MS thesis, University of Iceland, 67 pp.).

#### Data

⌅ Distribution data were compiled from underwater visual surveys
conducted by the National Fisheries Authority of Jamaica (NFA) for the
years 2007, 2015 and 2018. The NFA has conducted such surveys at three
to five-year intervals since 1994, but only after 2007 did sampling
target the entire extent of the bank. The years 2007, 2015 and 2018 were
chosen to contrast sample design, sample sizes and spatial population
structures such as the size and location of high-density patches. The
2007 survey consisted of 60 sites clustered in the centre of the study
area, while the 2015 and 2018 surveys included 80 and 81 sites,
respectively, systematically distributed down to the 30-metre contour (Fig. 2).
Sample sites were defined as one to four belt transects with an area of
220 to 2250 m² spaced one metre from the end of one transect to the
start of the following one. The number and area of transects were
dependent on ocean conditions and diver safety considerations at the
time of sampling. Data recorded within each transect included geographic
location, depth, transect area, habitat type and numbers of adult and
juvenile conch. Juveniles were defined as individuals of shell length
lower than 20 cm and without a folded shell aperture or lip, while
adults were defined as having a shell length of 20 cm or greater and
having a thickened or folded lip (Appeldoorn 1988Appeldoorn R. 1988. Age determination, growth, mortality and age of the first reproduction in adult queen conch, *Strombus gigas* L., off Puerto Rico. Fish. Res. 6: 363-378. https://doi.org/10.1016/0165-7836(88)90005-7 ). In recognition of the different data assumptions of
the spatial models, calculated densities were used for the
geostatistical models OK and KED, whereas count data were used for the
other models.

#### Environmental layers

⌅ We collated broad-scale spatial environmental layers for geographic
location (latitude and longitude), habitat type and depth (bathymetry)
of the bank to be used as covariates in the interpolation models.
Geographic location was selected as a proxy for interactions between
queen conch location and ecological processes (Gutierrez et al. 2008Gutierrez N., Martinez, A. Defeo, O. 2008. Identifying environmental constraints at the edge of a species’ range: Scallop *Psychrochlamys patagonica* in the SW Atlantic Ocean. Mar. Ecol. Prog. Ser. 353: 147-156. https://doi.org/10.3354/meps07184 ). Juveniles and adults aggregate in specific habitats
for feeding, reproduction and predator avoidance, thus displaying
heterogeneous distribution with respect to habitat (Stoner and Ray-Culp 2000Stoner
A, Ray-Culp M. 2000. Evidence for Allee effects in an over-harvested
marine gastropod: density-dependent mating and egg production. Mar.
Ecol. Prog. Ser. 202: 297-302. https://doi.org/10.3354/meps202297 ). The habitat layer used was a generalized substrate
categorization created at 250 m² resolution and adjusted to 100 m² from
remote sensing and field measurements as part of the Pedro Bank Marine
Spatial Planning Project (Baldwin 2015Baldwin
K. 2015. Marine spatial planning for the Pedro Bank, Jamaica. Final
Report. For the Nature Conservancy and NEPA, Government of Jamaica.). Because the queen conch also displays stratification with depth (Morris 2016Morris R. 2016. Distribution of Queen conch *(Strombus gigas)* on the Pedro Bank, Jamaica: descriptive and predictive distribution models. MS thesis, University of Iceland, 67 pp.),
we extracted depth readings down to the 30 m contour, coinciding with
the maximum depth of the surveys, from the General Bathymetric Chart of
the Oceans Compilation Group grid bathymetric database (GEBCO 2020Gridded Global Bathymetry Data (GEBCO). 2020. British Oceanographic Data Centre, Liverpool, United Kingdom. https://doi.org/10.5285/a29c5465-b138-234d-e053-6c86abc040b9) ).

#### Interpolation surface

⌅ We created a 100 m² gridded interpolation surface down to the 30 m
depth contour consisting of 606000 pixels for a total area of 6060 km².
Because the sampling area for 2007 was smaller, the grid was adjusted to
204000 pixels or approximately 2040 km² to avoid extrapolation for that
dataset. Each environmental layer was converted to the same 100 m²
resolution so that the interpolation outputs were comparable. For the
interpolation process, each pixel was considered a discrete unit, so the
interpolated value in each pixel was multiplied by the ratio of the
total area of one pixel (1 km²) and the mean site area for each survey
year to compensate for the difference between the unit used for the
interpolation surface (pixel) and the area of the sampled sites (1 m²).
Total biomass within a pixel was calculated by transforming the
estimated abundances using queen conch weight conversion factors
corresponding to the 50% clean processing level (shell, viscera and
opercula removed) for adults (Aspra et al. 2009Aspra
B., Barnutty R., Mateo J., et al. 2009. Conversion factors for
processed queen conch to nominal weight. FAO Fisheries and Aquaculture
Circular No. 1042, Rome, 97 pp.) and 36.69 g per individual for juveniles (Appeldoorn and Rodriguez 1994Appeldoorn R., Rodriguez B. 1994. Queen conch, *Strombus gigas*, biology, fisheries and mariculture. Latinamerican Malacological Congress. Fundacion Cientifica Los Roques, Caracas, 356 pp.).

#### Design method biomass estimation

⌅We calculated two design-based (Cochran 1977) biomass estimations for comparison with the spatial interpolation methods: the first was a non-stratified method and the second one was stratified. The non-stratified method consisted of an estimate based on the product of the mean abundance and the total area and the weight conversion factors for adults and juveniles. Design method estimations can be significantly improved by stratification (Chang et al. 2017Chang J., Shank B., Hart D. 2017. A comparison of methods to estimate abundance and biomass from belt transect surveys: Population estimation from belt transect surveys. Limnol. Oceanogr. 15: 480-494. https://doi.org/10.1002/lom3.10174 ), so we introduced stratification based on depth (0-10, 11-20 and 21-30 m). The second design biomass was then calculated similar to the first method but for each stratum and then summed to obtain total biomass.

#### Geostatistical methods

⌅Ordinary kriging and KED were applied to the datasets using functions and procedures in the R gstat package (Pebesma 2004Pebesma E. 2004. Multivariable geostatistics in S: the gstat package. Comput. Geosci. 30: 683-691. https://doi.org/10.1016/j.cageo.2004.03.012 ). Geostatistical models use the variogram to estimate the spatial autocorrelation and covariance structures of the sample data, which are then used to determine kriging weights (Webster and Oliver 2007Webster R., Oliver M. 2007. Geostatistics for Environmental Scientists, 2nd Edn. John Wiley and Sons, Ltd. Chichester, 336 pp. https://doi.org/10.1002/9780470517277 ).

where $\gamma \left(h\right)$ is the semivariance at the distance interval $h$ , $Z\left({x}_{i}\right)$ is the number of conches at sample site ${x}_{i}$ and $N\left(h\right)$ is the number of observation pairs separated by distance $h$ .

Ordinary kriging (OK) assumes a stationary local mean around each sample and uses an unbiased linear combination of the surrounding weighted values produced by the variogram to estimate values at unsampled sites (Webster and Oliver 2007Webster R., Oliver M. 2007. Geostatistics for Environmental Scientists, 2nd Edn. John Wiley and Sons, Ltd. Chichester, 336 pp. https://doi.org/10.1002/9780470517277 ).

where
${Z}^{OK}$
is the estimated density of conches at the unsampled point of interest,
${x}_{0}$
,
$Z$
is the observed value at site,
${x}_{0}$
,
${\lambda}_{iOK}$
is the weight and is the value of samples within the search
neighbourhood used for the estimation. We defined the local
neighbourhood by designating a maximum distance over which semivariance
values are calculated and the minimum and maximum number of observation
points to be used in the kriging prediction. Defining a local
neighbourhood in this way reduces bias from outliers and extreme values
of highly skewed data (Surette et al. 2007Surette T., Marcotte D., Wade E. 2007. Predicting snow crab *(Chionoecetes opilio)* abundance using kriging with external drift with depth as a covariate. Can Tech Rep Fish Aquat Sci. 2763: 1488-5379).

KED assumes that the local trend or drift is a linear function of a secondary variable (Rivoirard et al. 2000Rivoirard J., Simmonds J., Foote K., Fernandes P., Bez N. 2000. Geostatistics for Estimating Fish Abundance. Blackwell Science Ltd., Oxford, 216 pp. https://doi.org/10.1002/9780470757123 ), which in this case was the depth covariate.

where $f\left(x\right)$ is the drift function of the depth variable and $a$ and $b$ are linear coefficients. The KED estimation ${Z}^{KED}\left({x}_{0}\right)$ is achieved by imposing two conditions: one where ${\sum}_{i=1}^{n}{\lambda}_{iKED}=1$ as with OK, and one where ${\sum}_{i=1}^{n}{\lambda}_{iKED}f\left({x}_{i}\right)=f\left({x}_{0}\right)$ is the secondary variable at each sample site ${x}_{i}$ and $f\left({x}_{0}\right)$ is the globally known secondary variable at points ${x}_{0}$ .

#### Generalized additive model and generalized additive mixed model

⌅A spatial version of GAM (Hastie and Tibshirani 1990Hastie T., Tibshirani R. 1990. Generalized Additive Models. Chapman and Hall, Washington D.C., 352 pp.) assuming a negative binomial probability distribution (NBGAM) was applied using the R mgcv package (Wood 2011Wood S. 2011. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Series. B. Stat. Methodol. 73: 3-36. https://doi.org/10.1111/j.1467-9868.2010.00749.x ), while the mixed model GAM or GAMM was applied using the R nlme package (Pinheiro et al. 2021Pinheiro J., Bates D., DebRoy S., et al. 2021. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-153, 3.1-153.) with procedures from Zuur et al. (2009Zuur A., Ieno E., Walker N., Saveliev A., Smith, G. 2009. Mixed Effects Models and Extensions in Ecology With R. Springer Science+Business Media LLC., New York, 574 pp. https://doi.org/10.1007/978-0-387-87458-6 ).

where ${Y}_{i}$ is the number of conches at site $i$ , is the intercept parameter, $s\left({x}_{i1\dots p}\right)$ are smoothing functions of the covariates and ${\epsilon}_{i}$ is the error term for the unexplained information. The starting model for each dataset was specified to include all available covariates (geographic location, depth and habitat). The best set of covariates was selected through a backward selection process using the Akaike information criterion (AIC) .

The GAMMs included optimal variance and correlation structures to model the random error term , heterogeneity of the variance and residual correlation in the data. Optimal variance and correlation structures were selected for the GAMM by minimizing the AIC among competing structures (Zuur et al. 2009Zuur A., Ieno E., Walker N., Saveliev A., Smith, G. 2009. Mixed Effects Models and Extensions in Ecology With R. Springer Science+Business Media LLC., New York, 574 pp. https://doi.org/10.1007/978-0-387-87458-6 ).

#### Regression kriging

⌅Two hybrid regression kriging (RK) models (Hengl et al. 2007Hengl T., Heuvelink G., Rossiter D. 2007. About regression-kriging: From equations to case studies. Comput. Geosci. 33: 1301-1315. https://doi.org/10.1016/j.cageo.2007.05.001 ), GAM+OK and GAMM+OK, were used. The RK prediction ${Z}^{RK}$ at the target location ${x}_{0}$ was reached by summing the trend estimated by the regression procedure $m\left({x}_{0}\right)$ and the kriged residuals using OK

#### Negative binomial zero-inflated model

⌅The spatial negative binominal zero-inflated model (ZINB) directly models the two sources of excessive zeros which can cause bias in spatially sampled data: sampling error and ecological processes (Martin et al. 2005Martin T., Wintle B., Rhodes J., et al. 2005. Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecol. Lett. 8: 1235-1246. https://doi.org/10.1111/j.1461-0248.2005.00826.x ). The procedure included modelling the non-zero part of the data assumed to be negative binomially distributed and the probability processes that generate zeros. The selection of covariates for both processes was done using backward selection among the set of explanatory variables (geographic location, depth and habitat) by minimizing the AIC. The analysis was implemented using the zeroinfl function of the R pscl package (Zeileis et al. 2008Zeileis A., Kleiber C., Jackman S. 2008. Regression Models for Count Data in R. J. Stat. Softw. 27: 1-25. https://doi.org/10.18637/jss.v027.i08 ).

#### Model evaluation

⌅Statistical evaluation and comparison of spatial model performance were done using ten-fold cross-validation. Two diagnostic statistics were calculated, the mean absolute error (MAE), a measure of prediction bias, and the root mean squared error (RMSE), which reflects prediction precision or prediction error (Willmott 1982Willmott C. 1982. Some comments on the evaluation of model performance. Bull. Am. Meteorol. Soc. 63: 1309-1313. https://doi.org/10.1175/1520-0477(1982)063<1309:SCOTEO>2.0.CO;2 ). A smaller MAE or RMSE value represents a better performing model.

where 𝑛 is the number of samples, ${p}_{i}$ is the predicted value at a location and ${o}_{i}$ is the observed value.

### RESULTS

⌅Structural data trends from the surveys included a high frequency of sites where zeros were recorded relative to sites with positive counts (Fig. 3). The proportion of zeros ranged from 14% to 36% and 24% to 48% for adult and juvenile datasets, respectively (Table 1). The data are also characterized by relatively high coefficients of variation (CoV) and degrees of right-skewness. These statistics are indicative of the patchy distribution, over-dispersion of variances, and zero inflation expected of typical ecological data (Martin et al. 2005Martin T., Wintle B., Rhodes J., et al. 2005. Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecol. Lett. 8: 1235-1246. https://doi.org/10.1111/j.1461-0248.2005.00826.x ). The 2015 sample had the least amount of skewness and CoV, while the 2007 data had the highest values for both.

2007 | 2015 | 2018 | ||||
---|---|---|---|---|---|---|

Adult | Juvenile | Adult | Juvenile | Adult | Juvenile | |

Sample size | 60 | 60 | 80 | 80 | 81 | 81 |

Minimum | 0 | 0 | 0 | 0 | 0 | 0 |

Maximum | 350 | 95 | 165 | 85 | 250 | 181 |

Mean | 17 | 7 | 34 | 11 | 12 | 12 |

Median | 1 | 1 | 17 | 3 | 2 | 2 |

Variance | 2458 | 315 | 1387 | 290 | 958 | 795 |

Standard deviation | 49.6 | 17.8 | 37.3 | 17.0 | 31.0 | 28.2 |

Coefficient of variation | 3.0 | 2.6 | 1.1 | 1.6 | 2.7 | 2.4 |

Kurtosis | 32.1 | 11.7 | 0.9 | 5.9 | 41.0 | 19.9 |

Skewness | 5.4 | 3.4 | 1.2 | 2.4 | 5.9 | 4.2 |

Percentage of zeros | 33 | 48 | 14 | 24 | 36 | 36 |

#### Comparison of spatial interpolation models

⌅The performance of the spatial models reflected their underlying interpolation assumptions and the structural characteristics of the survey datasets. There was no clear correspondence between bias measured by the MAE and the prediction precision based on the RMSE. In other words, a model returning the lowest MAE did not necessarily return the lowest RMSE (Table 2). Only for the 2015 adult conch estimations did the model with the lowest MAE and RMSE relate to the same model. The best-performing models in 2007 and 2018 were generally those that included GAM or GAMM in full or in part through regression kriging (NBGAM, NBGAM+OK, GAMM and GAMM+OK). For 2015 the best models also included OK and ZINB. Specifically for the juvenile data, the best models were ZINB (MAE=0.13) and GAMM+OK (RMSE=14.46) in 2007, GAMM (MAE=0.06) and ZINB (RMSE=15.17) in 2015, and ZINB (MAE=0.27) and GAMM (RMSE=23.50) in 2018. For the adult data, the best models were NBGAM (MAE=0.57) and GAMM (RMSE=20.00) in 2007, OK (MAE=0.23, RMSE=33.60) in 2015, and NBGAM (MAE=0.15) and GAMM (RMSE=22.59) in 2018.

^{a}and corresponding ten-fold cross-validation diagnostics

^{b}. Bold: estimated biomass selected for adults and juveniles.

Juvenile | Adult | ||||||
---|---|---|---|---|---|---|---|

Year | Method | Biomass (t) | MAE | RMSE | Biomass (t) | MAE | RMSE |

2007 | |||||||

OK | 306 | 0.34 | 17.70 | 2171 | 0.57 | 56.68 | |

KED | 347 | 0.31 | 18.32 | 2484 | 2.78 | 61.38 | |

NBGAM | 63 | 0.45 | 15.89 | 1173 | 0.22† | 20.52 | |

NBGAM+OK | 47 | 0.92 | 15.70 | 1104 | 0.72 | 20.40 | |

GAMM | 242 | 0.25 | 14.87 | 1783 | 0.57 | 20.00* | |

GAMM+OK | 320 | 1.27 | 14.46* | 3535 | 88.08 | 208.72 | |

ZINB | 143 | 0.13† | 14.47 | 2645 | 5.02 | 48.77 | |

2015 | |||||||

OK | 2060 | 0.58 | 16.82 | 16775 | 0.04^{†} |
33.60* | |

KED | 1856 | 0.09 | 16.64 | 14928 | 0.20 | 35.83 | |

NBGAM | 1801 | 0.61 | 17.69 | 18184 | 0.81 | 38.38 | |

NBGAM+OK | 1783 | 0.16 | 17.64 | 17135 | 0.44 | 38.37 | |

GAMM | 2536 | 0.06† | 16.94 | 23515 | 0.06 | 38.11 | |

GAMM+OK | 2173 | 1.06 | 16.04 | 18751 | 0.19 | 38.12 | |

ZINB | 1447 | 0.29 | 15.17* | 19811 | 0.25 | 36.40 | |

2018 | |||||||

OK | 2194 | 0.61 | 27.00 | 4258 | 0.47 | 33.38 | |

KED | 1857 | 0.52 | 29.53 | 4158 | 0.37 | 34.24 | |

NBGAM | 1416 | 1.47 | 23.98 | 5606 | 0.15† | 27.52 | |

NBGAM+OK | 1277 | 1.90 | 23.96 | 5372 | 0.31 | 27.47 | |

GAMM | 1848 | 1.56 | 23.50* | 5584 | 1.82 | 22.59* | |

GAMM+OK | 1693 | 4.54 | 31.76 | 7615 | 2.36 | 23.87 | |

ZINB | 1883 | 0.27† | 25.53 | 7394 | 0.17 | 23.00 |

^{a} Interpolation model: OK = ordinary kriging; KED = kriging with external
drift; NBGAM = negative binomial generalized additive model; NBGAM+OK =
NBGAM plus OK; GAMM = general additive mixed model; GAMM+OK = GAMM plus
OK; ZINB = zero-inflated negative binomial model.^{b} Model evaluation and comparative statistics: MAE = mean absolute error; RMSE = root mean square error.^{†} #Least biased model.^{*} Most precise model.#

Adult biomass estimated from spatial models varied approximately in a range equal to the minimum estimate for each year, with a CoV of 0.40 in 2007, 0.15 in 2015 and 0.24 in 2018. For the juvenile biomass, the CoV was 0.60 in 2007, and 0.18 in 2015 and 2018. This indicates that there were no great differences between the results of the models in terms of biomass estimation.

#### Comparative analysis of spatial interpolation and design methods

⌅The design-based estimations tended to be close to the biomass estimated by spatial models (Tables 2 and 3), but a major limitation is that the simplicity of this biomass estimation is not enough to develop ecosystem-based management (Garcia et al. 2003Garcia S., Zerbi A., Aliaume C., et al. 2003. The ecosystem approach to fisheries. Issues, terminology, principles, institutional foundations, implementation and outlook. FAO Fisheries Technical Paper No. 443, Rome, 71 pp.). For adults, the percentage differences between the design methods and spatial interpolation biomass estimations ranged from -5% to -14% and +7% to +43% for the stratified and non-stratified methods, respectively. For juveniles, the differences ranged from +33% to -43% and +6% to +51%, respectively.

Biomass estimation method and difference (%) | |||||
---|---|---|---|---|---|

Survey | Spatial* (t) | Stratified (t) | % diff. | Non-stratified (t) | % diff. |

Adults | |||||

2007 | 1783 | 1691 | -5 | 2555 | +43 |

2015 | 16775 | 15934 | -5 | 21144 | +26 |

2018 | 5584 | 4784 | -14 | 5981 | +7 |

Juveniles | |||||

2007 | 320 | 183 | -43 | 339 | +6 |

2015 | 1447 | 1919 | +33 | 2182 | +51 |

2018 | 1848 | 1911 | +3 | 1980 | +7 |

*model returning the lowest RMSE.

#### Species distribution analysis

⌅ The spatial distribution maps of the model predictions (Figs 4 and 5)
highlighted the concentration of adult and juvenile conch biomass along
the southern edges of the bank, as has been reported by Morris (2016)Morris R. 2016. Distribution of Queen conch *(Strombus gigas)* on the Pedro Bank, Jamaica: descriptive and predictive distribution models. MS thesis, University of Iceland, 67 pp..
This area is characteristically shallower and contains a more complex
habitat structure, including coral reefs, seagrass, gorgonians and
macroalgae, as opposed to predominantly sand and algae in the other
areas. However, each model differed in its degree of local versus global
smoothing and identification of the local and broader-scale spatial
structures. For instance, the OK, KED and RK models better identified
local high-density patches while the NBGAM, GAMM, and ZINB models
identified larger-scale distributional trends largely reflecting the
environmental covariates. High-density patches of juveniles were
generally fewer and smaller in terms of abundance in comparison with
adults. They were also farther away from the southern edges of the bank
than the adults, suggesting a slight distributional difference. The
spatial distribution maps therefore validated the performance of the
interpolation model by identifying conch aggregations related to the
environmental covariates.

### DISCUSSION

⌅Given the socio-economic importance of sedentary fisheries and their high vulnerability to overexploitation because of their slow-moving habit, abundance indices must be well estimated and understood in order to conserve and manage stocks. The set of spatial analysis techniques now available has allowed for more reliable estimations of abundance indices and spatial population structure while reducing bias and increasing estimation accuracy to reduce the risk of errors, particularly overestimations. Errors in estimations can be substantial, as our results have shown, if the spatial component of the species distribution is not suitably accounted for. Failing this, there is an increased risk of fishery collapse or a decrease in the efficiency of the management and utilization of the resource as a result of poor decision-making (Anderson and Seijo 2010Anderson L., Seijo J. 2010. Bioeconomics of Fisheries Management. Ames: Wiley-Blackwell. Oxford, UK. 305 pp.). We have demonstrated the utility of the multi-model spatial analysis approach to estimate a common index of abundance (biomass) from highly skewed zero-inflated datasets by contrasting selected spatial interpolation techniques and then contrasting these with design methods which do not explicitly consider spatial structuring in the population.

#### Spatial model performance

⌅ Modelling the spatial component of exploited sedentary stocks may be
critical to precise biomass estimations, maintaining reproductively
viable population structures and avoiding the pitfalls of recruitment
overfishing. In this study, the models that assumed a priori probability
distribution or used covariates to model the spatial structure (NBGAM,
NBGAM+OK, GAMM, GAMM+OK and ZINB) generally performed better than those
that did not, such as OK. This contrast clearly shows the importance of
closely considering the spatial component of the abundance index being
estimated related to the species biology and exploitation. Queen conch,
like many sedentary species, aggregate in specific habitats to carry out
biological activities and are subject to high levels of exploitation
which constantly alter their population structure (Stoner and Appeldoorn 2021Stoner A, Appeldoorn R. 2021. Synthesis of Research on the Reproductive Biology of Queen Conch *(Aliger gigas)*: Toward the Goals of Sustainable Fisheries and Species Conservation. Rev. Fish. Sci. Aquac. 1-45. https://doi.org/10.1080/23308249.2021.1968789 ). While the weight of evidence has long favoured
model-based approaches over design-based ones, the set of models is
often generic in their calibration or is otherwise limited by the
availability of covariates (Li and Heap 2011Li
J., Heap A. 2011. A review of comparative studies of spatial
interpolation methods in environmental sciences: Performance and impact
factors. Ecol. Inform. 6: 228-241. https://doi.org/10.1016/j.ecoinf.2010.12.003 , Chang et al. 2017Chang
J., Shank B., Hart D. 2017. A comparison of methods to estimate
abundance and biomass from belt transect surveys: Population estimation
from belt transect surveys. Limnol. Oceanogr. 15: 480-494. https://doi.org/10.1002/lom3.10174 ). In our case, careful knowledge of the species
biology, level of exploitation and the study area were required to
select and specify the models that were used.

From our results, the design methods returned biomass estimations that were sometimes very close to the best interpolation models (Table 3). Despite this, the presence of unit stock assumptions and failure to account for the population spatial structure renders design methods inadequate for sedentary species (Anderson and Seijo 2010Anderson L., Seijo J. 2010. Bioeconomics of Fisheries Management. Ames: Wiley-Blackwell. Oxford, UK. 305 pp.), so they should not be applied to queen conch abundance index estimation. In other cases, the design methods returned estimations with large differences from the best interpolation models. From an ecosystem management standpoint, the risk of estimation errors is particularly problematic for density-dependent sedentary species such as the queen conch, in which allee effects can be enhanced by fishing leading to negative population growth and stock collapse (Stoner and Ray-Culp 2000Stoner A, Ray-Culp M. 2000. Evidence for Allee effects in an over-harvested marine gastropod: density-dependent mating and egg production. Mar. Ecol. Prog. Ser. 202: 297-302. https://doi.org/10.3354/meps202297 ). Chang et al. (2017Chang J., Shank B., Hart D. 2017. A comparison of methods to estimate abundance and biomass from belt transect surveys: Population estimation from belt transect surveys. Limnol. Oceanogr. 15: 480-494. https://doi.org/10.1002/lom3.10174 ) suggested that predictions from design methods may be comparable to those from statistical models if the study area is accurately stratified. Accurate stratification can, however, be difficult or subjective and may contain large bias due to the unmodelled spatial autocorrelation from the spatial structure (Segurado et al. 2006Segurado P., Araujo M., Kunin W. 2006. Consequences of spatial autocorrelation for niche-based models. J. Appl. Ecol. 43: 433-444. https://doi.org/10.1111/j.1365-2664.2006.01162.x ). In our case, we stratified according to ocean depth, which generally improved on the non-stratified estimate in terms of the relative difference from the best interpolation model. Other stratification criteria could have also been considered, such as habitat type or age class, but these were either impractical to apply or the data were not available.

In this study, the best models were generally the GAMM and the ZINB based on the two evaluation statistics, MAE and RMSE. It was, however, clear that model performances were strongly influenced by the structure of each dataset, including its sampling design, sample size, coefficient of variation and the quality of the available covariates used. As a result, the best model was not consistent for either juveniles or adults across the three surveys. This is consistent with theory and studies that suggest that spatial interpolation model performance is often data- and case-specific (Isaaks and Srivastava 1989Isaaks E., Srivastava, R. 1989. An intorduction to applied geostatistics. Oxford University Press, New York, 592 pp., Rufino et al. 2021Rufino M., Albouy C., Brind’Amour A. 2021. Which spatial interpolators I should use? A case study applying to marine species. Ecol. Model. 449: 109501. https://doi.org/10.1016/j.ecolmodel.2021.109501 ) and is reflected in the level of bias and prediction accuracy (Li and Heap 2011Li J., Heap A. 2011. A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors. Ecol. Inform. 6: 228-241. https://doi.org/10.1016/j.ecoinf.2010.12.003 ). For the 2015 adults, for example, OK was the best model for both statistics but did not feature among the best for either of the two other years. Geostatistical methods, particularly OK, perform relatively poorly with clustered samples and high CoV (Webster and Oliver 2007Webster R., Oliver M. 2007. Geostatistics for Environmental Scientists, 2nd Edn. John Wiley and Sons, Ltd. Chichester, 336 pp. https://doi.org/10.1002/9780470517277 ). The clustered sampling of the 2007 data and the relatively high CoV for both the 2007 and 2015 datasets could explain the model performances in these years.

The inclusion of globally inputted environmental layers as covariates served to improve model performance but also likely added bias to the analysis. The inclusion of depth, habitat, and geographic location layers in the spatial analysis was justified based on the biology and ecology of the species, but these layers were also included because they were easy to obtain. The resolution of these layers, some of which were created for different purposes, did not necessarily reflect the scale at which ecological processes that affect queen conch distribution occur. As a result, undue bias may have been introduced, possibly affecting local autocorrelation and resulting in the false regression significance of some variables (Segurado et al. 2006Segurado P., Araujo M., Kunin W. 2006. Consequences of spatial autocorrelation for niche-based models. J. Appl. Ecol. 43: 433-444. https://doi.org/10.1111/j.1365-2664.2006.01162.x ). The effect of these biases is reflected in the output of the species distribution maps, on which some distributions appear to closely follow the trend of one or more of the covariates, so they poorly captured the local-scale spatial population structures. In addition, the significance of the covariates differed among the various models, as would be expected for models with different assumptions and specifications (Potts and Elith 2006Potts J., Elith J. 2006. Comparing species abundance models. Ecol. Model. 199: 153-163. https://doi.org/10.1016/j.ecolmodel.2006.05.025 ). Therefore, model performance, and by extension the reliability of the estimated abundance index, may have been affected.

In the future, estimations could be further improved by obtaining
geospatial layers of variables that reflect the resolution and scale of
the biological processes of the species. Incorrect model calibration,
unmodelled interactions and missing variables are also important sources
of bias in spatial modelling (Segurado et al. 2006Segurado
P., Araujo M., Kunin W. 2006. Consequences of spatial autocorrelation
for niche-based models. J. Appl. Ecol. 43: 433-444. https://doi.org/10.1111/j.1365-2664.2006.01162.x , Yu et al. 2013Yu
H., Jiao Y., Carstensen L. 2013. Performance comparison between spatial
interpolation and GLM/GAM in estimating relative abundance indices
through a simulation study. Fish. Res. 147: 186-195. https://doi.org/10.1016/j.fishres.2013.06.002 ). Given the level of density-dependence and high exploitation of the queen conch (Stoner and Ray-Culp 2000Stoner
A, Ray-Culp M. 2000. Evidence for Allee effects in an over-harvested
marine gastropod: density-dependent mating and egg production. Mar.
Ecol. Prog. Ser. 202: 297-302. https://doi.org/10.3354/meps202297 ), covariates explaining the impact of
density-dependent mechanisms (such as the allee effect) and fishing on
the species distribution may be important missing variables that could
have improved the models. The systematic targeting of high-density
patches of sedentary species is a key factor in modifying their spatial
distribution directly by removing individuals and indirectly by
affecting their habitat (Stoner et al. 2018Stoner A, Davis M., Kough A. 2018. Relationships between fishing pressure and stock structure in queen conch *(Lobatus gigas)* populations: synthesis of long-term surveys and evidence for overfishing in The Bahamas. Rev. Fish. Sci. Aquac. 26: 51-71. https://doi.org/10.1080/23308249.2018.1480008 ). The inclusion of these two covariates would be
prudent for future studies of this species and similar exploited
sedentary species using regression-type spatial models.

#### Fisheries management implications

⌅Despite the improved access and availability of spatial interpolation techniques, there are fisheries such as that of the queen conch in which design methods of abundance indices estimation are widely used. Queen conch is a CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora) Appendix II listed species, a conservation status which requires careful management if the fishery is to be sustained (Prada et al. 2017Prada M., Appeldoorn R., Van Eijs S., Pérez M. 2017. Conch Fisheries Management and Conservation Plan. FAO Fisheries and Aquaculture Tehcniacal Paper T610, Rome, 72 pp.). Improving abundance index estimations and understanding the spatial structuring of juvenile and adult aggregations is therefore an important part of this management process and can be used to develop integrated ecosystem management.

An ecosystem approach to fisheries (Garcia et al. 2003Garcia
S., Zerbi A., Aliaume C., et al. 2003. The ecosystem approach to
fisheries. Issues, terminology, principles, institutional foundations,
implementation and outlook. FAO Fisheries Technical Paper No. 443, Rome,
71 pp.) implies explicit inclusion of the spatial aspect
of species management, including small and larger-scale spatial trends
caused by biological, ecological and environmental factors. The position
and extent of the aggregations realized by the distribution map
isolines are indicative of habitat preferences of juveniles and adult
spawning stocks that are critical for management (Fig. 6).
The changes in size and structure of these aggregations over time are
indicative of the distributional response of conch biomass to
environmental changes and the level of exploitation. High-density adult
aggregations from our models can also serve as a proxy indicator of
fisher behaviour since fishers are likely to target these areas. The
spatial structures produced from our analysis are therefore a means for
prioritizing areas for spatial planning measures such as zonation and
targeted research, for example, to maintain viable spawning stock
biomass above a reference point. This type of focused spatial management
can help to understand critical habitat requirements and maintain
population structure and genetic connectivity between local and regional
populations (Kitson et al. 2018Kitson-Walters K., Candy A., Truelove N., Roye M., Webber M., Aiken K., Box, S. 2018. Fine-scale population structure of *Lobatus gigas* in Jamaica’s exclusive economic zone considering hydrodynamic influences. Fish. Res. 199: 53-62. https://doi.org/10.1016/j.fishres.2017.11.010 ).

### CONCLUSION

⌅The estimation of reliable indices of abundance for sedentary stocks such as queen conch requires that their underlying spatial population structure be accounted to include issues arising from the sampling design, excessive zeros and over-dispersion of variances. The multi-model comparative approach applied here to estimate queen conch biomass was shown to be robust but was dependent on the model specification to the specific case and dataset. The traditional use of design biomass estimation should be discontinued for queen conch, as there is the risk of stock collapse owing to poor management decisions based on estimation errors. Our results indicated differences in design method estimation as high as +51% of the biomass estimated by the best spatial interpolation model. Of the seven interpolation models, GAMM and ZINB were among the best-performing models for both juveniles and adults. Aside from the inherent assumptions of the interpolation models, model performance was largely influenced by the sampling design, the sample size, the coefficient of variation of the sample and the quality of the available set of covariates. Model performance could be further improved by including covariates that better represent the scale and resolution of the processes affecting the species distribution, including the effect of fishing. It is therefore clear that to ensure the reliability of abundance index estimation in an ecosystem management context, strong knowledge of the biological, environmental and exploitation-related factors must be understood and incorporated to address specific species and datasets.