Effects of misrepresentative length samples on individual growth and stock condition estimates


Under various sampling scenarios, we compute the accuracy of vB individual growth parameters and how they subsequently affect the weight-length relationship, W∞ (maximum theoretical weight), φ' and Kn. Our aim is to show the sensitivity of these parameters to the length sampling configuration and derive conclusions to inform sampling methods.

MATERIALS AND METHODS
We simulate growth data for the brown swimming crab, Callinectes bellicosus, and compare estimated parameter values with those reported in the literature. In the simulations, individual growth follows a vB model with multiplicative random errors εi. We use the general model $L_i = L(t_i)\,\varepsilon_i$, where Li is the observed length, L(ti) is the median length-at-age ti and εi is the random term. The known baseline parameter values are L∞ = 190.44 mm carapace width (CW), the growth coefficient k, and t0 = -0.14 y (Villa-Diharce et al. 2021).
The vB model with a multiplicative error is $L_i = L_\infty\,(1 - e^{-k(t_i - t_0)})\,\varepsilon_i$, where Li is the length-at-age and εi is lognormally distributed with log-mean = 0 and an arbitrary log-sd = 0.10. An additive model results from logarithmic transformation of the model, $\log L_i = \log L(t_i) + e_i$, where $e_i = \log \varepsilon_i$, $e_i \sim N(0, \sigma^2)$. The loglikelihood is $\ell(L_\infty, k, t_0, \sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}[\log L_i - \log L(t_i)]^2$ (Burnham and Anderson 2002). The maximum likelihood estimators are the parameter values that maximize the loglikelihood, that is, $(\hat{L}_\infty, \hat{k}, \hat{t}_0, \hat{\sigma}) = \arg\max \ell(L_\infty, k, t_0, \sigma)$.
We used a stratified sampling scheme to generate random samples. Three length (mm) segments were considered: the smallest (26-80), a central segment lying between the other two, and the largest (135-190); together these cover the length range of the brown swimming crab. Different sampling schemes could have been used, yet our aim was to obtain simulated samples from different, non-overlapping size segments to contrast the resulting parameter estimates. Thus, the most parsimonious scheme in this case is stratified sampling. With these three segments we considered five sampling configurations (see below). We drew pairs of Li and their corresponding εi and computed L(ti) = Li/εi, retaining only pairs that satisfy the restriction L0 < L(ti) < L∞; that is, L(ti) values smaller than 26 mm or greater than 190 mm were discarded. The age ti that corresponds to each Li was obtained by solving the vB model for ti: $t_i = t_0 - \frac{1}{k}\log(1 - L(t_i)/L_\infty)$.
Fifty pairs of Li and εi were randomly drawn for each length segment to estimate vB growth and stock condition parameter values; this was repeated 1000 times using the Monte Carlo (MC) method (Janssen 2013). The means of the following parameter values were estimated: 1) vB growth equation, L∞, k and t0; 2) growth performance index, φ'; 3) weight-length relationship, a and b; 4) maximum theoretical weight, W∞; and 5) condition factor, $K_n = W_0/\hat{W}$. Reference parameters a and b (and their variability) were also estimated in a previous work (Villa-Diharce et al. 2021).
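The drawing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the value of K is a placeholder (the baseline k is not given in this section), observed lengths are assumed to be drawn uniformly within a segment, and the function names are hypothetical.

```python
import math
import random

L_INF = 190.44   # asymptotic carapace width (mm), from the text
T0 = -0.14       # theoretical age at zero length (y), from the text
K = 1.0          # ASSUMED placeholder; the baseline k is not stated here
LOG_SD = 0.10    # log-scale sd of the multiplicative error, from the text


def age_at_length(median_length, l_inf=L_INF, k=K, t0=T0):
    """Invert the vB curve: age at which the median length is reached."""
    return t0 - math.log(1.0 - median_length / l_inf) / k


def draw_segment(lo, hi, n, rng):
    """Draw n (age, observed length) pairs with observed length in [lo, hi]."""
    pairs = []
    while len(pairs) < n:
        observed = rng.uniform(lo, hi)            # L_i (uniform draw is an assumption)
        eps = rng.lognormvariate(0.0, LOG_SD)     # eps_i, lognormal multiplicative error
        median = observed / eps                   # L(t_i) = L_i / eps_i
        if 26.0 < median < 190.0:                 # discard values outside 26-190 mm
            pairs.append((age_at_length(median), observed))
    return pairs


# One sample for the "smallest plus largest" configuration: 50 pairs per segment.
rng = random.Random(1)
sample = draw_segment(26, 80, 50, rng) + draw_segment(135, 190, 50, rng)
```

Repeating the last two lines 1000 times with fresh random draws, and fitting the vB model to each sample, would give the Monte Carlo distribution of the estimates.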

INTRODUCTION
The importance of obtaining accurate estimates of individual growth parameters is reflected in a large amount of scientific literature related to fisheries, aquaculture and ecology (Brunel and Dickey-Collas 2010, Hutchinson and TenBrink 2011, Lee et al. 2020). Often, owing to the selectivity of fishing gear, samples do not represent the complete size structure of the population (Goodyear 1995, 2019, Kraak et al. 2019), even in data-rich scenarios (Frater and Stefansson 2020). This can be problematic given that individual growth influences estimates of mortality, fecundity, condition factor, growth performance, and the structure, dynamics and variability of stocks, food webs and ecological networks (Tsoukali et al. 2016, Stawitz and Essington 2018, N'Dri et al. 2020). More directly for management, growth parameters can also influence estimated abundance, yield and ecosystem-based management reference points (Parma and Deriso 1990, Jennings and Dulvy 2005, Cope and Punt 2009). The von Bertalanffy (vB) individual growth model has been used for diverse species of fishes, mammals, birds and invertebrates (Lee et al. 2020). This model is often expressed as $L_t = L_\infty\,(1 - e^{-k(t - t_0)})$ (Pauly 1979).
Here L∞ is the asymptotic length, k is a constant representing catabolic stress referred to as the Brody growth coefficient (Hart and Chute 2009), and t0 is a theoretical age when length is zero. Despite the wide applicability of the vB model, it is often difficult to compare growth between different taxa (Brey 1999), and there have been several attempts to address this problem (e.g. the index of Gallucci and Quinn II 1979). A commonly used length-based index of growth performance is φ' (Pauly and Munro 1984), which is a species-specific index used to compare reliability of vB parameters between and within species or stocks (Etim et al. 1999, Moura et al. 2017). The growth performance parameter φ' has found widespread use in comparing the integral performance of vB growth curves (Quaas and Skonhoft 2022, Rodríguez-Castañeda et al. 2022, Şimşek et al. 2022). In addition to growth parameters and growth performance, the condition factor (Kn) is an important index in fisheries biology and allows inferences about the fitness of an individual in a population (N'Dri et al. 2020). Kn can be expressed as $K_n = W_0/\hat{W}$, where $\hat{W}$ is the weight estimated with the length-weight relationship and W0 is the observed weight; this expression is known as the relative condition factor (Le Cren 1951). Kn can be used to compare the status of conspecific organisms or the status between species, sexes and sizes, and in different seasons of the year or between years. Individuals are considered to be in relatively good condition when Kn is greater than 1 and in poor condition when Kn is lower than 1 (Jisr et al. 2018).
In the present study we explore how simulated length structure in samples affects estimates of key stock condition parameters, with an illustrative example. We consider a sample of lengths to be biased if it consistently over- or underrepresents the entire stock size structure.
Samples from each length segment were drawn using the following five configurations (Table 1).
Except for Kn, all mean MC-estimated parameter values were compared with the original values in terms of their relative biases, mean squared errors and standard errors. To analyse the Kn values, in the equation $K_n = W_0/\hat{W}$ we substitute for the observed weight W0 the expression used for its simulation. We have $W_0 = a L^{b} \varepsilon$, where a and b are the observed values of the parameters of the weight-length model. In this equation we assumed a multiplicative error term ε with lognormal distribution. Substituting terms, with $\hat{W} = \hat{a} L^{\hat{b}}$, we have $K_n = \frac{a L^{b} \varepsilon}{\hat{a} L^{\hat{b}}} = \frac{a}{\hat{a}}\,L^{\,b-\hat{b}}\,\varepsilon$. This expression shows the influence of discrepancies between a and $\hat{a}$ and between b and $\hat{b}$ on the value of Kn. To better represent this influence, we tabulated the magnitudes $a/\hat{a}$ and $b-\hat{b}$ for the different sampling configurations. When $\hat{a}$ and $\hat{b}$ take their real values, then $a/\hat{a} = 1$, $b-\hat{b} = 0$, and $K_n = \varepsilon$.
For each length configuration, random lengths were generated from uniform distributions (bounded by the smallest and largest length). Weight was then obtained considering multiplicative lognormal errors with log-mean = 0 and cv = 10% (or sd = 0.101). For each sampling configuration, 50 Kn values were obtained as follows. With the MC-originated lengths, an observed mean weight W0 was computed using the observed parameters of the weight-length relationship (see below). For each mean length, the predicted weight was obtained using the MC-estimated parameters $\hat{a}$ and $\hat{b}$ obtained for samples from the three length segments. Kn values were then plotted against their corresponding length to observe the behaviour of Kn depending on the sampling scheme. To better understand the behaviour of Kn, we conducted a closer analysis of the differences between the true and MC-estimated a and b values of the weight-length relationship.
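A small numerical check may help to see how errors in the weight-length estimates propagate to Kn. The sketch below assumes that Kn is the ratio of observed to predicted weight, that observed weight is simulated as a·L^b·ε, and that predicted weight is â·L^b̂; the function names and the 10% overestimate of a are hypothetical choices made only for illustration.

```python
import math
import random

A_TRUE, B_TRUE = 0.000017939, 3.349   # W-L parameters reported in the Results


def relative_condition(length, eps, a_hat, b_hat, a=A_TRUE, b=B_TRUE):
    """Kn = W0 / W_hat with W0 = a * L**b * eps and W_hat = a_hat * L**b_hat."""
    observed = a * length ** b * eps
    predicted = a_hat * length ** b_hat
    return observed / predicted


def kn_factors(length, eps, a_hat, b_hat, a=A_TRUE, b=B_TRUE):
    """The same quantity written as (a / a_hat) * L**(b - b_hat) * eps."""
    return (a / a_hat) * length ** (b - b_hat) * eps


rng = random.Random(3)
a_hat = A_TRUE / 0.9                  # hypothetical overestimate, so a / a_hat = 0.9
for length in (40.0, 110.0, 180.0):
    eps = rng.lognormvariate(0.0, 0.1)
    kn = relative_condition(length, eps, a_hat, B_TRUE)
    # With b_hat equal to b, Kn is the error term scaled by the ratio a / a_hat.
    print(length, round(kn, 3), round(0.9 * eps, 3))
```

When the estimate of b is unbiased, the two printed columns agree, which is the sense in which Kn simply mirrors the ratio of the true to the estimated a.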
As mentioned, the vB parameter values used as baselines were those obtained by Villa-Diharce et al. (2021). Using maximum likelihood, we estimated the parameters of the weight-length model (Haddon 2011) for combined sexes; after observing the dispersion of the data we assumed a multiplicative error term (e.g. Curiel-Bernal et al. 2021). The statistical model is $W_i = a L_i^{b}\,\varepsilon_i$, with εi lognormally distributed with log-mean zero and log-standard deviation σ. We log-transformed the model and obtained an additive model $\log W_i = \log a + b \log L_i + e_i$, where $e_i = \log \varepsilon_i$, $e_i \sim N(0, \sigma^2)$. The loglikelihood function is (Burnham and Anderson 2002) $\ell(a, b, \sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(\log W_i - \log a - b \log L_i)^2$. The estimators $\hat{a}$, $\hat{b}$ and $\hat{\sigma}$ of parameters a, b and σ, respectively, are those that maximize the loglikelihood function, that is, $(\hat{a}, \hat{b}, \hat{\sigma}) = \arg\max \ell(a, b, \sigma)$. To numerically maximize the loglikelihood we used the function nlminb() in R (R Core Team 2021). The significance of b in the weight-length equation was tested using a t-test (Pauly 1984).
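For readers who want to reproduce this kind of fit, the following sketch may be useful. It does not use a numerical optimizer such as nlminb; instead it relies on the fact that, for a lognormal multiplicative error, the maximum likelihood estimates of log a and b coincide with ordinary least squares on the log-log scale, and the ML estimate of σ² is the residual sum of squares divided by n. The function name and the simulated data are hypothetical.

```python
import math
import random


def fit_weight_length(lengths, weights):
    """Closed-form ML fit of W = a * L**b * eps with lognormal eps."""
    n = len(lengths)
    x = [math.log(v) for v in lengths]
    y = [math.log(v) for v in weights]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx                       # slope on the log-log scale
    log_a_hat = y_bar - b_hat * x_bar       # intercept on the log-log scale
    rss = sum((yi - log_a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / n)          # ML estimate divides by n, not n - 2
    return math.exp(log_a_hat), b_hat, sigma_hat


# Simulated check with the W-L values reported in the Results and log-sd = 0.1.
rng = random.Random(7)
lengths = [rng.uniform(26, 190) for _ in range(500)]
weights = [0.000017939 * v ** 3.349 * rng.lognormvariate(0.0, 0.1) for v in lengths]
a_hat, b_hat, sigma_hat = fit_weight_length(lengths, weights)
```

Restricting `lengths` to a single segment (for example 135 to 190 mm) before fitting is one way to see how a narrow length range inflates the uncertainty of the scale parameter a.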
Using known values of the vB parameters and of the weight-length model, we estimated the maximum likelihood values of φ' and W∞: $\phi' = \log_{10} k + 2\log_{10} L_\infty$ (Pauly and Munro 1984) and $W_\infty = a L_\infty^{b}$ (Haddon 2011).
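The two derived quantities are simple to compute once the growth and weight-length parameters are available. The sketch below assumes the standard Pauly and Munro (1984) form of the growth performance index and a maximum weight obtained by evaluating the weight-length relationship at the asymptotic length; the function names are hypothetical, and the value of k in the usage lines is a placeholder, not the baseline.

```python
import math


def phi_prime(k, l_inf):
    """Growth performance index: log10(k) + 2 * log10(L_inf)."""
    return math.log10(k) + 2.0 * math.log10(l_inf)


def max_weight(a, b, l_inf):
    """Maximum theoretical weight: the W-L relationship evaluated at L_inf."""
    return a * l_inf ** b


# Usage with the reported L_inf, a and b; k = 1.0 is an ASSUMED placeholder.
print(phi_prime(1.0, 190.44))
print(max_weight(0.000017939, 3.349, 190.44))
```

Because φ' depends on the logarithms of k and L∞, moderate errors in either parameter change the index only slightly, whereas W∞ scales with the power b of L∞ and is therefore much more sensitive.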
We assessed the quality of a parameter estimator H using the mean squared difference between the estimator H and the parameter θ, i.e. the mean square error (MSE) of H. The MSE can be divided into variance and bias (Casella and Berger 1990): $\mathrm{MSE}(H) = E[(H-\theta)^2] = \mathrm{Var}(H) + (E[H]-\theta)^2$. For a sequence H1, H2, ... Hn of estimates of a parameter θ, one can obtain the terms of the previous relationship as $\frac{1}{n}\sum_{i=1}^{n}(H_i-\theta)^2 = \frac{1}{n}\sum_{i=1}^{n}(H_i-\bar{H})^2 + (\bar{H}-\theta)^2$. We then took the square root of these quantities so that they are expressed on the same scale as the estimated parameter: root mean square error $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(H_i-\theta)^2}$, standard error $\mathrm{SE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(H_i-\bar{H})^2}$ and $\mathrm{Bias} = \bar{H}-\theta$. To further appreciate the magnitudes, we took their value relative to the magnitude of the parameter to be estimated (Lehmann and Casella 1998, Dekking et al. 2005, Wang et al. 2021). These relative values were then tabulated with columns referring to the segment sampled and rows showing the relative quantities estimated, as a percentage of the original values.
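The three accuracy measures can be computed from a vector of Monte Carlo estimates as sketched below. The function name is hypothetical, and dividing the bias by θ itself (so that the sign of the relative bias depends on the sign of θ) is an assumption about the convention, which matters for negative parameters such as t0.

```python
import math


def accuracy(estimates, theta):
    """Relative bias, standard error and RMSE (in %) of estimates of theta."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - theta
    variance = sum((h - mean) ** 2 for h in estimates) / n   # divides by n
    mse = sum((h - theta) ** 2 for h in estimates) / n        # variance + bias**2
    return {
        "RB": 100.0 * bias / theta,
        "RSE": 100.0 * math.sqrt(variance) / abs(theta),
        "RRMSE": 100.0 * math.sqrt(mse) / abs(theta),
    }


# Toy usage: four estimates of a parameter whose true value is 10.
result = accuracy([9.0, 11.0, 10.0, 12.0], 10.0)
```

With these definitions the squared relative RMSE equals the sum of the squared relative bias and the squared relative standard error, so a table of the three quantities shows directly whether bias or variability dominates the total error.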
Finally, with each set of vB parameter values estimated sampling the five configurations of lengths, a plot was generated and compared with the baseline curve using the best estimates (Villa-Diharce et al. 2021). This provided an integrated insight into biases and errors that can be committed with incomplete length sampling.

RESULTS
The estimated weight-length relationship was $W = 0.000017939\,L^{3.349}$, with b significantly greater than 3 (p < 0.001; CI: 3.2923, 3.4059). Using the mentioned vB parameters, we estimated the maximum weight W∞. The baseline value of phi prime was $\phi' = \log_{10} k + 2\log_{10}(190.44) = 4.58$. The mean, relative bias (RB), relative standard error (RSE) and relative root mean square error (RMSE) of the parameters estimated under the tested sampling schemes are shown, respectively, in Tables 2 to 5. The following section highlights the most relevant information contained in these tables.
We obtained the minimum value of L∞ when sampling only from the smallest length segment (26-80 mm) and the value nearest to the baseline when using the extreme lengths, 26-80 and 135-190 mm (Table 2). Sampling the whole range of lengths, the smallest plus largest segments (26-80 and 135-190 mm) and the largest segment alone (135-190 mm) yielded relatively small bias values (-0.24 to -3.38 mm; Table 3). Sampling the smallest length range produced the highest bias, followed by sampling the central range. Small values of the RSE (Table 4), from 3.22 to 3.78%, were found for symmetric sampling (whole range, and smallest plus largest). When sampling only the largest range, the variability of L∞ more than doubled. The two components of the RMSE (bias and RSE) behaved similarly (Table 5).
Estimates of parameter k showed small biases when samples came from segments 26-80 and 135-190 mm (5.89 and 4.32%) (Table 3). The smallest bias occurred when samples came from the largest length segment. In general, when only one segment was sampled, the RB was always large. The size of the biases when sampling combined and separated segments was also observed in the RSE values (Table 4).
For t0, the largest bias was negative and occurred when samples came from the largest length segment, 135-190 mm (-459.66%). Relatively small biases resulted when samples came from symmetric combinations of segments; when sampling separate segments, biases were larger (Table 3). The RSE (variability) values showed the same pattern: lower values when sampling symmetric combinations of length segments, and notably larger values when sampling separate segments. Most of the magnitude of the RMSE was due to the variability (RSE).
Biases in the estimates of growth performance φ' were small and stable, changing from positive to negative when samples came from single length segments. When sampling came from combined segments, the global quality index, RMSE, took very small positive values (Table 5).
For the weight-length (W-L) relationship, the bias of coefficient a was smallest when samples came from segments containing the whole range of lengths, followed by the combined smallest and largest segments (Table 3). The RSE behaved similarly (Table 4): an increasing RMSE of coefficient a resulted when samples came from individual segments (Table 5). Estimates of the exponent b were stable, as can be seen both in the RB values and in the RSE values (Tables 3 and 4). This stability was also observed in the values of RSE and RMSE (Tables 4 and 5).
The behaviour of the estimated W∞ was similar to that of L∞: both had smaller biases and standard errors in configurations with extreme lengths (1 and 2) and in the larger lengths (last configuration).
W∞ showed a greater variability than L∞ because of the variability of both L∞ and the scale parameter a of the W-L relationship. The value of parameter b did not influence the variation of W∞ because of its stability (Tables 2-5). Figure 1 shows the scatter plots of 50 Kn values resulting from lengths simulated in the five configurations. The upper-left panel is considered as the reference, i.e. when there is no bias in the estimates of parameters a and b. As observed, the Kn values were relatively well estimated, except when samples came from the largest length range. In this case, Kn values were underestimated. Table 6 shows the estimated means of parameters a and b for the five sampling configurations, as well as their comparisons with the true values of the parameters. The first two configurations yielded estimates close to the true values of a and b.
In the first two sampling configurations, the ratios of true to estimated a ($a/\hat{a}$) were close to one and the differences between true and estimated b ($b-\hat{b}$) were almost zero. This is why the plots of Kn for the first two configurations in Figure 1 are very similar to the reference plot (upper-left). Sampling from the central and larger length segments resulted in monotonic growth in the estimates of a, and therefore a monotonic decrease in the ratios $a/\hat{a}$, which ranged from 0.929 to 0.854. A comprehensive visual analysis of the relative performance of the five sampling schemes can be made by comparing the growth plots obtained using estimated parameters with plots using the baseline values (Fig. 2). The least biased plots resulted when samples contained the whole range of lengths (configuration 1) and the smallest and largest length segments (configuration 2). Poor fits resulted when samples came from configurations 3, 4 and 5; the worst fit resulted when samples came only from the smallest length range.

DISCUSSION
Sampling different size segments of a stock or population will influence the estimates of individual growth, length-weight parameters, maximum theoretical individual weight and two widely used measures of growth efficiency: growth performance index φ' and condition factor Kn. Using three relative measures of accuracy (bias, standard error and mean square error) and an a priori segmentation of length, we obtained results with practical applications. The standard deviation (sd) of 0.1 used here covers roughly +/-20% of possible values around the mean; however, due to chance, some values could lie outside such boundaries. Accuracy of parameter estimates can vary when different sd values are used, and this issue merits further research. The accuracy of parameter estimates does not necessarily depend on sampling the entire possible length-age range, and the error is not the same for all parameters. Our results provide useful guidance to develop sampling schemes in the common case when time and resources are scarce. We caution that erroneous estimates of basic parameters, particularly those of von Bertalanffy individual growth, can lead to wrong values of other key parameters used in fisheries management, for example, the natural mortality rate M (see Maunder et al. 2023 for a recent review of various methods to estimate M).
To estimate the accuracy of parameters in our simulations we used three relative indices based on length-stratified age samples. Our main purpose in this paper was to simulate and compare samples representing a balanced number of biased length samples to analyse possible effects on the estimated values of parameters of general interest. Much larger sample sizes can be and often are obtained; in our case, however, we strove to provide insights for the common case of data-poor fisheries or limited resources for sampling. Other works (Xiao 1996, Perreault et al. 2020) used the relative root mean squared error and RB of simulated and true values. In our case, we also used the RSE (Barbaro et al. 1981, Vølstad et al. 2011), another measure of the accuracy of the parameter estimates as a function of the sampling configurations we tested. This statistic evenly distributes the deviations between sampling configurations. Incomplete length sampling may be caused by the selectivity of fishing gear and by grading of the catch at sea or upon landing before samples can be taken. Natural behaviour of individuals may also cause misrepresentation of length structures in samples. For the crab Ranina ranina, for example, juveniles and adults spend considerable time buried and segregate by life history stage. Gear used for commercial fishing seldom catches juveniles, which can produce erroneous estimates of vB growth parameters (Kirkwood et al. 2005).
Failure to produce robust estimates of growth parameters will inevitably curtail our ability to conduct good stock assessments to inform management (Gwinn et al. 2010). For example, given that fecundity generally increases with individual size (Marshall et al. 2019), to maximize catch from a cohort, a yield-per-recruit analysis is often performed (Beverton and Holt 1957, Die et al. 1988, Zhai and Pauly 2019), which depends on individual growth parameters. In the present work we sought parameter values that were closest to the true values that represented correct growth trajectories (Pardo et al. 2013). Previous simulation studies (e.g. Wilson et al. 2015) recommend combining samples from fishing gears with different selectivity to improve growth parameter estimates. It has also been proposed that, to maximize the accuracy of individual growth parameters, a complete representation of organisms of different sizes is needed, with two conditions: 1) evenly distributed sample sizes across age/size segments, and 2) sample sizes as large as possible (Quinn and Deriso 1999, Pilling et al. 2002, Shelton and Mangel 2012). Our simulation results indicate that more subtle characteristics underlie the final estimated values of growth parameters. A key element is to take into consideration, a priori, the configuration of possible size segments sampled (e.g. Table 1). Using simulations, Goodyear (1995) concluded that reliable estimates of mean size-at-age require random sampling of lengths within ages, and that stratifying samples by length biased the estimates of mean length-at-age. It is not clear whether samples were generated by splitting age into equal or different sizes. Goodyear (2019) simulated samples using two strategies relevant for the present work: samples stratified by age and by length; size-stratification produced biased estimates of length-at-age and vB parameters. In this case, age strata were of one year, and length strata were constant.
In general, for the vB growth model, it has been found that underrepresentation of small individuals yields small-biased k estimates and underrepresentation of large individuals yields large-biased L∞ estimates (Taylor et al. 2005). Because of the inverse relationship between k and L∞ (Gubiani et al. 2012), biases of these two parameters vary in opposite directions. Also, the stability of φ' with respect to the sampling segments results from the inverse correlation between the base-10 logarithms of the estimated parameters L∞ and k (Pauly 1998). If one parameter decreases, the other increases, so the product of terms that define φ' remains stable.
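The compensation argument can be written out in a short derivation. It assumes the standard definition of the index from Pauly and Munro (1984); the symbol δ is a hypothetical shift in the logarithm of k introduced only for illustration, and exact cancellation is an idealized case rather than a result of the simulations.

```latex
% Growth performance index as the logarithm of a product
\phi' = \log_{10} k + 2\log_{10} L_\infty = \log_{10}\!\left(k\,L_\infty^{2}\right)

% Suppose a sampling configuration shifts the two estimates in opposite directions
\log_{10}\hat{k} = \log_{10} k + \delta, \qquad
\log_{10}\hat{L}_\infty = \log_{10} L_\infty - \tfrac{\delta}{2}

% The shifts then cancel in the index
\hat{\phi}' = \left(\log_{10} k + \delta\right) + 2\left(\log_{10} L_\infty - \tfrac{\delta}{2}\right) = \phi'
```

In practice the two shifts will not cancel exactly, but the closer the estimates move along the line where k multiplied by the square of L∞ is constant, the smaller the change in the index.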
Useful estimates of φ' require sound sampling that, whenever possible, accounts for seasonal or annual variations in the length composition of stocks (Mathews and Samuel 1990).
Estimated Kn depends on the parameter values of the weight-length function. In the present work, because the differences between the true and estimated exponent ($b-\hat{b}$) are very close to zero and contribute to Kn only through the exponent of length, their effect is practically negligible. Hence, the values of Kn are merely a reflection of the values taken by the ratios of true to estimated a ($a/\hat{a}$). In practical terms, symmetric sampling configurations that include extreme length values result in practically unbiased estimates of parameters a and b (cf. Fig. 1). Samples with the largest individuals increased the bias in coefficient a. This is in turn reflected in the scatter plots of Kn. For parameter b, when samples came from separate segments, RSE values were very similar to the RMSE, which means that the variability component is greater than the bias.
Our simulations showed that erroneous vB parameter estimates result when most or all individuals in the samples are of similar lengths. For practical purposes, when the vB model fits the data and sampling resources are scarce, it is convenient to actively include the smallest and largest individuals in a sample (segments 1 and 3). If researchers are interested in estimating growth performance φ', it would be advisable to sample from the entire size range available using different fishing gears or sampling methods. These considerations are intended to guide sampling schemes and minimize erroneous estimates of growth and growth efficiency arising from a lack of appropriate data.