Quality of fisheries data and uncertainty in stock assessment *

SCI. MAR., 67 (Suppl. 1): 75-87 SCIENTIA MARINA 2003


The quality of fisheries data has great impacts on the quality of stock assessment, and thus on fisheries management. In this paper, using a case study, I evaluate the impacts of two types of error, biased error and atypical error, that can negatively affect the quality of fisheries data in stock assessment. These errors are commonly associated with fisheries data, and assumptions on their sources and statistical properties can have great impacts on the outcome of stock assessment. Although the sources and statistical properties of these errors differed, both of them could result in errors in stock assessment if estimation methods are not appropriate. Different statistical approaches used in fitting models differ in their robustness with respect to errors of different statistical properties in data. This study showed the importance of evaluating the quality of input data and the possibility of developing an approach that is robust to errors in data. Considering the likelihood of fisheries data being affected by errors of different statistical properties, I suggest that the robustness of a stock assessment be evaluated with respect to data quality.

INTRODUCTION
A fisheries management system usually includes many components (Fig. 1). Large errors in any of these components may result in mis-management of a fisheries stock, resulting in either over-exploitation of fisheries resources or unnecessary economic loss or social hardship for the coastal communities that depend upon fisheries (Hilborn and Walters, 1992; Walters, 1998). The impacts of errors occurring in some components of the management system (Fig. 1) have been evaluated in many studies (Hilborn and Walters, 1992; McAllister and Kirkwood, 1998; NRC, 1997, 1999). Of particular interest in this study, however, are the impacts of measurement errors in fisheries data on stock assessment and fisheries management.
Measurement errors, which may greatly influence the quality of fisheries data, can originate from different sources with different statistical properties. Errors resulting from directly measuring a fisheries variable (e.g. length and weight) or from a well-designed subsampling programme are probably random and small. However, in some cases, errors associated with fisheries data can be non-random and biased. An example of this is catch statistics in a quota-managed fisheries system. Fishermen may try to maximise their profits for a given quota by high grading, a practice of discarding less valuable or desirable catch (usually small fish) while keeping more valuable or desirable catch (usually large fish). In this case, only landed catch is included in catch statistics while discarded catch, although also part of the fishing mortality, is excluded from the catch statistics (Hilborn and Walters, 1992). Thus, the total catch is under-estimated. Because small fish are more likely to be discarded, the estimation of length/age composition for catch may also be biased (Pikitch, 1987).
In addition to these biased errors, atypical measurement errors may also occur in fisheries data. The phrase "atypical error" in this study refers to an error that occurs in only a small number of years and whose statistical properties differ significantly from those of errors occurring in most of the other years (Chen et al., 2000). For example, data collected in the first few years of applying a new management strategy to a fisheries stock may be subject to errors different from those of data of later years, when a data collection system is established and problems in the data collection are identified and remedied. Some abnormal events (e.g. a change in sampling protocols) may significantly increase or decrease the magnitude of measurement errors in a year, resulting in atypical errors. The data subject to atypical errors may become outliers, which can have large impacts on fisheries stock assessment modelling (Chen et al., 2000; Hinrichsen, 2001).
Errors in data are often assumed to have certain statistical properties (e.g. random, independent, normal etc.) when an objective function for stock assessment modelling is formulated (Megney, 1989). The existence of non-random and atypical errors in data violates the assumptions made concerning the statistical properties of the measurement errors in data. Because parameter estimation methods used in stock assessment are often sensitive to the violation of the error assumptions (Schnute, 1989; Chen et al., 2000), the existence of non-random and/or atypical errors may lead to substantial biases in stock assessment modelling, and subsequently to mis-management of fisheries resources. Thus, it is important to understand how non-random and atypical errors may affect parameter estimation and how we can reduce the likelihood of negative impacts of these errors on stock assessment.
In this paper, using a case study as an example, I evaluate the impacts on stock assessment of two types of errors, biased and atypical errors, which are directly related to the quality of fisheries data. The robustness of different statistical approaches used in stock assessment modelling is evaluated with respect to these errors.

Fishery and data availability
For simplicity in model structures, I chose a fishery with catch and catch-per-unit-effort (CPUE) data available, which calls for the use of a production model. The fishery used as an example in this study is the eastern rock lobster (Jasus verreauxi) fishery on the coast of New South Wales (NSW), Australia.
Rock lobsters have been fished off the NSW coast since the late nineteenth century. Since the 1994-95 fishing year (i.e. from 1 July 1994 to 30 June 1995), this fishery has been managed under an output-control scheme with an annual total allowable catch (TAC) of 106 t. Abundance indices (CPUE) were developed for the period 1903-36 and for the period from 1969-70 to 1996-97 (Fig. 2; Montgomery, 1995; Montgomery et al., 1997).

Production model
A production model requiring a time series of catch and CPUE as input data was used to describe the biomass dynamics of the rock lobster. This model can be written as
$$B_{t+1} = B_t + G_t - C_t \qquad (1)$$
where $B_t$ and $B_{t+1}$ are the biomasses of the stock at the beginning of years t and t+1, $C_t$ is the total catch during year t, and $G_t$ is the intrinsic growth of stock biomass during year t (Hilborn and Walters, 1992). A Schaefer or logistic type model was modified to relate $G_t$ to the average stock biomass in year t, rather than to the biomass at the beginning of year t,
$$G_t = r \, \frac{B_t + B_{t+1}}{2} \left( 1 - \frac{B_t + B_{t+1}}{2K} \right) \qquad (2)$$
where r is the rate of intrinsic growth of the stock biomass and K is the carrying capacity or biomass of the virgin stock. Replacing $G_t$ in Equation 1 with Equation 2 and solving the derived polynomial function, we can calculate $B_{t+1}$ as
$$B_{t+1} = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \qquad (3)$$
where $a = r/(4K)$, $b = 1 - r/2 + rB_t/(2K)$, and $c = rB_t^2/(4K) - (1 + r/2)B_t + C_t$. Thus, the biomass at the beginning of a year can be estimated from the biomass at the beginning of the previous year if parameters r and K are known.
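The biomass recursion can be illustrated with a short Python sketch. It assumes that Equation 1 is the usual balance $B_{t+1} = B_t + G_t - C_t$ and that Equation 2 is the logistic growth term evaluated at the average of $B_t$ and $B_{t+1}$; substituting one into the other gives a quadratic in $B_{t+1}$ whose positive root is returned. The function names and the coefficient names a, b and c are illustrative choices, not taken from the paper.

```python
import math


def next_biomass(b_t: float, catch_t: float, r: float, k: float) -> float:
    """Biomass at the start of year t+1 under the modified Schaefer model.

    Assumed model: B[t+1] = B[t] + r*A*(1 - A/K) - C[t], with A the mean of
    B[t] and B[t+1]. Rearranged into a*x**2 + b*x + c = 0 with x = B[t+1].
    """
    a = r / (4.0 * k)
    b = 1.0 - r / 2.0 + r * b_t / (2.0 * k)
    c = r * b_t ** 2 / (4.0 * k) - (1.0 + r / 2.0) * b_t + catch_t
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("no real solution for next-year biomass")
    # The '+' root is the biologically meaningful one (a > 0 and, for
    # sustainable catches, c < 0).
    return (-b + math.sqrt(disc)) / (2.0 * a)


def project(b_start: float, catches: list, r: float, k: float) -> list:
    """Biomass trajectory, one value per year start, given a catch series."""
    biomass = [b_start]
    for catch_t in catches:
        biomass.append(next_biomass(biomass[-1], catch_t, r, k))
    return biomass
```

A trajectory started at $B = K$ with zero catch stays at K, which is a quick sanity check on the coefficients.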
Because B cannot be observed directly from the fishery, an observational model is needed to relate B to an abundance index (I) that can be measured in the fishery. In this study, $I_t$ is the CPUE observed in the commercial rock lobster fishery in year t. The stock biomass is assumed to be proportional to I (Hilborn and Walters, 1992). Thus, the observational model can be written as
$$I_t = q B_t e^{\varepsilon_{I,t}} \qquad (4)$$
where q is the catchability coefficient and the error term $\varepsilon_{I,t} \sim N(0, \sigma_I^2)$. The unit of CPUE and the source of data were not the same for each period for the NSW rock lobster fishery. CPUE was measured as kg/vessel and estimated from annual reports of NSW Fisheries for the period from 1903 to 1957-58; as kg/(trap-month) estimated from fishers' catch cards for the period from 1969-70 to 1983-84; and as kg/(trap-month) estimated from the LCATCH database of NSW Fisheries for the period from 1984-85 to 1996-97 (Montgomery, 1995). To incorporate CPUEs from these different sources into the observational model, three different q's corresponding to the three different series of the CPUE were used. They are referred to as $q_1$ (from 1903 to 1957-58), $q_2$ (from 1969-70 to 1983-84), and $q_3$ (from 1984-85 to 1996-97). Thus, the model parameters to be estimated are $B_{1884}$, r, K, $q_1$, $q_2$, and $q_3$. Because year 1884 was in the early stage of the development of the rock lobster fishery in NSW, it is reasonable to assume that the stock biomass at the beginning of year 1884 ($B_{1884}$) was approximately the same as the exploitable virgin biomass K. This reduced the number of parameters to be estimated to five, i.e. $\beta = (r, K, q_1, q_2, q_3)$.
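The use of a separate catchability coefficient for each CPUE series can be sketched as a simple lookup. This is a minimal illustration that assumes each fishing year is identified by its starting calendar year (so 1957-58 is coded as 1957); the function names are hypothetical.

```python
from typing import Optional


def catchability_for_year(start_year: int, q1: float, q2: float,
                          q3: float) -> Optional[float]:
    """Return the q that applies to the fishing year starting in start_year.

    Period boundaries follow the text: 1903 to 1957-58 (q1),
    1969-70 to 1983-84 (q2), 1984-85 to 1996-97 (q3).
    Returns None for years with no CPUE series.
    """
    if 1903 <= start_year <= 1957:
        return q1
    if 1969 <= start_year <= 1983:
        return q2
    if 1984 <= start_year <= 1996:
        return q3
    return None


def predicted_cpue(biomass: float, q: float) -> float:
    """Deterministic part of the observation model: CPUE = q * B."""
    return q * biomass
```

Years between the series (for example 1960) return None, so they contribute nothing to the fit.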

Data quality
The catch data collected for the period from 1969-70 to 1996-97 were thought to underestimate the true commercial catch because of under-reporting and the black market (Montgomery, 1995). More recently, it was believed that the CPUE data for the same time period might also be underestimated because of the underestimation of the catch. The proportion of the commercial catch that was not reported ($A_t$) was estimated for the time period after 1969-70 based on a survey of fishermen in the rock lobster fishery (Fig. 3; Montgomery et al., 1997). The recreational catch was estimated as a proportion of the commercial catch. This proportion factor, referred to as $RA_t$, was estimated from a survey of recreational fishermen (Fig. 3; Montgomery et al., 1997). Thus, for the period from 1969-70 to 1996-97, the total catch and CPUE in year t were adjusted using $A_t$ and $RA_t$ (Equations 5 and 6). The data estimated after the adjustment for under-reporting and recreational catch are believed to better represent the true mortality resulting from fishing. The unadjusted catch and CPUE data apparently severely under-estimated the total catch and CPUE for the period from 1969-70 to 1996-97 (Fig. 2). To evaluate the impacts of such biases in catch and CPUE on stock assessment, I considered the following four data sets in this study: (1) both catch and CPUE were adjusted; (2) catch was adjusted, but CPUE was not adjusted; (3) CPUE was adjusted, but catch was not adjusted; and (4) neither catch nor CPUE was adjusted (Table 1). Stock assessment was conducted using each of these data sets as inputs, and the resulting parameter estimates were compared among the four data sets.
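The four combinations of adjusted and unadjusted series can be laid out explicitly. The sketch below only pairs the series as described in the text (sets I to IV of Table 1); it does not implement the adjustment itself, and the function and argument names are hypothetical.

```python
def build_data_sets(catch_raw, cpue_raw, catch_adj, cpue_adj):
    """Pair catch and CPUE series into the four data sets of the study.

    Each value is a (catch, cpue) tuple:
      I   - both adjusted
      II  - catch adjusted, CPUE unadjusted
      III - catch unadjusted, CPUE adjusted
      IV  - neither adjusted
    """
    return {
        "I": (catch_adj, cpue_adj),
        "II": (catch_adj, cpue_raw),
        "III": (catch_raw, cpue_adj),
        "IV": (catch_raw, cpue_raw),
    }
```

Running the same estimation routine over the four entries keeps the comparison against set I mechanical.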
To study the impacts of outliers on stock assessment, two sets of CPUE data were simulated from the adjusted catch-CPUE data (i.e. set I; Table 1) using the following procedure: (1) fitting the production model and observational model described above to the adjusted catch-CPUE data (i.e. set I) using the nonlinear LS method (Polacheck et al., 1993); (2) using the estimated parameters to calculate predicted CPUE using Equations (3) and (4); and (3) adding random, normally distributed errors (CV = 25%) to the predicted CPUE. This simulated a set of CPUE data which had no apparent outliers (Fig. 4a). Using this set of CPUE data, a second set of CPUE data was simulated by greatly altering the values of 6 observations in the first period of CPUE data (1903 to 1957-58) and the values of 3 observations in the second period (1969-70 to 1996-97; Fig. 4b). These two sets of CPUE data, referred to as data without outliers and data with outliers respectively, were used in the study of impacts of outliers on stock assessment, together with the adjusted catch data.
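Steps (3) and the subsequent contamination can be sketched as follows. This is a minimal illustration under two assumptions that the text does not spell out: the 25% CV is applied as a normal error with standard deviation 0.25 times the predicted value, and an outlier is created by multiplying an observation by a fixed factor. The factor, seed and function names are hypothetical.

```python
import random


def simulate_cpue(predicted, cv=0.25, seed=0):
    """Add independent normal errors (sd = cv * predicted) to predicted CPUE."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, cv * p) for p in predicted]


def add_outliers(cpue, indices, factor=3.0):
    """Return a copy of cpue with the chosen observations scaled by factor.

    The input list is left untouched, so the two series share the same
    random errors everywhere except at the altered positions.
    """
    contaminated = list(cpue)
    for i in indices:
        contaminated[i] = contaminated[i] * factor
    return contaminated
```

Deriving the contaminated series from the clean one means any difference in the fitted posteriors can be attributed to the altered observations alone.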

Statistical approach to estimating parameters
Two Bayesian approaches were used to estimate model parameters. The first is the commonly used Bayesian inference in which the likelihood function is formulated from a lognormal distribution of the observation error in Equation (4) (McAllister and Kirkwood, 1998; Walters, 1998). The second is a robust Bayesian method incorporating a two-component mixture likelihood function which is robust to outliers (Chen and Fournier, 1999). A detailed description of this approach can be found in Chen and Fournier (1999).
For the rock lobster fishery, the likelihood function for the commonly used Bayesian approach is the product, over the three CPUE periods and the years within each period, of the lognormal densities of the observed CPUE implied by Equation (4), where i is an index for the three periods, $t_1$ runs from 1903 to 1957-58, $t_2$ from 1969-70 to 1983-84, $t_3$ from 1984-85 to 1996-97, and $\beta$ is the parameter vector. For each time period, $\hat{\sigma}$ was estimated from the CPUE residuals of that period, where $T_1$ and $T_2$ are the start and end years and n is the number of years in the time period. This method is referred to as the "normal method" (NM) in this study.
The likelihood function for the robust Bayesian method for the rock lobster fishery is the two-component mixture likelihood of Chen and Fournier (1999), in which $\alpha$ is the proportion of data subject to atypical errors. The value of $\alpha$ was set at 0.05, reflecting the belief that only a small proportion of data was subject to atypical errors (Chen and Fournier, 1999). The Bayesian estimation based on this approach is referred to as the "robust method" (RM) in this study.
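The contrast between the two likelihoods can be illustrated with a small Python sketch. It is not the function of Chen and Fournier (1999): as a stand-in, the robust version below uses a generic contaminated-normal mixture in which a fraction alpha of the log-scale residuals comes from a normal distribution with an inflated standard deviation. The inflation factor, the n-divisor in `sigma_hat`, and all function names are assumptions made for illustration only.

```python
import math


def log_residuals(observed, predicted):
    """Log-scale residuals ln(I_obs) - ln(q * B) for one CPUE period."""
    return [math.log(o) - math.log(p) for o, p in zip(observed, predicted)]


def sigma_hat(residuals):
    """Assumed estimator: root mean square of the residuals (divisor n)."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def normal_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def nm_loglik(residuals, sd):
    """Log-likelihood with normal errors on the log scale (NM-like)."""
    const = math.log(sd * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * (e / sd) ** 2 - const for e in residuals)


def rm_loglik(residuals, sd, alpha=0.05, inflate=5.0):
    """Log-likelihood with a two-component mixture (RM-like stand-in).

    With probability 1 - alpha a residual follows N(0, sd^2); with
    probability alpha it follows the wider N(0, (inflate * sd)^2).
    """
    return sum(
        math.log((1.0 - alpha) * normal_pdf(e, sd)
                 + alpha * normal_pdf(e, inflate * sd))
        for e in residuals
    )
```

Under the normal likelihood a single large residual is penalised by its squared size, whereas the mixture caps the penalty through its wide component, so such a point pulls the parameter estimates far less.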
Priors for all parameters except r were assumed to have uniform distributions: $K = B_{1884} \sim U(1, 50000)$, $q_1 \sim U(10^{-6}, 0.3)$, $q_2 \sim U(10^{-6}, 0.01)$, and $q_3 \sim U(10^{-6}, 0.01)$. Parameter r was assumed to follow the normal distribution $r \sim N(0.07, 0.1^2)$ with lower and upper boundaries of 0.001 and 0.4 respectively. The choice of lower and upper boundaries assumed for these parameters was based on rock lobster biology, previous stock assessment experience of this fishery, and similar species (e.g. New Zealand rock lobster; Polacheck et al., 1993; Montgomery, 1995; Montgomery et al., 1997; Chen and Montgomery, 1999). For example, q is the catchability coefficient, defined as the proportion of stock biomass that can be removed by one unit of fishing effort. For $q_1$, it is assumed to be impossible that one vessel (the unit of fishing effort for that time period) can catch 30% of the stock biomass in any given year, which leads us to set the upper boundary at 0.3.
The Sampling-Importance-Resampling (SIR) algorithm (Rubin, 1988) was used to estimate posterior distributions for the model parameters (Smith and Gelfand, 1992). For each set of data defined in Table 1, the NM and RM were applied to estimate posterior distributions of parameters. These parameters included the five model parameters $\beta = (r, K, q_1, q_2, q_3)$ and three fisheries parameters: current stock biomass ($B_{1997-98}$), stock depletion described as the ratio of $B_{1997-98}$ over K, and maximum sustainable yield (MSY) calculated as rK/4 (Ricker, 1975). Posterior distributions estimated for the biased data sets (i.e. sets II, III, and IV; Table 1) were compared with those estimated for the unbiased data set (i.e. set I; Table 1) using the NM and RM respectively. For the posterior distribution of each parameter estimated using either the NM or the RM, a difference index (DI) was calculated using Equation (8), where $p_I(i)$ is the i-th interval of the posterior distribution of a parameter for data set I (Table 1) and m is the total number of intervals used in plotting posterior distributions. A large value of this index indicates a large departure of posterior distributions resulting from using biased data (Chen et al., 2000).
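The SIR procedure can be outlined in a few lines of Python. This is a minimal sketch that assumes the prior is used as the importance function, so that the importance weight of each draw is simply its likelihood; the function name, argument names and default sample sizes are hypothetical.

```python
import math
import random


def sir(prior_sampler, log_likelihood, n_draws=5000, n_resample=1000, seed=0):
    """Sampling-Importance-Resampling with the prior as importance function.

    prior_sampler(rng) returns one parameter draw from the prior;
    log_likelihood(draw) returns the log-likelihood of the data given it.
    Returns n_resample draws approximating the posterior.
    """
    rng = random.Random(seed)
    draws = [prior_sampler(rng) for _ in range(n_draws)]
    log_w = [log_likelihood(d) for d in draws]
    top = max(log_w)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(lw - top) for lw in log_w]
    # Resample with replacement, with probability proportional to the weight.
    return rng.choices(draws, weights=weights, k=n_resample)
```

In the setting of this study, each draw would be a vector $(r, K, q_1, q_2, q_3)$ sampled from the priors, and derived quantities such as MSY = rK/4 can be computed from every resampled vector to obtain their posterior distributions.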
To evaluate the impacts of outliers on stock assessment and the robustness of different methods with respect to outliers, both the NM and RM were applied to data with and without outliers. I evaluated how outliers may affect the estimation of posterior distributions using the NM and RM by calculating a comparison index, CIndex (Equation 9), where m is the number of intervals of a posterior distribution, $P_{N,i}$ is the relative frequency of the i-th interval of the posterior distribution estimated using the NM for data without outliers, and $p_i$ is the relative frequency of the i-th interval of the posterior distribution estimated using the RM or NM for data with or without outliers (Chen et al., 2000). It is expected that the RM would yield a small CIndex value for data both with and without outliers because of its robustness to outliers, while the NM would yield a large CIndex for data with outliers.
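An interval-by-interval comparison of two posterior distributions can be sketched as below. The exact forms of Equations (8) and (9) are not reproduced in this text, so the index used here is an assumption: the sum of absolute differences in relative frequency over the m intervals, expressed in percent. Function names are hypothetical.

```python
def relative_frequencies(samples, lower, upper, m):
    """Relative frequency of posterior samples in m equal-width intervals."""
    width = (upper - lower) / m
    counts = [0] * m
    for s in samples:
        if lower <= s <= upper:
            idx = min(int((s - lower) / width), m - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the plotting range")
    return [c / total for c in counts]


def difference_index(p_ref, p_other):
    """Assumed index: 100 * sum of |p_ref[i] - p_other[i]| over intervals.

    0 means identical histograms; 200 means no overlap at all.
    """
    return 100.0 * sum(abs(a - b) for a, b in zip(p_ref, p_other))
```

Both histograms must be built with the same lower bound, upper bound and m for the comparison to be meaningful.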

RESULTS
When the NM was applied to the adjusted catch-CPUE data (set I; Table 1), the derived posterior distributions were right skewed for all parameters (Fig. 5), with mean values larger than the corresponding median values (Table 2). The means, medians, and 90% credibility intervals of the posterior distributions of all parameters changed little when the unadjusted CPUE and adjusted catch data (set II; Table 1) were used (Table 2). In this case, the posterior distributions were almost identical to those of data set I for all parameters (Fig. 5). Differences in the summary statistics from those for data set I increased when the unadjusted catch and adjusted CPUE data (set III) were used in the estimation (Table 2). The modes of posterior distributions estimated using data set III shifted from those estimated using data set I for all parameters except the three catchability coefficients (Fig. 5). Similar results were observed between data sets I and IV (Table 2 and Fig. 5). The indices of differences in posterior distributions (DI), calculated using Equation (8), were much smaller for data set II than for data sets III and IV, which had similar indices of differences (Table 3). This suggested that posterior distributions estimated using the NM were more sensitive to biases in catch than to biases in CPUE data. The impacts of using biased data on estimating posterior distributions differed among parameters; the three q's and K were least affected, while growth rate r, MSY, and status of exploitation were most affected in data sets III and IV (Table 3). All parameters except $q_1$ had posterior distributions rather different from their priors (Fig. 5), an indication that the data were informative.
FIG. 5. - Posterior distributions of key fisheries parameters estimated using the normal method (NM) for the four data sets defined in Table 1.
Similar results were observed in comparing posterior distributions of data set I with those of data sets II to IV when the RM was applied (Tables 2 and 3, Fig. 6), suggesting that the impacts of systematic biases in catch-CPUE data on the estimation of posterior distributions were similar between the NM and RM. When data had no biases, a large difference in posterior distributions between the NM and RM, as described by CIndex, was observed for the three catchability coefficients, while parameters K and r had the smallest CIndex (Table 4). The status of the rock lobster stock estimated according to the RM was more optimistic than that of the NM. However, the differences were small. For the NM, carrying capacity K increased slightly while intrinsic growth rate r decreased markedly after biases were introduced in catch and/or CPUE data. Thus, according to the NM, with the increase of data biases the stock was changed to a stock with a larger virgin biomass but a smaller growth rate. However, for the RM, both the carrying capacity K and the intrinsic growth rate r decreased with the increased biases in data (Table 2). Both the mean and the median of r estimated using the RM were higher than those estimated using the NM. For all the model parameters (i.e. K, r, $q_1$, $q_2$, and $q_3$) estimated using the RM, the DI values measuring differences in posterior distributions between non-biased data and biased data (Equation 8) were smaller than those for parameters estimated using the NM (Table 3). However, the differences between the two methods were rather small. The DIs for current stock biomass and MSY estimated using the RM, however, tended to have larger values than those estimated using the NM (Table 3). This indicates that the RM is less sensitive to biased errors in estimating model parameters, while the NM is less sensitive to biased errors in estimating management parameters.
FIG. 6. - Posterior distributions of key fisheries parameters estimated using the robust method (RM) for the four data sets defined in Table 1.
[Table 4 fragment: 68.0; $q_2$ 39.8; $q_3$ 25.0]
FIG. 7. - Posterior distributions of key fisheries parameters estimated using normal and robust methods for data with and without outliers.
For data without outliers, virgin biomass K estimated using the NM had a right-skewed posterior distribution (Fig. 7). The mode of the distribution was about 5500 tons. The posterior distribution for K estimated using the RM was similar to that estimated using the NM (Fig. 7), suggesting small differences in posterior distributions estimated using the NM and RM in the absence of outliers in data.
The spread of the posterior distribution of K estimated using the NM was similar for data with and without outliers. However, the locations of the two posterior distributions differed greatly (Fig. 7). The mode of the posterior distribution of K estimated using the NM for data with outliers was 7000 tons, 1500 tons higher than that estimated using the NM in the absence of outliers. Thus, the posterior distribution of K shifted substantially to the right after the inclusion of outliers with the use of the NM (Fig. 7). This suggests that a posterior distribution estimated with the NM is sensitive to outliers. The posterior distributions of K estimated using the RM for data with or without outliers were virtually identical and similar to that estimated using the NM in the absence of outliers (Fig. 7). This suggests that the posterior distribution of K estimated using the RM is not sensitive to outliers. Similar conclusions could be drawn by comparing the differences in posterior distributions estimated for the other four parameters considered in this study (i.e. intrinsic growth rate r, current stock biomass $B_{1997-98}$, MSY, and depletion) for the NM and RM (Fig. 7). Outliers in data tended to result in a more optimistic evaluation of the fishery when the NM was used, with a lower level of depletion and a higher level of maximum sustainable yield. Impacts of outliers on the RM-estimated posterior distributions were minimal (Fig. 7).

DISCUSSION
The errors in stock assessment induced by large biased errors in catch and CPUE data were surprisingly small. This may result from the influence of the priors and from the existence of a large number of unbiased catch and CPUE observations collected in the early years of the fishery. This indicates the importance of the quantity of fisheries data. With a large set of data, even if some data are subject to biased errors, the results will probably be less affected. This study also suggests the importance of good prior knowledge of key fisheries parameters in Bayesian stock assessment. It confirms the previous finding that an informative and well-defined prior distribution for key parameters can reduce the impacts of errors in data (Chen and Fournier, 1999). This shows an advantage of the Bayesian approach if good information on key parameters can be obtained from other independent ecological or biological studies. Such information can reduce the negative impacts on stock assessment of various errors commonly associated with fisheries data.
The posterior distributions for $q_1$ estimated using the NM were similar to the uniform distribution assumed for its prior (Fig. 5), and the probability was not zero close to the upper boundary (i.e. 0.3). This may suggest a need to increase the upper boundary value, and that the data are not informative in deriving the posterior distribution for this parameter. A sensitivity analysis was run with the upper boundary value of $q_1$ set to 0.6, and no great changes were observed. Because q is defined as the proportion of stock biomass that can be removed by one unit of fishing effort, 0.3 was thought to be a reasonable upper boundary value, which reflects our belief that one fishing vessel (the unit of fishing effort) will not be able to remove 30% of the stock biomass.
Errors associated with different data were shown to have different impacts on stock assessment modelling in this study. Stock assessment modelling was found to be less affected by non-random errors in CPUE than by those in catch, although the magnitudes and nature of the errors in the two types of data were both determined by the $RA_t$ in Equation (5). It would be interesting to evaluate differences in the impacts of errors in CPUE and catch data on stock assessment modelling with other fisheries that have different temporal patterns of CPUE and catch and different types of errors to those of this fishery. Such a study may provide us with knowledge on the relative importance of the quality of catch and CPUE data. Given that CPUE data are likely to have larger errors than catch data (Hilborn and Walters, 1992), such studies may help us identify priorities in allocating sampling efforts.
The magnitudes of errors resulting from non-random errors in data differed in estimating different parameters with different estimation methods. This study suggests that the model parameters (K, r, and q's) estimated using the RM tended to have smaller errors than those estimated using the NM, while MSY and current stock biomass calculated from combinations of model parameters estimated using the NM had smaller errors than those estimated using the RM. Such a comparison study may help us identify an optimal method for estimating parameters of interest. For the NM, a bias in the estimate of K caused by non-random errors in data is accompanied by an opposite bias in r because of the negative correlation between these two parameters estimated using the NM (Hilborn and Walters, 1992); in this study K increased while r decreased (Table 2). Because management parameters are often calculated from two or more of these model parameters (e.g. MSY = rK/4), the negative bias in r and the positive bias in K (or positive bias in r and negative bias in K) result in a small bias in estimating management parameters. For the RM, because the estimates of model parameters (i.e. K and r) are not necessarily correlated (Chen and Montgomery, 1999), a positive bias in K resulting from biased data is less likely to lead to a negative bias in r, or vice versa, than for parameters estimated using the NM. This may explain why, when the RM was used for parameter estimation, the biased data tended to result in small errors in the model parameters but large errors in the management parameters. This study suggests that the impacts of the quality of fisheries data on parameter estimation differ among different parameters for a given estimation method and differ between different estimation methods for a given parameter. Thus, it is important to identify an appropriate estimation method in estimating parameters of interest. A comparative study among different estimation methods with some realistic error structures for data is likely to be of help in identifying a suitable method for estimating certain parameters of interest.
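The compensation argument can be checked with simple arithmetic on MSY = rK/4. The sketch below uses r = 0.07 and K = 5500 only as convenient reference values taken from elsewhere in the text; the 10% biases are hypothetical and chosen purely for illustration.

```python
def msy(r: float, k: float) -> float:
    """Maximum sustainable yield of the Schaefer model."""
    return r * k / 4.0


def relative_change(r: float, k: float, bias_r: float, bias_k: float) -> float:
    """Relative change in MSY when r and K carry relative biases."""
    return msy(r * (1.0 + bias_r), k * (1.0 + bias_k)) / msy(r, k) - 1.0


# Opposite biases largely cancel: (0.9 * 1.1) - 1 is about -1%.
opposite = relative_change(0.07, 5500.0, -0.10, 0.10)

# Biases in the same direction compound: (0.9 * 0.9) - 1 is about -19%.
same_direction = relative_change(0.07, 5500.0, -0.10, -0.10)
```

The relative change depends only on the product of the two bias factors, not on the reference values of r and K, which is why negatively correlated biases leave MSY nearly unchanged.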
Large biases may arise in estimating vital fisheries stock/management parameters if the NM is used in formulating the likelihood function for data with outliers. The use of the RM can substantially reduce the bias caused by the existence of outliers in data. Different kinds of outliers exist in data, some arising from measurement errors and some arising from abnormal environmental variations or other process errors. For the first type of outlier, using the robust method can effectively reduce the impacts on stock assessment. For the second type of outlier, however, many people believe that they reflect true variations in fisheries and should be included in the estimation of uncertainty associated with fisheries stock/management parameters. In this case, if a robust method is used, such variation is not included in the estimation of uncertainty because the robust method effectively reduces the impact of the outliers on parameter estimation. It is thus important to identify the nature of an outlier, but before this can be done, we need to identify whether there are outliers in the data.
Because a robust method and a normal-distribution-based method tend to yield similar results in the absence of outliers, we may be able to identify whether there are outliers by comparing the parameter estimates derived using a robust method with those derived using a normal-distribution-based method. If there is a small difference, we may conclude that there are no outliers. If large differences are observed, we may conclude that outliers exist. In this case, the next step should be to examine all observations carefully to see which observations may be outliers; with the help of background information on how the data were collected and on environmental variations (e.g. some rare events), we may then identify the nature of the outliers. This approach to identifying possible outliers is qualitative. To be more precise in identifying outliers, a quantitative approach that defines quantitative criteria for outlier identification needs to be developed (Rousseeuw and Leroy, 1987).
This study demonstrates the importance of evaluating the quality of input data and of identifying and applying an approach that is insensitive to the quality of fisheries data in stock assessment. Considering the likelihood that fisheries data are affected by errors of different natures, I suggest that the robustness of a stock assessment be evaluated with respect to data quality.

TABLE 1. - Catch and catch-per-unit-effort (CPUE) data sets used in the study.
FIG. 4. - Simulated catch-per-unit-effort (CPUE) data without (top panel) and with (bottom panel) outliers. The normal random errors were the same for both sets of data.

TABLE 2. - Summary statistics of posterior distributions for key parameters. Bcur is current stock biomass; Ratio is calculated as Bcur/K, indicating stock depletion level; and MSY is maximum sustainable yield.

TABLE 4. - Comparison indices (CIndex as defined in Equation 9) in posterior distributions of parameters estimated for data set I using the normal and robust methods.