Incorporating time-series structure in medium-term recruitment projections *

Medium-term projections of age-structured population dynamics are the basis for many strategic management decisions about fisheries within the purview of ICES (International Council for the Exploration of the Sea). Such projections generally take the form of stochastic simulations of stock performance over a future 10-year time period under a range of levels of imposed fishing mortality F, and are used to determine levels of F that are likely to minimise the risk of stock collapse. The exploitation pressure on most commercially-important species managed via ICES’ advice is such that few fish survive for longer than five years, and projections are therefore largely driven by the assumed recruitment model and the initial estimated size of the stock. The prevailing methodology is to fit a parametric curve to the stock-recruitment scatterplot arising from historical assessment, take random samples from the time-series of residuals to this curve, and use these samples to determine recruitment values for projections. The purpose of this paper is to highlight the principal statistical deficiency in this approach, correct it, and compare the new results with the old for the North Sea cod, haddock and whiting populations.

the context of ICES stock assessments.These use a parsimonious algorithm which is intended to explore the likely response of an assessed population to fixed rates of fishing and natural mortality, given initial population-at-age estimates and an assumed parametric stock-recruitment model.Each WGMTERM analysis consists of a large number of simulation runs (generally 1000), each projecting the development of the stock over a number of consecutive years (typically, 10).The vector of initial population sizes at the start of each run can be drawn from the distribution of possible sizes determined by the standard deviations of historical population estimates, although this option was not used in the following analyses as the aim was to compare the effects of different recruitment models without the complication of variability in starting estimates.The underlying model governing stock size is a standard Baranov (1918) age-structured model, in which exponential population decay is determined by the fixed rates of fishing mortality F i,Y and natural mortality M i,Y .Thus the population-at-age i in year Y is given by where Z i,Y = F i,Y + M i,Y measures the total mortality due to fishing and natural causes.A deterministic value of recruitment for each year in each simulation run is given by the fitted stock-recruitment model.To this is added a residual drawn at random (or bootstrapped) from the time-series of residuals to the fitted stock-recruitment curve.The principal output metrics are percentiles of the projected spawningstock biomass (SSB, denoted here by S Y ), where Here, for a specific year Y, W i,Y are weights-atage, Mat i,Y are proportions mature-at-age, and A is the maximum age allowed for in the population.The main statistical problem with this standard approach is that there is likely to be some degree of autoregressive or moving-average structure in the timeseries of recruitment residuals, and simple random sampling of residuals does not account for this.This paper describes new modifications to medium-term methodology to accommodate this issue.

Time-series data
Data on parental spawning-stock biomass S and recruitment R for North Sea cod, haddock and whit-ing were taken from the relevant stock summaries in ICES (2000).Stock-recruitment pairs post-dating the 1996 year-class were excluded from the analyses: the VPA-derived recruitment estimate for 1997 is likely to have been poorly defined due to lack of convergence in the VPA method, whilst those for 1998 and 1999 were RCT3 estimates (Shepherd, 1997) which are thought to be uncertain.Two analyses were performed for haddock, firstly using the full data series, and secondly using data from 1968 onwards, thus removing the unusually large 1967 year-class estimate which is thought to exert a degree of leverage on any fitted stock-recruitment model that is not commensurate with its relevance for current fisheries management.

Stock-recruitment modelling
Scatterplots of R and S are shown in Figure 1.Parameters for stock-recruitment models, namely those due to Ricker (1954), Beverton and Holt (1957) and Shepherd (1982), were estimated using non-linear least-squares regression assuming a lognormal error distribution.In addition, parameters for the Saila-Lorda or gamma model R = αS Y e -βS were fitted to ascertain the presence of statisticallysignificant depensation: that is, decreasing recruitsper-spawner with decreasing stock size (Saila and Lorda, 1980;Iles, 1994;Quinn II and Deriso, 1999).Two-tailed t-tests showed that in no case was the fitted depensation parameter significantly different to 1.0, and the Saila-Lorda model was not used for this reason (cod: p = 0.350; full series haddock: p = 0.428; reduced series haddock: p = 0.171; whiting: p = 0.148).
The use of an assumed lognormal error distribution leads to a lognormal bias of exp(-σ 2 /2) in the expected value of the fitted curve, where and Nelder, 1983).Hence in analyses where it is the expectation or mean of the curve that is of interest, the estimates of R obtained from the curve must be inflated by a bias-correction term of exp(σ 2 /2).However, stochastic projection of recruitment produces percentiles of the distribution of projected values: this does not involve the estimation of an expected value and therefore does not call for bias-correction.
Parameter estimates and a goodness-of-fit statistic, R 2 adjusted for the number of model parameters, were noted for each model fit (Table 1).While such diagnostics are useful in selecting between similar models (for example, assessing whether the addi- 1 1 e tional parameter in the Shepherd model is justified), equal attention must be paid to the biological plausibility of the chosen model.Cod and whiting are thought to be cannibalistic on their young, leading to stock-dependent density effects which can be encapsulated by the Ricker and Shepherd models but not by the Beverton-Holt model.Ongoing biological process studies suggest that non-stock-dependent density effects at the settlement stage may be crucial in determining year-class strength for haddock (M.Heath, FRS Marine Laboratory, pers.comm.), implying that the Beverton-Holt model (which also has the better diagnostics) should be chosen for both haddock analyses.A further consideration is that the projection performance of the chosen model should be plausible.The projections considered here all start near the lower limit of the historically-observed range of S and it would be expected that the stockrecruitment trajectory would tend towards the origin at current high levels of fishing mortality.For many stocks, a formulation such as the Beverton-Holt model will have a steep slope at the origin and consequently high R until close to the origin, and may thus not simulate the expected decline in S as well as over-compensatory curves such as the Shepherd model, which will often be nearly linear in this -Recruitment models fitted to stock-recruitment data for North Sea cod (Ricker, 1963(Ricker, -1996)), haddock (Beverton-Holt, 1963-1996and 1968-1996), and whiting (Ricker, 1960(Ricker, -1996)).The solid lines give the least-squares fits on a lognormal scale.Recruitment is at age 1 for cod and whiting, age 0 for haddock.
TABLE 1. -Goodness-of-fit diagnostics for parametric model fits to stock-recruitment data for North Sea cod (1963-1996), haddock (1963-1996 and 1968-1996), and whiting (1960-1996).The measure listed is R 2 adjusted for the number of parameters in the model.A high positive value indicates a good fit; a high negative value indicates a poor fit.The selected model for each stock is highlighted in bold text.region.In the event all the models fitted here for cod and whiting (except the depensatory Saila-Lorda model) looked very similar at low S, while the Ricker model appeared to be more linear in this region for haddock.The whole question of the systematic selection of recruitment models on the basis of desirable properties they may have for population projections and subsequent management decisions is one that must be pursued further.

Species
The model-selection considerations for these analyses can be summarised as follows.For cod and whiting, all models (except Saila-Lorda) had a similar shape near the origin.The Beverton-Holt model was ruled out because it does not account for likely cannibalism, and of the remaining models, the Ricker had the better adjusted R 2 value and was used for that reason.For both full-period and reduced-period haddock, the Shepherd model had a very steep slope at the origin and was therefore unlikely to have plausible projection characteristics.The Ricker model was slightly more linear at low S than the Beverton-Holt model, but the latter was chosen on the basis of underlying biology and better adjusted R 2 .
For analyses using the standard ICES Working Group bootstrapped-residual approach, the Ricker and Beverton-Holt formulations used are , and where a is age at recruitment and r i = R i / Ri is a randomly-selected residual ratio from the relevant model fit to historical stock-recruit data.When this method is modified by the addition of ARMA timeseries modelling the equations used are , and where x i is a random variable giving rise to stochastic fluctuations in projections.

Time-series model fitting
Time-series models for this study were fitted to x i = ln(R / R), the logarithm of the ratio of observed to fitted recruitments, based on the Ricker or Beverton-Holt stock-recruitment models.The reasons for this were two-fold: to reduce the significance of outliers, and to incorporate the possibility of lower R at low S in projections.Model fitting was carried out using the S-PLUS statistical package (MathSoft Inc., 1999), following and expanding upon methodology outlined in Box and Jenkins (1976) and Venables and Ripley (1999).
ARMA time-series modelling is an attempt to describe the behaviour of a data series in terms of a combination of autoregressive (AR) and movingaverage (MA) effects.In order to conform to requirements for stationarity, it may also be necessary in general to difference the series.For the analyses described here, however, this was not the case: because the series in question are residuals from a fitted parametric model, they tend to fluctuate around a stationary mean of zero without any requirement for further intervention.If a mean value is not being fitted, an ARMA(p,q) model fitted to a series x i is given by where p and q are the order of the AR and MA components of the model respectively, Φ i and Θ i are AR and MA parameters to be estimated, and -Values of AIC adjusted for small-sample size (AICe: Hurvich and Tsai, 1989) resulting from ARMA(p,q) model fits to time-series of ln(R / R), the logged ratio of observed to fitted recruitment for a Ricker (cod and whiting) or Beverton-Holt (haddock) curve.The selecte d models are highlighted in bold text.nc = ARMA fit procedure did not converge.
The customary initial exploratory analysis for ARMA modelling consists of plots of the autocorrelation function (ACF) and partial autocorrelation function (PACF) for the original series x i .However, given the relatively restricted length of the stockrecruitment series examined here, it was not possible to ascertain much evidence of autoregressive or moving-average time-series dynamics from these plots, and there was little noticeable pattern in the ACF and PACF.Because of this, and to ensure that well-fitting models did not get overlooked, we systematically estimated ARMA model fits for all combinations of AR and MA orders from zero up to three.The maximal order of the AR and MA terms was chosen to ensure that only parsimonious models are considered.
Resulting values of Akaike's Information Criterion (AIC: Akaike, 1973) as adjusted for smallsample size (AICe: Hurvich and Tsai, 1989) are shown in Table 2, while ranked order plots for AIC and AICe are given in Figure 2.For cod and whiting, there is a clear distinction between the two best-fitting models (first-order autoregressive and first-order moving-average) and the rest: for these stocks the former was used, principally because of the duality between autoregressive and movingaverage processes and a priori empirical considerations (c.f.Chatfield, 1989;Powell and Steele, 1995) that make a Markov model more appealing.The second-order moving-average model was a considerably closer fit to the data than any others for the full-period haddock stock, and was used for this reason.There was little to choose between first-order moving-average and first-order autoregressive models for reduced-period haddock: in this case, the former was used to ensure consistency with the full-period haddock analysis.Clearly these choices are arbitrary to a certain extent and subsequent results are therefore largely illustrative.A theoretical comparison of the various time-series models used here warrants attention but will not be considered further in this paper.

TIME-SERIES MODELS IN RECRUITMENT PROJECTIONS 205
FIG. 2. -AIC (open points) and AICe (closed points) diagnostics for ARMA time-series model fits to log residual ratios ln(R / R) for North Sea cod (1963-1996, a), haddock (1963-1996, b and 1968-1996, c), and whiting (1960-1996, d).The values of p and q for each model are given next to the relevant diagnostic points.

Time-series projection methodology
For each simulation run, a vector A = [a i ] of innovations was created by random draws from a normal distribution with mean 0 and variance given by the variance of the ARMA model fit, so that a i ∼ N(0,σ 2 ARMA ).North Sea cod and whiting recruit at age 1, so the first value of the projected time-series vector X = [x i ] was given by the logged residual ratio for the final historical assessment year.For North Sea haddock, which recruit at age 0, the starting point for X was given by a value of recruitment generated stochastically using the estimated S for the first year and the fitted stock-recruitment model.Subsequent values of X were generated from this point using the innovations vector.For an autoregressive ARMA(1,0) model with parameter φ 1 the ith projection value is while the equivalent for a moving-average ARMA(0,1) model with parameter θ 1 is The projected Ricker recruitment for cod and whiting (recruiting at age 1) is then , while the Beverton-Holt model for haddock (recruitng at age 0) is Once recruitment is calculated, population dynamics are processed as for the standard bootstrapped-residual approach.(1963-1996), haddock (1963-1996 and 1968-1996), and whiting (1960-1996).1000 simulation runs were performed for 10 years each using two different projection methods (WGMTERM and ARMA).Plotted percentiles are 5% (dotted), 25% (dashed), 50% (solid), 75% (dashed), and 95% (dotted).Recruitment values estimated by the relevant ICES Working Group (ICES, 2000) are shown as points.

Projections
and W used in the projections were the means of the final 3 years (1994)(1995)(1996) of the historical assessment.Plots of percentiles of projected recruitment and SSB are given in Figures 3 and 4, along with point estimates produced by the 1999 Working Group (ICES 2000) based on final-year VPA results and RCT3 recruitment estimates.Projections for cod do not appear to be sensitive to the choice of projection model.The projected spread of recruitment for full-period haddock is somewhat narrower when using the ARMA approach, with a concomitant tightening of SSB profiles.This effect is repeated to a lesser extent for reduced-period haddock and whit-ing.Median SSB levels are similar in all cases.Point estimates for R and S lie within or very close to the 90% confidence intervals of all models, save for those for cod recruitment in 1998 and 1999 which however were the two smallest recruiting year-classes in that entire time-series.
Table 3 gives values of AICe, fitted parameter estimates, t-test values for parameter significance, and the overall variance of each model fit.It is notable that several parameter estimates are not significant at the 95% confidence level.However, this does not invalidate the use of the models, for the following reasons.Firstly, models were selected on the basis of TIME-SERIES MODELS IN RECRUITMENT PROJECTIONS 207 TABLE 3. -Diagnostics from ARMA time-series model fits.AICe = Akaike's Information Criterion adjusted for small-sample size (Hurvich and Tsai, 1989); φ i = ith autoregression parameter; θ i = ith moving-average parameter; σ 2 ARMA = estimated variance of ARMA model fit, t = ratio of parameter estimate to standard deviation of parameter estimate.* = t-test indicates significant parameter estimate at the 95% confidence level.(1963-1996), haddock (1963-1996 and 1968-1996), and whiting (1960-1996).1000 simulation runs were performed for 10 years each using two different projection methods (WGMTERM and ARMA).Plotted percentiles are 5% (dotted), 25% (dashed), 50% (solid), 75% (dashed), and 95% (dotted).SSB values estimated by the relevant ICES Working Group (ICES, 2000) are shown as points.

Species
AICe, not parameter significance.Secondly, the reason that AICe was used is that asymptotic test statistics (such as t) are invalid for small sample sizes (short time-series, in this case), so the fact that t-tests suggest statistical insignificance is neither here nor there.And thirdly, in these examples the parameters which are not significant are also very small, so that innovations dominate in projections which essentially become random walks.The results of this are similar to the bootstrap approach, so the use of the ARMA model when an estimate parameter is not significant is not detrimental as long as that estimate is small.The main conclusion from the projection analyses is that the ARMA model gives a similar median SSB than the WGMTERM model, and with slightly reduced uncertainty.A caveat is provided by Table 4, which shows the F-multipliers on status quo fishing mortality which might be expected to give a 10% probability of SSB in 2006 (cod and whiting) or 2007 (haddock) being less than the agreed B pa .These show that the two competing methods do lead to differences in management parameters, although not in all cases and not in a predictable way.
However, the reasoning behind the use of the ARMA model is not that it necessarily makes a great difference to perceptions of stock dynamics, but rather that it is more statistically appropriate.To this end, a diagnostic tool is presented in Figure 5.Here the median recruitment projection for each model has been plotted alongside a deterministic projection: that is, one in which recruitment is based purely on the fitted stock-recruitment curve, with no stochasticity.Deviation between the median and deterministic projections for a given model would suggest that there is structure in the historical data that has not been adequately encapsulated by that model.It is clear, on this basis, that the ARMA model has less systematic bias and is thus preferable for all four datasets.1963-1996), haddock (1963-1996 and 1968-1996), and whiting (1960-1996), under two different projection models (WGMTERM and ARMA).The median recruitment is the 50 th percentile of 1000 stochastic runs, while the deterministic recruitment is obtained by removing any variability in the projection of recruitment.

DISCUSSION
While we await the results of several on-going large-scale biological process studies, the need for simple empirical methods of population projection is clear.If these are to be useful we would be welladvised to make them as statistically appropriate as possible, giving due regard to the use to which the methods are to be put.The ARMA models described in this paper are an attempt to improve the statistical appropriateness of the current empirical approach, by explicitly modelling the time-series structure in the recruitment residuals rather than allowing it to randomly influence results.The outputs of SSB distributions from the ARMA model are at a similar level to those from the bootstrapped-residual model, but with slightly less uncertainty.We propose that ARMA time-series models (or variants thereof) might be contemplated for medium-term projections by assessment working groups in those cases where sufficiently strong evidence for historical time-series structure is available and data series are long enough to warrant the identification and fitting of a timeseries model.The justification for this, given that the outputs are similar, is based both on a priori reasoning (if there is time-series structure in recruitment residuals then it should be modelled explicitly), and a posteriori diagnostics (median ARMA projected recruitment is a better approximation to deterministic recruitment, implying less intrinsic bias).
There are a number of additional analyses to be carried out in the future to extend the relevance of these findings for fisheries management.The statistical properties of the parametric estimation procedure are worthy of further investigation since the estimating equations are reminiscent of those first proposed by Aitken (1935) and might be improved.The starting date for projections could be pushed progressively further back in time, allowing more extensive evaluation of the degree to which projections from different models agree with estimated SSB values.Bootstrapped sensitivity tests could be used to build up a distribution of median projected SSB levels, thus yielding a measure of the robustness of this metric of projection performance.Where ARMA modelling is inappropriate, investigations could be undertaken into alternative schemes, whereby blocks of residuals are resampled, which would retain some of the time-series structure in projections.Further methodological work should be devoted to the selection of models with due regard to their projection performance characteristics.Finally, time-series characterisation of maturity-and weight-at-age ogives in projections should be investigated, as these may have an effect on outcomes equalling or exceeding that of recruitment.

TABLE 4 .
-Approximate multipliers on status quo F giving a 10% probability ofSSB in 2006 (cod and whiting) or 2007 (haddock)being less than the stock-specific B pa .
FIG. 5. -Diagnostic plot of deterministic (dotted line) and median (solid line) projections for North Sea cod (