Direct Survival Analysis : a new stock assessment method

In this work, a new stock assessment method, Direct Survival Analysis, is proposed and described. The parameter estimation of the Weibull survival model proposed by Ferrandis (2007) is obtained using trawl survey data. This estimation is used to establish a baseline survival function, which is in turn used to estimate the specific survival functions in the different cohorts considered through an adaptation of the separable model of the fishing mortality rates introduced by Pope and Shepherd (1982). It is thus possible to test hypotheses on the evolution of survival during the period studied and to identify trends in recruitment. A link is established between the preceding analysis of trawl survey data and the commercial catch-at-age data that are generally obtained to evaluate the population using analytical models. The estimated baseline survival, with the proposed versions of the stock and catch equations and the adaptation of the Separable Model, may be applied to commercial catch-at-age data. This makes it possible to estimate the survival corresponding to the landing data, the initial size of the cohort and finally, an effective age of first capture, in order to complete the parameter model estimation and consequently the estimation of the whole survival and mortality, along with the reference parameters that are useful for management purposes. Alternatively, this estimation of an effective age of first capture may be obtained by adapting the demographic structure of trawl survey data to that of the commercial fleet through suitable selectivity models of the commercial gears. The complete model provides the evaluation of the stock at any age. The coherence (and hence the mutual “calibration”) between the two kinds of information may be analysed and compared with results obtained by other methods, such as virtual population analysis (VPA), in order to improve the diagnosis of the state of exploitation of the population. The model may be drawn up in a deterministic format, but the main concepts may be interpreted as expectations if stock and catch are considered as stochastic processes.


INTRODUCTION
Results produced by trawl surveys have often been criticised on the grounds of their lack of robustness based on their high variability.However, the experimental data relative to size and ages of some important target species in the Spanish Mediterranean have shown an excellent statistical behaviour in comparison with the proposed probabilistic models.This reinforces the capabilities of trawl surveys as fundamental sources of information for the management of marine resources.
The present work describes new approaches and models for interpreting the global evolution of the mortality and survival of an exploited marine resource.The modelling of survival allows a bridge to be built between the two classical and often conflicting methods: direct and indirect evaluation.
In this first extension of the classical models, the scenario is restricted to the assumption of a constant exploitation rate from the age of first capture that is coherent with mixed Weibull models for lifetime distribution.
In the Weibull model proposed in this work, four parameters are used to establish the mortality throughout the entire life of the exploited resource: the mortality coefficients, M and Z, of the natural and total mortality rates respectively, the exponent α (the shape of the lifetime distribution) and the effective age of first capture, t c .In this work, the authors do not impose artificial conditions on either the age (whose range is [0, ∞)) or the stock (which is finite and bounded, even for age 0), nor do they use an assumed terminal fishing mortality rate or the mortality model to produce a mortality vector.On the contrary, we demonstrate that one can work directly with the Continuous Survival Model.The name Direct Survival Analysis is therefore not artificial at all.
The authors emphasise the role of the Weibull distribution in fishery science without expecting it to be generally applicable.The only condition for its use should be the goodness of fit with respect to the empirical results.The Weibull distribution here applied, is a natural extension of the generally accepted exponential model underlying the classical Virtual Population Analysis (Gulland, 1965) and its further modifications (Separable VPA by Pope and Shepherd, 1982, and XSA by Shepherd, 1991, 1999).It produces a natural adaptation of the stock and catch equations that makes it possible to "rewrite" the theory and computational tools of pop-ulation dynamics in a relatively easy way, therefore, understandable to all the fisheries scientist who are familiar with widespread software such as FISAT (Gayanilo and Pauly, 1997), VIT (Lleonart and Salat 1997), VPA Suite (Darby and Flatman, 1994).It also provides an admissible model of mortality rate that makes it possible to consider the stochastic character of the survival, and hence, to incorporate the likelihood methodology (which is strong when one is dealing with large samples as is generally the case in fishery applications) and the very powerful tools of survival analysis developed in many biological and epidemiological studies.
Thus, the present work not only provides a new tool for evaluating exploited marine populations, but further provides new approaches and perspectives to marine population dynamics.
Some initial steps of this promising method will be presented in future applications recommended by the recently created Permanent Working Group on Stock Assessment Methodology (PWGAM) of the General Fisheries Commission of the Mediterranean (GFCM).

Trawl survey data and the proportional hazard survival model
The so-called "direct methods", and in particular trawl surveys, are evaluation procedures independent of the fishery activity and hence not affected by the questionable trustworthiness and lack of coherence of catch data provided by fleets and/or fishery administrations belonging to different countries (or even regions in the same country).On the contrary, they allow a temporal coordination and standardization of gears and sampling effort and methods between different research vessels.They can thus be used to establish historical series of coherent data that are suitable for identifying spatio-temporal trends and are very useful for both theoretical research and management purposes.
The assumed survival model has been proposed by Ferrandis (2006) as a battery of Weibull distributions corresponding to the unexploited phase subjected only to natural mortality (with a coefficient of mortality M) and the exploited phase subjected to total mortality (with a higher mortality coefficient Z=M+F) from an age of first capture, t c : (1.1.1) The exponent α is the main parameter related to the shape of the mortality, so it is natural to assume that this exponent is quite stable for a given species and a given region.Therefore, it could be considered as a population parameter.

The growth equation
In general, surveys of marine resources provide size-frequency distributions that may be converted into age distributions through the inverse growth equation if adequate estimations of its parameters are available.
(1.2.1)where l is the size at age t and t 0 , k, L ∞ are the Von Bertalanfy growth equation parameters.Otherwise, the conversion from size to age may be performed if a table of transition probabilities between size and age classes is available.

The identification of a representative age interval (baseline interval)
Most of the demersal resources investigated in trawl surveys are not fully recruited to the gear used and/or to the trawled area.Because of this, for most of the resources the youngest individuals are not present in the capture, and older individuals are often underrepresented in the sample.Therefore, it is necessary to identify an age interval [t 1 ,t 2 ] for which it can be assumed that the obtained sample is representative of the population and which should be contained in the exploited phase, that is to say, in the second interval of ages considered by the Weibull model (1.1.1).In the identification of this baseline interval, the analysis of the age and size distribution with the estimation of their first mode and the knowledge of the size range caught by the fishery fleet may be helpful.
Assuming that the equivalence of the demographic structure in the sampled baseline interval is representative of the population is equivalent to assuming that the mortality rate caused by the survey is proportional to the general mortality in that interval (Cox, 1959) and also that, in this interval, the catchability of the sample gear used in the survey should be independent of age.

Estimation of the conditional survival in the selected interval: the baseline survival
In the probabilistic framework presented by Ferrandis (2006), the estimation of the survival restricted to the baseline established interval may be considered as a conditional probability (denoted by " |") (1.4.1)This conditional survival, S(t|t 1 ≤ T ≤t 2 ), is estimated by the empirical survival function, that is, the quotient between the number of observed ages greater than t and less than t 2 and the total number of observed ages in the baseline interval.This can be expressed formally by: where n is the number of observed ages in the interval [t 1 ,t 2 ] and I [t,t2] is the indicator function of the interval [t,t 2 ], defined, for any set A, by In general, this estimation will be obtained from the historical series of trawl surveys data restricted to the baseline interval and on a cohort basis: i.e. for the estimation of this survival function a cohort, or a set of cohorts, will be considered.Also, these estimations will be obtained on a catch per unit effort basis because the survey effort is not constant during the study period.This is easily done by weighting the frequency of the ages by the particular and measured trawled area (multiplying the frequency by the mean swept area and dividing by the area swept on each trawl).
However, if in this interval the sample is representative of the population, the conditional survival is related to the total survival in the following way: (1.4.3)Dividing by the survival at the initial age of the interval, S(t 1 ) (1.4.4) and, according to the assumed model, the quotients only depend on the two parameters Z and α.

S t t T t T t t T t S t S t S
(1.4.5) Then, the estimation of the parameters of the model, Z and α, may be obtained from the estimation of the conditional survival.This can be achieved in two ways: by non-linear regression, taking the unit values as the seed, and alternatively and complementarily by maximum likelihood.
In fact, the density function corresponding to the conditional survival is: The log-likelihood is thus: (1.4.7) whose derivatives (1.4.8) provide a system of two non-linear equations with the two unknowns Z and α.
However, as is shown in the annex, an explicit equation relates both parameters: (1.4.9)where (1.4.10) the sum being extended on all the observed ages in the baseline interval.
Hence, the system of equations (1.4.8) is reduced to one non-linear equation with the single unknown α, which can be solved computationally by the "regula falsi" method.(Hildebrand, 1974;Curtis, 1994).The authors have developed suitable software to obtain both the non-linear regression (least squares estimation) and the maximum likelihood estimation with a prefixed precision.
These parameter estimation methods provide an estimate of the survival in the exploited phase, which, in accordance with usual concepts in survival analysis, will be called the estimated "baseline survival" (Smith, 2002).
Likewise, the estimation of the baseline survival makes it possible to specify the natural mortality, as described in Ferrandis (2006).For an assumed constant natural mortality rate M 0 and an estimated exponent α , the corresponding coefficient of natural mortality is (1.4.11)

Estimation of the survival in the different cohorts
In order to specify the survival for each of the cohorts considered from the general baseline survival, an adaptation of the Separable Model (Pope and Shepherd, 1982) is proposed.The separability hypothesis expresses the fishing mortality rates as a product of age selection effects correlated with the exploitation pattern and temporal effects correlated with the intensity of exploitation.In this adaptation, the temporal effects will be set as a cohort effect which will become a year effect, when a pseudocohort is considered. Let (1.5.1) be the mortality coefficient corresponding to the baseline survival, which provides the mortality rate (1.5.2) and that corresponding to the i-th cohort (1.5.4) In this context, M(t) = Mαt α-1 is the natural mortality rate, F(t) = Fαt α-1 the age selection effect of the fishing mortality rate and k i the cohort effect.
With the following change of parameter (1.5.5) the conditional survival for a specific cohort and the baseline survival are related as: (1.5.6) and the mortality rates as (1.5.7) Hence, the separable model may be characterised by the relation between the specific mortality rates and the baseline mortality that corresponds to the proportional hazard model (Smith, 2002) considered in the Survival Analysis framework.The specific survival of a particular cohort is a power of the baseline survival.
As in the case of the baseline survival, the estimation of the proportional parameter k i * may be obtained from the estimated conditional baseline survivals in two ways: by non-linear regression, taking the unit value as the seed, and alternatively and complementarily by maximum likelihood.The likelihood solution is (1.5.8)where the sum is extended to the n ages of the cohort considered in the selected baseline interval.
Again, this non linear equation with the single unknown, k i * , can be solved computationally by the "regula falsi" method.(Hildebrand, 1974, Curtis, 1994).As with other required computing facilities, the authors have developed adequate and specific software to obtain the solution of (1.5.8).

Hypotheses on the evolution of survival
An increasing/decreasing trend on {k i * } corresponds to a decreasing/increasing trend on survival {S i }.The hypothesis of a stable survival during the period may be tested by likelihood ratio test: (1.6.1)L(S) being the likelihood corresponding to the baseline survival and the whole data set of ages in the baseline interval, and L(S i ) the likelihood corresponding to the i-th cohort.

Identification of recruitment trends
Splitting the baseline interval into a partition of subintervals {[t i ,t i+1 ),i = 1...k} the catch equation will provide the expected number of individuals obtained in each subinterval of the cohort: where the expected number of individuals at the initial age of the interval, E[N(t i )], is related to the recruitment as: where t r is the age at recruitment.
The particular coefficient of mortality of the cohort and the survival of the cohort, by the proportional hazard model, are and the fishing mortality rate produced by the survey is where q h is the catchability coefficient and A h is the area swept on trawl h.The catchability coefficient will be proportional to the corresponding accessibility.
If it can be assumed that the accessibility is stable in a geographical or bathymetric range, there ) will be a constant catchability, q h = q, in this range and in the considered baseline age interval.Then, in general, the catch equation becomes (1.7.1)Now, taking the set of observed catches as the dependent variable and as the independent variable, the catch equation provides a linear regression .This is a regression without a constant term whose regression coefficient, , will be proportional to the recruitment of the cohort, that is to say, it will constitute a recruitment index.Therefore, the trends in these regression coefficients reflect trends in the recruitment.
In order to evaluate expression (1.7.2), the independent variable of the regression, an estimation of the age of first capture, t c , is needed.This may be obtained from the established management routines of the fishery (such as the age corresponding to the minimum permitted catch size) or from the estimation of an effective age of first capture by adapting the approach introduced by J. Shepherd (1983).This estimation will be developed in detail in the next paragraphs, which links the model with commercial landing data.
The significance of possible recruitment trends can be tested by modelling the recruitment trends.For instance, a linear trend during the period may be formalised as R = a + bY c , R being the recruitment and c the year of the cohort.Substituting this trend in the expected catches (1.7.1), we have a multiple linear regression of the catches as the dependent variable over the two independent variables, X, and the product, XY c as the second predictor.The expect-ed catches (1.7.1) are then rewritten as Therefore, the significance of the coefficient of the predictor XY c implies the significance of the parameter b of the considered linear trend in recruitment.

Age structured commercial catch data
Commercial fleet catches provide biomass and density data for different species.When this information is complemented with biological samplings, size classes can be transformed into age classes which form the basis for the application of agestructured stock assessment models.A data series of successive years can be the source for a cohort analysis.If the number of years is low, an alternative is to analyse pseudo-cohorts (age structure of oneyear catches) as if they represented the evolution of a cohort assuming equilibrium conditions.
In general, the catch-at-age data will be considered as a set of age intervals {[t i-1 ,t i ); i = 1,...,k} and a corresponding set of catch in number {C i ; i = 1,...,k} for each age interval.In the most complete case, the initial and final ages will be: t 0 = 0 and t k = ∞.

Parameter estimation
We use the baseline survival already estimated from trawl surveys, with parameters Z 0 and α, in order to estimate the survival functions that correspond to the cohorts (or pseudo-cohorts) given by commercial catch-at-age data, considering the same species in the same or neighbouring regions in overlapping periods.Their parameters in the exploited phase should be k*Z 0 and α, where k* is a proportionality constant.
In order to estimate the proportionality constant, k*, we identify the first age, t p , in the set {t 0 ,t 1 ,...,t k } contained in the baseline interval in which the estimation of the survival and its parameters has been made.Let N p be the population size at this age cor- responding to the cohort (or pseudo-cohort) that provides the catch-at-age data.That is: N(t p ) = N p .

Estimation of N p and k*
By applying the stock and catch equations proposed by Ferrandis (2006) to the successive age, t i , such that t p ≤ t i ≤ t k-1 , we have the following conditional expectations: Combining (2.3.1),(2.3.2) and (2.3.3)we obtain (2.3.4)where the unknowns are N p and k*.
In general, catch-at-age data, {C(t i ,t i+1 ); t p ≤ t i ≤ t k-1 }, allow the estimation of the parameters N p and k* from this set of catches and the expressions (2.3.4) by non linear regression, i.e. by least squares criteria.The estimates may be the initial values or complementary for the application of maximum likelihood as a method of estimation using the available stochastic models for the stock and the catch processes (Ferrandis, 2006).And the estimation of k* yields the estimation of the total mortality coefficient Z = k*Z 0.

Estimation of an effective age of first capture
One way to estimate the effective age of first capture may be, following the work of Shepherd (1983), to consider the total catch of the cohort (pseudo-cohort), C(0,∞), from an effective age of first capture, t c .The catch equation ( 2 and combining (2.4.6) and (2.4.8), we have the expression .
(2.4.9) as a ratio between the catches in the two intervals.This yields a new estimation of the age of first capture: . (2.4.10) The above expressions may be computed from the information of the total or representative catches of , , the fishery fleet or, alternatively, using selectivity studies of the commercial gears.The selectivity curve assigns a probability of catch depending on the size or the age of the individuals.Thus, weighting the individuals caught in the trawl survey by this probability we "translate" the demographic structure obtained in the survey to the demographic structure of the commercial landings.We use these weighted frequencies to apply expressions (2.4.5), (2.4.7) and (2.4.10).

Estimation of the initial population size
Once the age of first capture has been estimated, the expected catch until the first age considered in the analysis, t p , is given by (2.4.6).This will fit the observed catch exactly if equation (2.4.7) is used to estimate the effective age of first capture.
By the stock equation (2.3.1), the initial size of the considered cohort (pseudo-cohort) will be obtained from the previous (2.5.1) and therefore (2.5.2)

Goodness of fit
The estimation of the parameters from a non-linear regression on the model (2.3.2) makes it possible to calculate confidence intervals for these estimates and the correlation between them, as well as the coefficient of determination of the model, which will give a measure of its goodness of fit.The graphs of the observed versus the expected catches (given by (2.3.2)) will provide a visual description of the fit.

Stock evaluation
The survival model established (1.1.1)and its relation to the size of the population provide the estimation of the expected population in number of the considered cohort at any age.This evaluation may be compared with those obtained by other methods, such as virtual population analysis (VPA).

CONCLUSIONS
The identification of the baseline age interval (an interval in which the catches obtained in the trawl survey may be assumed to be representative of the studied cohorts of the population) makes it possible to estimate the parameters of the proposed Weibull model by maximum likelihood, and therefore to estimate the baseline survival during the study period.
The separable model of fishing mortality rates (Pope and Shepherd, 1984) has been adapted.Thus, the survival/mortality of each specific cohort may be expressed as a power/multiple of the baseline survival/mortality.Using this approach, the evolution of the survival and mortality of the successive cohorts during the study period becomes a particular case of the proportional hazard models considered in the general survival analysis framework.The exponent/factor of the power/multiple relationship may be estimated by non-linear regression or maximum likelihood.
The model makes it possible to test hypotheses about survival and mortality, and in particular the stability of survival, during the study period by the likelihood ratio method.
The extension of the catch equation established provides the expression of the catches in successive age subintervals as a linear regression on a covariate which depends on the survival.The regression coefficients are proportional to the recruitment of the cohort considered.The possible trends in these regression coefficients therefore reflect trends in the recruitment, whose statistical significance may be quantified.
The basic estimation of the baseline survival obtained from trawl survey data may be completed by analysing commercial catch-at-age data.This can help to check the coherence of the two kinds of information as well as the relationship between their respective survival and reference parameters.
The actual and potential uses of the proposed methodology include the following: -Estimation of a general survival ("baseline survival") from a set of trawl surveys.
-Use of proportional hazard models related to the separable model of fishing mortality rate to estimate the specific survival for the various cohorts in the study period.
-Identification of trends in the age structure through the evolution of life expectancy.
-Use of the survival models to identify and test possible recruitment trends.-Using the survival models to evaluate the stock without applying an arbitrary terminal fishing mortality rate.
-Establishing diagnoses relative to the management of the resource throughout the age-structure parameters (life expectancy) and the estimation of reference parameters derived from the mortality rates, the biomass per recruit and the yield per recruit.
The flow-chart of the operational tasks structuring the application of the proposed methodology is shown in Figure 1.These tasks are executed through a set of computer programs that constitute the implementation of the numerical methods described in the present work.The method is being applied to a set of important target demersal species, mainly in the Mediterranean Sea.
., 71(1),March 2007, 175-185.ISSN: 0214-8358   -Verifying the coherence between the estimated baseline survival and other trawl surveys or agestructured data provided by indirect evaluations (VPA).This allows the comparison of the age structure corresponding to different regions and periods. 1 Alternatively, we can focus our attention on the early age catches, C(t c ,t p ), obtained in the interval [t c ,t p ]. Again, applying the stock equations (2.3.1) and (2.4.2) and the catch equation (2.3.2),we obtain