Automated identification and characterisation of microbial populations using flow cytometry: the AIMS project*

SCI. MAR., 64 (2): 225-234 SCIENTIA MARINA 2000. RECKERMANN and F. COLIJN (eds.)


INTRODUCTION
Understanding the population dynamics of planktonic microbial communities is central to many contemporary concerns in marine science. Such concerns include the fuelling of the food webs that lead to fisheries, assessing the responses of coastal waters to pollution and determining the role of the ocean in sequestering anthropogenically produced carbon dioxide. Marine microbial communities are taxonomically and functionally diverse, comprising phyto- and zooplankton, bacteria and protozoa. Moreover, marine plankton exhibits a wide range of scales of variability, both in space and time. This variability can only be explained through an understanding of the interactions between different biological and physical processes. A prerequisite for obtaining this knowledge is the ability to sample and analyse the physical, chemical and biological properties of marine waters on appropriate time and space scales.
Some biological variables, such as chlorophyll fluorescence, can be measured continuously. However, assessment of changes in the population structure of marine microbial populations is either very basic or non-descriptive, such as quantifying total chlorophyll in seawater, or detailed but very labour intensive, such as microscopy. Microscopy allows taxonomic analysis of phytoplankton with relatively rigid cell walls (e.g., armoured dinoflagellates, diatoms and coccolithophorids) whose overall cell morphology is maintained when preserved. However, naked groups, such as small (generally less than 10 µm) flagellated phytoplankton, are often rendered unrecognisable when preserved. This, coupled with the resolution limits of microscopy, makes it difficult, if not impossible, to identify and quantify such preserved phytoplankton with any precision, even for trained personnel. This is potentially a major problem, particularly as HPLC analysis of phytoplankton pigments in seawater has shown that in many areas of the ocean small flagellated phytoplankton are the dominant groups of primary producers (Barlow et al., 1999; Bidigare and Ondrusek, 1996; Barlow et al., 1993).
Analytical flow cytometry (AFC) is a methodological approach that permits the precise, rapid, repetitive description of the population structure of primary producers, bacterioplankton and zooplankton. AFC obviates many of the problems associated with microscopy and other analytical methods, offering considerable potential for the rapid, accurate and precise analysis of phytoplankton and bacterial populations (Jonker et al., 1995). However, the exploitation of flow cytometry for examining marine microbial processes is still limited by data analysis techniques.
Most of the currently available AFC software provides single parameter histogram distributions and dual parameter scatter plots. However, the optical data often contain 5-11 measurements or parameters that could be used for discrimination. The vast quantities of multivariate data generated by flow cytometers present a considerable challenge for data analysis, and utilising them requires complex multivariate analysis techniques. Multivariate statistical approaches have been used (Demers et al., 1992; Carr et al., 1996) and can be very successful if the appropriate technique can be found, but finding it is often not simple, and invalid assumptions about distributions can cause major problems. Artificial neural networks (ANNs), on the other hand, do not require a priori knowledge of underlying distributions. Once trained, they can make identifications in near real-time, and they have been shown to have considerable potential for identifying phytoplankton from AFC data (e.g., Frankel et al., 1989, 1996; Boddy et al., 1994; Wilkins et al., 1994, 1996) and from morphometric data (e.g., Culverhouse et al., 1994, 1996; Williams et al., 1994).
Species identification is not always possible based on light scatter and fluorescence characteristics, due either to similarities in the optical characteristics between species or to certain species having a wide range of optical characteristics, such as with clumped cells or chains. One possible solution to these problems is to develop species-specific oligonucleotide probes that hybridise only to their target species regardless of its morphology or life cycle stage. The flow cytometer or the fluorescence microscope can detect a fluorescent label attached to these probes, and the probe-conferred fluorescence identifies the target, thus verifying species identification.
Fluorescent rRNA probes also have great potential for discriminating other components of the plankton, particularly bacteria (Amann et al., 1995; Giovannoni et al., 1995) and protozoa (Rice et al., 1997a, b), which do not contain autofluorescent pigments. Developing probes for species or wider taxonomic groups is of fundamental importance in providing comprehensive descriptions of microbial communities and for assessing biodiversity at various taxonomic hierarchies.
The measurements of light scatter and fluorescence taken by the flow cytometer characterise the cells and can therefore be used to calculate optical and chemical properties, such as cell size or contents. The time of flight of a particle or cell through the measurement beam is related to a length scale of that particle or cell, and the amount of forward and side scattering of light by individual cells is related to the shape of the cell in suspension. The red fluorescence arising from the stimulation of chlorophyll a (and associated pigments) as a cell passes through the measurement beam is related to cell chlorophyll content (Burkill and Mantoura, 1990; Hofstraat et al., 1994) and also to chlorophyll concentration in field samples (Jonker et al., 1995; Li et al., 1993; Veldhuis et al., 1997).
The light scattering and absorption properties of a microbe are directly related to the size, shape and refractive index of the cell (Spinrad and Brown, 1986; Morel, 1991). The refractive index of the cell is determined by its chemical composition (Barer and Joseph, 1954). The refractive index of a cell has two components: the real part and the imaginary part. The real part, which is important to cellular light scattering, is related to the intracellular organic carbon concentration. In phytoplankton, the imaginary part, which is important to light absorption, is related to the intracellular pigment concentration. Using these relationships, algorithms to estimate pigment and carbon content of microbial cells from flow cytometric measurements will be developed, based on the procedures of Stramski et al. (1988) and Stramski and Reynolds (1993).
The main objectives of the AIMS project are to (i) provide software to automate identification and characterisation, including artificial neural nets; (ii) provide new data on inherent properties of marine microbes used for algorithm development to compute inherent properties of cells and (iii) develop molecular markers for identification and characterisation of marine plankton. The application of artificial neural networks and the development and application of molecular probes within the framework of the AIMS approach are addressed in this paper.

APPLICATION OF ARTIFICIAL NEURAL NETWORKS (ANNs)
Radial Basis Function (RBF) ANNs are being used because their identification success is at least as good as that of other ANN paradigms and non-neural classifiers (Wilkins et al., 1996). Further, RBF ANNs can be trained relatively rapidly, and can discriminate unknown taxa upon which a network has not been trained. Haykin (1994) and Wilkins et al. (1994) provide detailed descriptions of RBF ANNs. Briefly, RBF networks have three layers of nodes or processing elements. The input layer serves to distribute the input data to the hidden layer nodes (HLNs). The HLNs each represent a non-linear Gaussian basis function or kernel, the output value of which depends on the distance between the input data pattern and its centre. Several HLNs represent the data distributions for each species, and it is essential that there are sufficient HLNs to adequately fill the data input space. The outputs from the HLNs are passed to the output layer, which contains one node for each phytoplankton species. The output node with the highest value indicates the predicted identity of the input data.
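The three-layer structure described above can be illustrated with a minimal sketch of an RBF forward pass. It is not the AIMS implementation: the function names, the isotropic Gaussian width per hidden node, the linear output layer and all numerical values are assumptions chosen for illustration only.

```python
import math

def gaussian_kernel(pattern, centre, width):
    """Output of one hidden layer node: decays with distance from its centre."""
    sq_dist = sum((p - c) ** 2 for p, c in zip(pattern, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbf_forward(pattern, centres, widths, weights):
    """Return one output value per species for a single input pattern.

    centres: list of hidden node centres (hypothetical values)
    widths:  one kernel width per hidden node (assumed isotropic)
    weights: weights[k][j] links hidden node j to output node k
    """
    hidden = [gaussian_kernel(pattern, c, w) for c, w in zip(centres, widths)]
    return [sum(w_kj * h for w_kj, h in zip(row, hidden)) for row in weights]

def predict(pattern, centres, widths, weights, species):
    """The output node with the highest value gives the predicted identity."""
    outputs = rbf_forward(pattern, centres, widths, weights)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return species[best], outputs

# Toy network: two hidden nodes for species_A, one for species_B.
# Inputs stand for two scaled flow cytometric parameters (made-up numbers).
centres = [[0.2, 0.8], [0.3, 0.7], [0.8, 0.2]]
widths = [0.1, 0.1, 0.1]
weights = [[1.0, 1.0, 0.0],   # output node for species_A
           [0.0, 0.0, 1.0]]   # output node for species_B
species = ["species_A", "species_B"]

label, outputs = predict([0.25, 0.75], centres, widths, weights, species)
```

In a real network the centres, widths and weights would be set during training rather than written by hand.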
ANNs learn from examples. To train an ANN, 400-500 patterns were drawn at random from the data files for each species and presented to the network together with the known identity of each pattern of flow cytometric characteristics. How well a network trains depends on the number of HLNs and a variety of other network parameters, which must be optimised by experimentally altering each of these factors in turn. The duration of training can be anything from a few minutes to half an hour or more, depending on computer hardware, the number of species and the complexity of the data. Once trained, the network must be tested using an independent data set drawn from the data files, and the identification performance evaluated by comparing the predicted identity of test patterns with their known identity. Misidentification results from overlap of character distributions, and can only be resolved by obtaining more and/or different characters. Within AIMS, data analysis approaches have initially been developed to recognise target organisms using data of known origin, namely mono-specific phytoplankton cultures.
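The evaluation step described above, comparing predicted and known identities of independent test patterns, can be sketched as a misidentification matrix. The function names and the toy labels are hypothetical, and summarising performance as the mean of the diagonal is one possible convention assumed here.

```python
def misidentification_matrix(known, predicted, species):
    """Rows: known identity; columns: predicted identity; entries in percent."""
    index = {name: i for i, name in enumerate(species)}
    counts = [[0] * len(species) for _ in species]
    for k, p in zip(known, predicted):
        counts[index[k]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix

def overall_performance(matrix):
    """Mean of the diagonal elements (percent correctly identified per species)."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# Made-up test set: four patterns per species, one species_A pattern misidentified.
species = ["species_A", "species_B"]
known = ["species_A"] * 4 + ["species_B"] * 4
predicted = ["species_A", "species_A", "species_A", "species_B"] + ["species_B"] * 4

matrix = misidentification_matrix(known, predicted, species)
score = overall_performance(matrix)
```

Off-diagonal entries show which pairs of species overlap in their character distributions and would therefore benefit from additional or different measurements.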
Phytoplankton species from the Plymouth Culture Collection (Marine Biological Association, UK) were chosen to provide wide taxonomic and size ranges (1-45 µm) of species found in European waters. Mono-specific cultures were grown at 15°C (± 1°C) and were illuminated under constant light at 100-150 µmol m⁻² s⁻¹. Batch cultures were grown in F/2 medium (Guillard and Ryther, 1962), in 250 ml polycarbonate flasks (Nalgene™) for several weeks before analysis and were sub-cultured every three to four days to maintain cultures in exponential growth.
Cultures were analysed by AFC using a Becton Dickinson FACSort™ flow cytometer equipped with a vertically polarised 15 mW argon ion laser emitting blue light at 488 nm. Data acquisition was triggered on chlorophyll fluorescence, using laboratory cultures of Micromonas pusilla (1-3 µm) to set the lower analysis threshold. The flow cytometer detector array consisted of two fluorescence photo-multiplier tubes (PMTs), two light scatter PMTs and a photodiode for forward light scatter. Individual cell measurements were made for cellular forward light scatter, chlorophyll fluorescence (>650 nm), phycoerythrin fluorescence (585 nm ± 21 nm), side scatter and depolarised light scatter (to enhance the discrimination of coccolithophores). Samples were run either for three minutes at a flow rate of 107 µl min⁻¹ (± 6 µl) or until 10,000 events had been acquired. Analogue signals from the detectors were digitally converted and stored on computer as listmode data. Data files were then transferred to a PC by local area network or ethernet for analysis.
Networks were trained and were able to discriminate and quantify species in mixed assemblages. A typical example of the results is displayed in Table 1 as a misidentification matrix (overall performance, taken as the average of the diagonal elements: 88.9% correct classification). Table 2 illustrates how the ANN-derived abundance estimates compare with abundance estimates obtained from classical flow cytometric analysis of dual parameter scatter plots for mixtures of 8 phytoplankton species. In this test, the RBF network estimates match the flow cytometric estimates very well.
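As an illustration of how per-species abundance estimates can follow from identified events, the sketch below converts event counts to cell concentrations using the analysed volume (flow rate multiplied by run time). The function names and the example labels are hypothetical, and the calculation assumes an undiluted sample with no coincidence or dead-time correction.

```python
from collections import Counter

def cells_per_ml(n_events, flow_rate_ul_per_min, minutes):
    """Concentration from an event count and the volume analysed."""
    volume_ul = flow_rate_ul_per_min * minutes
    return 1000.0 * n_events / volume_ul

def abundance_by_species(labels, flow_rate_ul_per_min, minutes):
    """Per-species concentration from the identity assigned to each event."""
    counts = Counter(labels)
    return {name: cells_per_ml(n, flow_rate_ul_per_min, minutes)
            for name, n in counts.items()}

# Made-up run: 3 minutes at 107 µl per minute, i.e. 321 µl analysed.
labels = ["species_A"] * 2140 + ["species_B"] * 1070
abundance = abundance_by_species(labels, flow_rate_ul_per_min=107.0, minutes=3.0)
total = cells_per_ml(len(labels), 107.0, 3.0)
```

For runs that stop at a fixed number of events rather than a fixed time, the elapsed time would have to be recorded to obtain the analysed volume.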
Verifying the results of the networks is an area currently under development. One way to verify the identity of species predicted by an ANN is to transform the results into co-ordinates that can be used to produce sort decision boundaries on the flow cytometer. Cells can then be sorted from the sample using the flow cytometer's software and electronics and viewed by microscopy, with or without image analysis capabilities to enhance resolution. This approach was tested as part of an experimental workshop of the AIMS project. Simple mixtures containing five species of cultured phytoplankton were analysed on the Becton Dickinson FACSort™ flow cytometer and identified by a trained RBF network. The clusters identified by the network as Dunaliella tertiolecta and Tetraselmis suecica were highlighted in scatter plots within the CytoWave flow cytometry acquisition and analysis software package, and gates were drawn round the highlighted regions. These co-ordinates were then transferred to the FACSort™ and were used as sort decision boundaries, one for each species. The sorted material was examined using microscopy and was identified as the two target species, thus demonstrating that sorting can be performed in conjunction with artificial neural nets (see Fig. 2).
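The idea of turning network results into sort co-ordinates can be sketched as follows. In the workshop the gates were drawn by hand around highlighted clusters; the rectangular bounding gate below is a simplified stand-in, and the function names, parameter keys and event values are all hypothetical.

```python
def bounding_gate(events, labels, target, x_key, y_key, margin=0.0):
    """Rectangle enclosing every event that the network labelled as `target`."""
    xs = [e[x_key] for e, lab in zip(events, labels) if lab == target]
    ys = [e[y_key] for e, lab in zip(events, labels) if lab == target]
    if not xs:
        raise ValueError(f"no events labelled {target!r}")
    return (min(xs) - margin, max(xs) + margin,
            min(ys) - margin, max(ys) + margin)

def in_gate(event, gate, x_key, y_key):
    """Sort decision for one event: inside the rectangle or not."""
    x_lo, x_hi, y_lo, y_hi = gate
    return x_lo <= event[x_key] <= x_hi and y_lo <= event[y_key] <= y_hi

# Made-up events on two parameters (forward scatter, chlorophyll fluorescence).
events = [
    {"fsc": 120, "chl": 300},
    {"fsc": 130, "chl": 320},
    {"fsc": 400, "chl": 900},
    {"fsc": 420, "chl": 880},
]
labels = ["Dunaliella tertiolecta", "Dunaliella tertiolecta",
          "Tetraselmis suecica", "Tetraselmis suecica"]

gate = bounding_gate(events, labels, "Tetraselmis suecica", "fsc", "chl", margin=10)
```

A rectangle is the crudest possible gate; polygonal gates follow the cluster shape more closely and exclude more non-target events.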
There are several difficulties in extending the ANN methodology from mono-specific laboratory cultures to heterogeneous populations in the sea. Firstly, the number of classes in any natural seawater sample is unknown (i.e., the problem is unbounded). In addition, there may be taxa in the sample that the network has not encountered before. It is, therefore, essential that the network recognises taxa upon which it has not been trained and labels them as unknowns. Encountering unknowns is common in biology and biomedicine but not in other areas of technology, and has not been extensively investigated. RBF ANNs can deal with unknowns by applying constraints to the outputs of HLNs or of output layer nodes. To validate constraint application and determine its limitations within the AIMS project, mixed cultures and natural samples will be used. Once unknowns have been found in a sample, they need to be identified using the combination of flow cytometric sorting and microscopy described previously. Once an identification has been made, the new taxon needs to be added to the network. Adding new species to a network usually requires retraining the network from scratch, which is time consuming and prone to error for those not specialised in the use of ANNs. A possible solution is to provide a library of networks, each of which discriminates a single taxon from all others (Morris and Boddy, 1998). Numerous individual networks will be implemented during the project, and the facility to train a net for a new taxon will be made available. Appropriate single species networks can then be combined to provide identifications. Identification of taxa may, however, be less important than identification and quantification of functional groups of organisms, and this may be relatively easily achieved by combining appropriate groups of species. Finally, obtaining truly representative training data is a major problem. These data must cover the complete spectrum of biological variability within a species encountered in a natural environment. Data obtained from laboratory cultures do not accurately reflect the characters of the same species in the field, because environmental conditions may dramatically affect cell characteristics measured by AFC. Ideally, ANNs should be trained on data obtained from appropriate field samples. This will be achieved by identifying clusters of similar cells from AFC data (using unsupervised ANNs or statistical clustering methods), sorting these (as described above) and identifying the cells microscopically.
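Two of the ideas above, labelling patterns as unknown by constraining output values and combining a library of single-taxon networks, can be sketched together. This is an illustrative assumption about how such constraints might look, not the AIMS method: each "network" is reduced to a single Gaussian kernel, and the threshold of 0.5, the taxon names and the centres are hypothetical.

```python
import math

def make_single_taxon_net(centre, width):
    """Stand-in for a trained network that scores one taxon against all others."""
    def net(pattern):
        sq_dist = sum((p - c) ** 2 for p, c in zip(pattern, centre))
        return math.exp(-sq_dist / (2.0 * width ** 2))
    return net

def classify_with_unknowns(scores, threshold=0.5):
    """Return the best-scoring taxon, or 'unknown' if no output is high enough."""
    if not scores:
        return "unknown"
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"

def combine_library(pattern, library, threshold=0.5):
    """Query every single-taxon network and apply the output constraint."""
    scores = {taxon: net(pattern) for taxon, net in library.items()}
    return classify_with_unknowns(scores, threshold)

library = {
    "taxon_A": make_single_taxon_net([0.2, 0.8], 0.1),
    "taxon_B": make_single_taxon_net([0.8, 0.2], 0.1),
}

known_result = combine_library([0.22, 0.79], library)    # close to taxon_A
unknown_result = combine_library([0.5, 0.5], library)    # far from both taxa

# Adding a taxon only means adding one network; the others are left untouched.
library["taxon_C"] = make_single_taxon_net([0.5, 0.5], 0.1)
after_adding = combine_library([0.5, 0.5], library)
```

The choice of threshold trades missed identifications against false ones and would need to be validated on mixed cultures and natural samples.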

APPLICATION OF MOLECULAR PROBES OF TAXONOMIC AFFILIATION
Ribosomal RNA (rRNA) genes are ideal regions of the genome to use for the development of taxon-specific molecular probes. Not only do they contain regions of considerable sequence diversity (variable regions) but also regions that are evolutionarily conserved. These latter regions lend themselves to the design of universal primers for amplification of rRNA sequences using the polymerase chain reaction (PCR), as well as to the construction of family-, genus- and species-specific oligonucleotide probes. Although these attributes are also true for a number of evolutionarily conserved protein-encoding genes, the high copy number of rRNA molecules (up to 10⁵ per cell) leads to vastly increased sensitivity and enables single cell analysis to be consistently achieved. Ribosomal RNA genes, especially from the small subunit (18S rRNA in eukaryotes, 16S rRNA in prokaryotes), are also commonly used in the reconstruction of evolutionary relationships among species: thus, a large number of sequences are already available in public databases that can be used to develop taxon-specific probes.
Within the AIMS project, probes for the verification of phytoplankton taxa are being developed according to the following criteria: (a) relevance for European waters; (b) availability in culture collections for probe testing; (c) distribution of species over taxonomic groups and (d) different sizes and shapes of cells. These criteria make it possible not only to develop the probes and general methodology required for the purpose of AIMS but also to compare their usefulness over a broad taxonomic base, with an emphasis on phytoplankton species that are important for European waters. The probes range from higher group level (e.g., chlorophyll a+b algae, i.e., chlorophytes) down to the species level (e.g., Heterocapsa triquetra). Table 3 provides an overview of the algal and bacterial rRNA probes currently available or under development and used in this project (see also Peperzak et al., 2000).
To design these taxon-specific rRNA probes for marine phytoplankton, a database of more than 330 aligned chlorophyll a+c algal 18S rRNA sequences was used. This alignment was analysed with the ARB program (Ludwig, TU München) to find regions of 18-20 nucleotides that were specific for different taxa and contained at least one mismatch to all other sequences in the database. The position and composition of the mismatch(es) were also taken into consideration in selecting an optimal probe, as were the GC content (percentage of guanine and cytosine bases) and the Tm value of the probe sequence (the temperature at which half of the DNA molecules are single stranded) (Stahl and Amann, 1991).
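The screening quantities named above (GC content, Tm and mismatches to non-target sequences) can be illustrated with a short sketch. It does not reproduce the ARB workflow: the sequences are made up, the function names are hypothetical, and the Tm estimate uses the Wallace rule of thumb for short oligonucleotides (2 °C per A or T, 4 °C per G or C), which is a substituted approximation rather than the calculation used by the authors.

```python
def gc_content(seq):
    """Percentage of guanine and cytosine bases."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)

def wallace_tm(seq):
    """Rule-of-thumb melting temperature in °C for short oligonucleotides."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def mismatches(site_a, site_b):
    """Number of differing positions between two aligned, equal-length sites."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must be aligned to the same length")
    return sum(a != b for a, b in zip(site_a.upper(), site_b.upper()))

def is_specific(target_site, non_target_sites, min_mismatches=1):
    """True if every non-target site differs in at least `min_mismatches` positions."""
    return all(mismatches(target_site, s) >= min_mismatches
               for s in non_target_sites)

# Made-up 18-nucleotide sites taken from the same alignment columns.
target = "ACGTACGTACGTACGTAC"
non_targets = ["ACGTACGTACGTACGAAC",   # one mismatch to the target
               "ACGAACGTACCTACGTAC"]   # two mismatches to the target

gc = gc_content(target)
tm = wallace_tm(target)
specific = is_specific(target, non_targets)
```

A fuller screen would also weight mismatch position, since central mismatches generally destabilise a hybrid more than terminal ones.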
Potential probes were checked against all published sequences using RDP (Ribosomal Database Project, http://www.cme.msu.edu/RDP/html) (Maidak et al., 1999) and BLAST (Basic Local Alignment Search Tool) searches (http://www.ncbi.nlm.nih.gov/BLAST) (Altschul et al., 1990) to find possible matches to sequences not included in our database. Furthermore, the probes were checked for their ability to form internal loops or to self-hybridise, either of which may prevent them from binding to their target molecule. Finally, each probe was localised on the two-dimensional structure of the rRNA molecule. It is possible that not all regions of the rRNA are accessible to the probes because of the molecule's secondary structure or because some parts of the molecule are covered by ribosomal proteins. Fuchs et al. (1998) found that in E. coli, probes in some regions of the rRNA sequence gave much weaker hybridisation signals than others, and presumably the sites conferring low signal were inaccessible because they were blocked by ribosomal proteins. Even if these results are not directly applicable to eukaryotic (algal) species, it appears that some regions are preferred for designing probes. Any probes passing these different tests were synthesised and used in hybridisation experiments to validate probe specificity.
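The check for self-hybridisation mentioned above can be approximated by searching a probe for stretches whose reverse complement also occurs in the probe. The sketch below is a simplified heuristic under stated assumptions: it ignores loop length and thermodynamics, the probe sequences are made up, and the cut-off of four bases is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def longest_self_complementary_stretch(probe):
    """Length of the longest stretch whose reverse complement is also in the probe."""
    probe = probe.upper()
    n = len(probe)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            if reverse_complement(probe[start:start + length]) in probe:
                return length
    return 0

def may_self_hybridise(probe, max_stretch=4):
    """Flag probes with a self-complementary stretch longer than the cut-off."""
    return longest_self_complementary_stretch(probe) > max_stretch

# Made-up probes: the first contains the palindromic site GAATTC.
risky = "AAGAATTCAA"
safe = "AAAAAAAA"

risky_len = longest_self_complementary_stretch(risky)
safe_len = longest_self_complementary_stretch(safe)
```

Probes flagged by a heuristic of this kind would still need checking against predicted secondary structure before being rejected.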
Routine testing for probe specificity involves a two-tiered approach. First, probes were tested against a broad taxonomic base using a chemiluminescent detection system in which probes were hybridised with filter-bound, PCR-amplified rDNA from different algal species ("dot blots"). For these tests, DNA from target species (algae for which the probe was designed) was blotted and fixed onto positively charged nylon membrane. DNA from non-target species was similarly used as a negative control. Non-target species were either taxonomically closely related to the target organisms or showed similarities to the target in the probe-binding region of the rRNA molecule. A specific probe should give signals only with DNA from its target species, and not with DNA from non-target species. Second, probes that bound only to their target species in the dot blot hybridisation were used for in situ hybridisation experiments. The in situ hybridisation was designed to address the problems encountered as the probe penetrates the cell and hybridises with the native configuration of the rRNA molecule.
For hybridisation of whole algal cells, the oligonucleotide probes were labelled with the fluorochromes fluorescein isothiocyanate (FITC) or Cy3. The same species were used for in situ hybridisation tests of probe specificity as in the DNA dot blot experiments. Cells were treated with ethanol before hybridisation with the probes: because chlorophyll is soluble in ethanol, this treatment reduced the autofluorescence of the cells and prevented it from masking the probe signal. Cells were incubated for 1-2 hours with the probes in a hybridisation buffer at a specific, probe-dependent temperature. Afterwards, the cells were washed three times in the same buffer at a slightly higher temperature to remove probe bound to non-specific targets.
For fluorescence microscopy, the samples were resuspended in Citifluor (Citifluor Ltd., Canterbury, UK) to prevent fading of probe signals and counterstained with DAPI for better visualisation of non-labelled cells. Specificity testing with fluorescence microscopy includes hybridisation to target and non-target algae at different hybridisation temperatures and with different ratios of labelled to non-labelled probes as competitors, to show the specificity of probe binding. Different hybridisation temperatures were used because specific binding of probes has been shown to be temperature dependent, with signal strength decreasing as temperature increases. The use of fluorescently-labelled rRNA probes in combination with fluorescence microscopy or with flow cytometry can result in specific detection of a single species or a higher taxon (Lange et al., 1996; Miller and Scholin, 1998).
Approximately one third of all probes used in the AIMS project have been successfully tested with in situ hybridisation and flow cytometry (e.g., Lange et al., 1996). These are mainly the ones that are specific for classes and higher groups of phytoplankton. The remaining probes are species-specific probes, whose specificity has only been tested to date in DNA dot blot analyses. The development and application of specific rRNA probes offers great potential for the analysis of phytoplankton. As with all new methods, a number of questions remain, such as how these probes will perform in field tests. These questions will be addressed within the AIMS project, using experiments in mesocosms and on research cruises. Also, new species must be tested with existing probes to confirm that probe specificity is maintained. The use of rRNA probes is a valuable tool for the identification and characterisation of phytoplankton populations, in combination with either flow cytometry or fluorescence microscopy. With this tool it is, for example, possible to monitor harmful algal blooms (Brenner et al., in prep.) and to characterise species that cannot be cultured, e.g., the dinoflagellate Dinophysis. Molecular probes will also make it possible to characterise taxa that are difficult to analyse by flow cytometry, such as colonial and chain-forming species, as well as to separate species with similar optical properties.

INTEGRATING IDENTIFICATION AND CHARACTERISATION OF CELLS: THE AIMS SOFTWARE AND DATABASE
The main end product of the AIMS project will be an integrated software package combining flow cytometry acquisition and visualisation software (CytoWave), artificial neural network software (AIMSNet), and algorithms to convert raw flow cytometric data into inherent properties of cells, such as cell size, volume, pigment content and refractive index (CellStat).
A database of simultaneous measurements of biomass, pigment content, light absorption, light scattering and cell size has been compiled from measurements on laboratory cultures and mesocosm experiments and will also include data from a seagoing research cruise in 2000. This database, called AIMSBase, will also contain flow cytometric analyses and is accessible via internet (www.flowcytometry.org). AIMSBase allows for the comparison of flow cytometric measurements of forward scattering, right angle scattering, time of flight and fluorescence with chemical and optical measurements in order to develop algorithms for estimating pigment and carbon contents of microbial cells from flow cytometric measurements.
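As a hedged illustration of how paired measurements of this kind might feed algorithm development, the sketch below fits a straight line between a flow cytometric signal and an independently measured cell property by ordinary least squares. The linear form, the function names and all numbers are assumptions for illustration; the source does not state the form of the AIMS algorithms.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_property(signal, slope, intercept):
    """Estimate the cell property from a new flow cytometric signal."""
    return slope * signal + intercept

# Made-up calibration pairs: mean red fluorescence per culture (arbitrary units)
# against independently measured pigment per cell (arbitrary units).
fluorescence = [1.0, 2.0, 3.0, 4.0]
pigment = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(fluorescence, pigment)
estimate = predict_property(2.5, slope, intercept)
```

Because optical geometry differs between instruments, a calibration of this kind would in practice have to be fitted separately for each flow cytometer configuration.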
A number of different analytical flow cytometers are being used within this project because differences in optical geometry can lead to wide variations in scattering signals and because the final software will be designed for use with multiple flow cytometer configurations. Measurements have been made using general purpose (Becton Dickinson FACScan™ and FACSort™, Coulter XL™) and specialised (CytoBuoy™; Dubelaar et al., 1999) flow cytometers.
The data acquisition and visualisation software includes integrated artificial neural networks for data analysis. This integration should permit near real-time ANN analysis of flow cytometric data. The results of an ANN analysis can be displayed graphically by means of standard representations of the data such as dot plots and histograms, as well as in tabular form, listing taxa present with an estimate of their abundance.
The software allows trained ANNs to be generated for different combinations of species or subpopulations. The use of combinations of networks is also foreseen, allowing the rapid construction of networks to accommodate novel combinations of species by combining ANNs drawn from a library of existing pre-trained networks, each specialised to a particular group of species or set of environmental conditions.
The progress in the application of ANNs, combined with the discrimination power of rRNA probes and the estimation of inherent properties of cells will greatly improve the application of flow cytometry for the automated analysis of phytoplankton in field samples and the use of flow cytometry for monitoring purposes.