Theme E Workshops
WE1: Outcome of OECD Conference on Validation and Regulatory Acceptance of New and Updated Test Methods in Hazard Assessment
Moderators: Herman Koëter (France) and David Blakey (Canada)
WE2: Training for the Implementation of Alternative Test Methods
Moderators: Marlies Halder (Italy) and Denise Sailstad (USA)
WE3: Validation of (Q)SAR and Other Computational Prediction Models
Moderators: Andrew Worth (Italy) and Mark Cronin (UK)
WE4: Report from the ICLAS-CCAC International Symposium on Regulatory Testing and Animal Welfare
Moderators: Clément Gauthier (Canada), S. Pakes (USA), Gilly Griffin (Canada), and William Stokes (USA)
WE4: Report from the ICLAS-CCAC International Symposium on Regulatory Testing and Animal Welfare
C. Gauthier1, S. Pakes2, G. Griffin1, and W. Stokes3. 1Canadian Council on Animal Care, 315-350 Albert Street, Ottawa, Ontario K1R1B1, Canada. cgauthier@ccac.ca; 2University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; 3NICEATM, Environmental Toxicology Program, National Institute of Environmental Health Sciences, MD EC-17, P.O. Box 12233, Research Triangle Park, NC, 27709, USA. stokes@niehs.nih.gov.
The first International Symposium on Regulatory Testing and Animal Welfare (ISRTAW), organized by the International Council for Laboratory Animal Science (ICLAS) and the Canadian Council on Animal Care (CCAC), was held in Québec City, Canada, on June 21-23, 2001. The 160 participants from 22 countries included representatives from national research and regulatory agencies, universities, and industry involved in chemical, pesticide, and drug safety testing. The intention of ISRTAW was to provide a platform to promote and harmonize more humane methodologies for testing chemicals and biological products, in an effort to improve the welfare of animals used for safety testing. The main objectives of ISRTAW were to: a) develop or identify best practices to minimize or eliminate pain and distress for animals used in safety evaluation and testing procedures; and b) find ways to improve communication among regulated industry, animal welfare enforcement authorities, and regulatory authorities requiring safety evaluation and toxicity testing. The Proceedings from ISRTAW are scheduled to be published in the ILAR Journal in June 2002. Here, we will report on recommendations arising from the Symposium, particularly the principles and best practices that should be implemented now, by all user countries, in order to minimize pain and distress for animals used in regulatory testing.
WE5: Evaluating Safety Tests: Some Recent Experiences and Analyses
Moderators: Leon H. Bruner (USA), Rodger Curren (USA), and Elke Genschow (Germany)
WE5: Assessing the Performance of Toxicity Tests in Validation Studies
Leon H. Bruner1, Rodger D. Curren2, John W. Harbell2, and Greg J. Carr3. 1Gillette Medical Evaluation Laboratories, Needham, MA, USA; 2Institute for In Vitro Sciences, Inc. Gaithersburg, MD, USA; 3Miami Valley Laboratories, The Procter & Gamble Company, Cincinnati, OH, USA. leon_bruner@gillette.com.
We have recently completed a series of studies designed to compare several measures of predictive capacity commonly used to assess the performance of new toxicity test methods (NTM). Computer simulations were used to generate data sets similar to those that might be obtained from a large validation study. The parameters used in the simulations were adjusted between runs to produce data sets that had progressively poorer fit of the data to a prediction model that defined the relationship between the NTM and a reference test method (RTM). The data sets were then analyzed using three measures of predictive capacity: the 95% prediction interval (PI), the correlation coefficient, and the contingent probability statistics (CPS), namely sensitivity (Se), specificity (Sp), positive predictive value (PPV), and negative predictive value (NPV). When the association between RTM and NTM is random, high values for any of the CPSs can be arbitrarily obtained, depending on where the cut-offs are set. Additionally, Se + Sp = 1, and the PPV is equivalent to the prevalence. When the fit of the data to an underlying prediction model is improved, the correlation coefficient increases, the width of the 95% PI decreases, and the sum Se + Sp increases until the maximum sum, 2, is achieved. The CPSs, however, are surprisingly insensitive to changes in the fit of data to a defined relationship between an RTM and an NTM. Lastly, the simulations show that Se and Sp vary considerably, depending on the distribution of toxicity included in a reference set of test chemicals used to validate a test. The importance of these findings is that they help clarify the interpretation and utility of these performance measures. It will be important to interpret results from validation studies in light of this new information.
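The random-association case described above can be illustrated with a minimal sketch. This is not the authors' simulation; it assumes, purely for illustration, that RTM and NTM scores are independent uniform draws on [0, 1] and that a result is "positive" when the score exceeds a cut-off. The function names and parameters are hypothetical.

```python
import random


def contingency_stats(tp, fp, fn, tn):
    """Return the contingent probability statistics for a 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "Se": tp / (tp + fn),    # fraction of RTM positives called positive
        "Sp": tn / (tn + fp),    # fraction of RTM negatives called negative
        "PPV": tp / (tp + fp),   # fraction of NTM positives that are RTM positive
        "NPV": tn / (tn + fn),   # fraction of NTM negatives that are RTM negative
        "prevalence": (tp + fn) / total,
    }


def simulate_random_association(n, rtm_cutoff, ntm_cutoff, seed=0):
    """Simulate n chemicals whose RTM and NTM scores are unrelated.

    Assumption (not from the abstract): both scores are independent
    uniform draws on [0, 1]; a score above its cut-off is 'positive'.
    """
    rng = random.Random(seed)
    tp = fp = fn = tn = 0
    for _ in range(n):
        rtm_positive = rng.random() > rtm_cutoff
        ntm_positive = rng.random() > ntm_cutoff
        if rtm_positive and ntm_positive:
            tp += 1
        elif ntm_positive:
            fp += 1
        elif rtm_positive:
            fn += 1
        else:
            tn += 1
    return contingency_stats(tp, fp, fn, tn)


if __name__ == "__main__":
    # Moving only the NTM cut-off changes Se and Sp at will,
    # even though the NTM carries no information about the RTM.
    for ntm_cutoff in (0.1, 0.5, 0.9):
        stats = simulate_random_association(100_000, 0.7, ntm_cutoff)
        print(ntm_cutoff, {k: round(v, 3) for k, v in stats.items()})
```

Under these assumptions, lowering the NTM cut-off pushes Se up and Sp down by the same amount, so their sum stays close to 1, and the PPV stays close to the prevalence set by the RTM cut-off.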
WE5: Using an "Outcome-based" Approach to Describe the Performance of Toxicity Tests that Provide Only Dichotomous Data
Rodger D. Curren1, John W. Harbell1, and Leon H. Bruner2. 1Institute for In Vitro Sciences, Inc. Gaithersburg, MD, USA; 2Gillette Medical Evaluation Laboratories, Needham, MA, USA. rcurren@iivs.org.
During the evaluation and formal validation of new toxicity test methods (NTM), it is customary to compare their performance to that of a "gold standard"--a test method for the same toxic endpoint that has gained the confidence of its users and is assumed to give as good an estimate of the true value as is reasonably possible. An example would be the use of rodent carcinogenicity data as the gold standard for the early evaluation of genotoxicity tests. Often the results of the NTM and the gold standard test are only presented in a dichotomous fashion, e.g., as positive or negative. Performance is then judged by calculating the resulting sensitivity (fraction of positives correctly identified by the NTM [Sn]), specificity (fraction of negatives correctly identified by the NTM [Sp]), concordance (fraction of correct predictions by the NTM), and so forth. However, the next step, judging whether the performance statistics for a given NTM are adequate for its acceptance, is often difficult. We have attempted to simplify this analysis by calculating the result when the NTM (with its predetermined Sn and Sp) is applied to populations of chemicals having different frequencies of positives. We then evaluated how well the NTM performed by comparing the prevalence of positives among the materials classified as negative by the NTM with the prevalence in the starting population. This comparison provides a more understandable statistic on which to judge test performance. For example, reducing the frequency of positives in a population tenfold generally (depending on the original prevalence) requires the application of an NTM with both Sn and Sp greater than 0.91. Since it is often assumed that for "screening" tests Sn is the only important factor, we applied this outcome-based analysis to NTMs with high Sn and low Sp and found that such a test provides little advantage over conducting no test at all.
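One way to make the outcome-based comparison concrete is the sketch below. It is an illustration only, assuming the "improvement" is computed with Bayes' rule from the starting prevalence, Sn, and Sp; the abstract does not give the authors' exact calculation, and the function names are hypothetical.

```python
def prevalence_among_negatives(prevalence, sn, sp):
    """Fraction of true positives remaining among NTM-negative chemicals.

    Assumed formula (Bayes' rule, not stated in the abstract):
        p(1 - Sn) / (p(1 - Sn) + (1 - p) Sp)
    """
    missed_positives = prevalence * (1.0 - sn)
    true_negatives = (1.0 - prevalence) * sp
    return missed_positives / (missed_positives + true_negatives)


def fraction_classified_negative(prevalence, sn, sp):
    """Share of the starting population that the NTM calls negative."""
    return prevalence * (1.0 - sn) + (1.0 - prevalence) * sp


def fold_reduction(prevalence, sn, sp):
    """Starting prevalence divided by prevalence among NTM negatives."""
    return prevalence / prevalence_among_negatives(prevalence, sn, sp)


if __name__ == "__main__":
    # Illustrative inputs only; none of these values come from the study.
    for p in (0.01, 0.1, 0.5):
        for sn, sp in ((0.91, 0.91), (0.95, 0.20)):
            print(
                f"p={p:.2f} Sn={sn:.2f} Sp={sp:.2f} "
                f"fold reduction={fold_reduction(p, sn, sp):.2f} "
                f"classified negative={fraction_classified_negative(p, sn, sp):.2f}"
            )
```

Reporting the share of the population classified as negative alongside the fold reduction shows how few materials a high-Sn, low-Sp test actually clears, which is one plausible reading of why such a test offers little practical advantage.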
WE5: Considering the Test Performance for Trichotomous Data Using Linear Discriminant Analysis--A Case Study
E. Genschow, A. Seiler, G. Scholz, and H. Spielmann. Center for Documentation and Evaluation of Alternative Methods to Animal Experiments (ZEBET), 12277 Berlin, Germany. e.genschow@bgvv.de.
The presentation focuses on the performance of an in vitro test, the embryonic stem cell test (EST), in a formal validation study. A prediction model (PM) was established by applying linear discriminant analysis (LDA) to the limited training set of ten test chemicals that were tested in one laboratory. A three-factor model gave the best discrimination between the trichotomous data (non-, weak-, strong-embryotoxic). The cross-validation of the PM was assessed in a formal validation study to avoid overfitting of the data. The assessment of the intra- and inter-laboratory reproducibility was undertaken in four independent laboratories. A set of twenty coded test chemicals (7 non-, 7 weak-, and 6 strong-embryotoxic) was tested twice in each laboratory. Apart from the classification result given as correctly classified individual experiments, the predictive power of the PM was judged by the likelihood for classification. The classification rates obtained with the training set and with the test set were calculated. The classification results of the training set (accuracy 93%) and of the test set (accuracy 78%) were assessed according to the performance criteria defined.
WE6: Endocrine Disrupter Issues and Testing: A Discussion
Moderators: Errol Zeiger (USA) and John McLachlan (USA)
WE7: Quality Control Issues in Test Kit Production
Moderators: Amy Rispin (USA), Foster Jordan (USA), and John Harbell (USA)
WE7: Workshop on Quality Assurance for Proprietary Test Methods
A.S. Rispin, K. Gupta, and K. Stitzel. U.S. Environmental Protection Agency, Office of Pesticides, Washington, DC 20460, USA. Rispin.amy@epa.gov.
This workshop will discuss quality assurance issues for the development and manufacture of toxicology test methods that utilize proprietary materials, either in the form of proprietary test kits or as reagents for generic methods developed to fulfill regulatory testing needs. Regulatory authorities, as well as the regulated industries that utilize the results of these test methods, need assurance that the materials consistently produce accurate results, preferably without mandating excessive use of controls. The workshop will include discussion of current mechanisms (GLP, GMP, QA, etc.) and whether there is a need for increased industry or regulatory oversight of these materials. Panel participants will include proprietary test method developers, testing laboratories, and regulatory toxicologists.
PCP-E1: Are Animal Tests Inherently Valid?
Moderator: Leon Bruner (USA)
Speakers: Michael Balls (UK) and TBA (USA)