Selection for trastuzumab therapy depends on a companion diagnostic assessment, the classification of a patient's breast cancer as HER2/neu-positive by standard assays of immunohistochemistry (IHC) for protein overexpression or fluorescence in situ hybridization (FISH) to detect gene amplification.1 Although the reproducibility and accuracy of these methods have improved over time, controversy continues over which test is best. Both IHC and FISH suffer from errors of reproducibility across testing laboratories, although data from several studies of small sample sizes suggest that FISH yields more consistent results for the same samples across multiple test centers.2-4 However, FISH costs more than 4 times as much as IHC,5 and it requires more expertise to perform and to measure results.6 The value of both tests is further complicated by the site at which the tests are performed. A report of local and central laboratory testing of more than 2500 patients from the N9831 clinical trial of adjuvant trastuzumab treatment showed 88.1% concordance for FISH and 81.6% for IHC,7 suggesting each test may be subject to between 12% and 20% error. This observation has recently been further clouded by a report from Paik and colleagues8 that showed response to therapy in patients whose central tests were negative as equivalent to response to therapy for patients whose central tests were positive.
A number of studies have been published comparing IHC to FISH for reproducibility against biochemical assays (radioimmunohistochemistry, Southern blotting, or Western blotting) of HER2 expression, and most have concluded that FISH-based testing is more accurate than standard IHC.9,10 However, only 2 studies have compared these methods using response to trastuzumab as a metric of "true-positive" HER2/neu breast cancer, and in both of these, the impact of the FISH assay was tested in a patient population selected by only IHC.11,12 Many oncologists now rely on the results of both assays before making the clinical decision to start trastuzumab.
A reliable and accurate assay to predict response is critical now that indications for trastuzumab have moved to the adjuvant setting. Three major recent clinical trials of trastuzumab for postsurgical treatment of HER2+ breast cancer have shown its effectiveness.13,14 They showed an average of around 50% reduction in risk of relapse during trastuzumab therapy.1 However, this result is expensive, as costs to administer trastuzumab to 1 breast cancer patient often exceed $100 000 a year, and the risk of treatment includes potentially fatal cardiotoxicity.1 When the low response rate is considered in that context, it is clear that improved pharmacodiagnostic assays are needed to improve the cost-benefit ratio of trastuzumab therapy.
Although there has been a range of efforts to find a better test with a variety of methods, immunofluorescence-based assays have not yet been evaluated. Here, we describe the assessment of an immunofluorescence-based method for in situ assessment of protein concentration called AQUA (automated quantitative analysis; HistoRx, New Haven, Conn). AQUA is an automated immunofluorescence technique for quantifying compartmentalized protein expression patterns9 that produces a continuous scale of fluorescent intensities, as opposed to the traditional 4-point ordinal scale associated with subjective, manual scoring approaches to IHC. This assay was performed on a series of 152 cases treated with trastuzumab in the metastatic setting collected by a group at the British Columbia Cancer Agency (BCCA). In each case, HER2/neu copy number was measured by FISH, and protein expression was measured by IHC. These patients were previously selected for therapy based on IHC and/or FISH using standard assay protocols, and so all assays were repeated for the current study in tissue microarray (TMA) format. The purpose of this retrospective study was to estimate the diagnostic accuracy of AQUA and compare its diagnostic accuracy to FISH and IHC using Response Evaluation Criteria in Solid Tumors (RECIST)-based patient response15 to trastuzumab as the criterion standard.
MATERIALS AND METHODS
Study Participants
The BCCA cohort is summarized in Table 1 and has been described previously in detail elsewhere.16 Patients who received trastuzumab between 1998 and 2004 were identified from the BCCA pharmacy database. Charts were reviewed, and patients were excluded if they were stage IV with no evidence of disease, if they received fewer than 6 weeks of trastuzumab treatment, or if there was insufficient information on clinical outcome. Further details of the cohort may be found in Robinson et al.16 The study was approved by the Clinical Research Ethics Board of the University of British Columbia, and adequate tissue specimens from 152 patients of the 306 who received trastuzumab were available for array construction. Hematoxylin-eosin-stained sections of each tumor sample were examined, and areas of invasive carcinoma marked on both the slide and the corresponding paraffin block for TMA construction. Two 0.6-mm-diameter cores of tumor specimen were removed from the marked region of the paraffin block and were transferred to defined array coordinates in a recipient TMA block (BCCA 04-010). Patient classification and outcome data were also extracted from the chart and clinical database.
Tumors from all patients were 2+ or 3+ by IHC, or were FISH+ (HER2/CEP17 signal ratio > 2), which were standard cutoffs at the time of treatment. Clinical assays performed for trastuzumab treatment selection (IHC and/or FISH) for the BCCA cohort were performed locally on whole-tissue sections (WS) of tumor blocks. This information was collected on review of patient charts, and results were available for 145 of 152 cases. Of these 145 cases, 41 were enrolled by IHC only, 49 by FISH only, and data from both assays were available in 55 cases and agreed in 42 of the 55. A total of 13 cases had discordant results: 7 were IHC+/FISH-, and 6 were IHC-/FISH+. Current national guidelines for trastuzumab selection suggest a cutoff of 3+ for IHC, or a HER2/CEP17 ratio greater than 2.2 for FISH, and so previously recorded pharmacodiagnostic assay results were reevaluated by these guidelines.1,17 Of the 145 cases with available trastuzumab enrollment data, 132 of 145 had ordinal IHC scores or quantitative FISH ratios recorded and could be reevaluated. All cases (100%) were HER2+ by these updated criteria. A total of 16 cases had discordant results: 8 were IHC+/FISH-, and 8 were IHC-/FISH+.
Objective response categories included complete response, partial response, stable disease, clinical benefit, and progressive disease.15 Using current clinical standards for patient selection, about 1 in 2 metastatic breast cancer patients will respond to chemotherapy with trastuzumab.18,19 Similarly in this cohort, 47.5% of patients had at least partial response to treatment. All available patients were used for each pairwise comparison in the study.
TMA Construction
Tissue microarray construction was performed with a tissue-arraying instrument (Beecher Instruments, Silver Spring, Md) using a method that was described previously.20 All precut sections were coated in paraffin and stored at room temperature in a nitrogen chamber prior to staining to prevent loss of antigenicity.21
Fluorescence In Situ Hybridization-Tissue Microarray
Interphase FISH analysis was performed on the BCCA 04-010 TMA using the PathVysion HER2 DNA Probe kit (Vysis, Downers Grove, Ill) containing 2 directly labeled fluorescent DNA probes specific for the HER2/neu gene locus (LSI HER2/neu SpectrumOrange) and the chromosome 17 centromeric satellite DNA (CEP 17 SpectrumGreen).22 The microarray slide was baked at 60°C for 2 hours, deparaffinized in xylene, and rinsed in 100% ethanol. The array was then pretreated in sodium thiocyanate (Vysis) at 80°C for 10 minutes, digested with pepsin for 15 minutes (4 mg/mL in 0.2M HCl; Vysis), and rinsed in graded ethanols (70%, 85%, and 100%). Probe/hybridization mixture (20 µL), including blocking DNA, was applied to the array and contained with a coverslip sealed with rubber cement. The probes and tissue specimens were codenatured and hybridized using the Vysis Hybrite hybridization system. The Hybrite unit was programmed to allow 5 minutes of denaturation at 73°C, followed by overnight hybridization at 37°C. The slides were subsequently washed in 2× SSC/3% NP-40 and counterstained with 4',6-diamidino-2-phenylindole. HER2/neu gene amplification was quantified by comparing the ratio of locus-specific (LSI) HER2/neu to centromere DNA (CEP) 17 probe signals in accordance with PathVysion HER2 DNA Probe kit criteria. Histospots were examined directly using an Olympus AX70 epifluorescence microscope equipped with narrow bandpass filters (Melville, NY). Each histospot was initially scanned at low power to identify appropriate areas of tumor tissue with clearly defined nuclei. Images of evaluable tumor areas were captured at low and high power in several focal planes in the first 24 hours after staining and archived for review to account for false-negative results from signal fading over time.
The ×60 objective was then used to score signals in at least 30 nonoverlapping tumor cell nuclei to determine the average number of HER2/neu and chromosome 17 copies per cell for each case, using tumor cells from each sample available. Nuclei with overlapping signals were excluded from scoring. The ratio of the HER2/neu and chromosome 17 copy number averages (ERBB2/CEP17) was used to determine the presence of HER2/neu gene amplification, and specimens with ERBB2/CEP17 greater than 2.2 were scored as positive for FISH-TMA. Patient scores were recorded in 80 of 152 patients. Of 97 patients with available FISH-WS data, 52 were evaluable by FISH-TMA.
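The scoring rule described above can be illustrated with a minimal sketch. The function names, the list-based input, and the handling of non-evaluable cases are assumptions made for illustration and are not part of the PathVysion kit or of the software used in this study; only the 30-nucleus minimum and the 2.2 cutoff are taken from the text.

```python
def fish_ratio(her2_counts, cep17_counts, min_nuclei=30):
    """Ratio of average HER2/neu signals to average CEP17 signals per nucleus.

    her2_counts and cep17_counts hold one signal count per scored nucleus
    (hypothetical input format). Returns None when the case is not evaluable.
    """
    if len(her2_counts) != len(cep17_counts):
        raise ValueError("one HER2 count and one CEP17 count per nucleus")
    n = len(her2_counts)
    if n < min_nuclei:
        return None  # too few nonoverlapping nuclei were scored
    mean_her2 = sum(her2_counts) / n
    mean_cep17 = sum(cep17_counts) / n
    if mean_cep17 == 0:
        return None  # ratio undefined without centromere signal
    return mean_her2 / mean_cep17


def is_amplified(ratio, cutoff=2.2):
    """Positive when the ratio exceeds the cutoff (2.2 in this study)."""
    return ratio is not None and ratio > cutoff
```

Under these assumptions, a case with an average of 6 HER2/neu signals and 2 CEP17 signals across 30 nuclei has a ratio of 3.0 and is scored as amplified, whereas a case with fewer than 30 scorable nuclei is returned as not evaluable.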
Cell Lines
A TMA containing cores from formalin-fixed, paraffin-embedded cell pellets was used as a control for staining and AQUA analysis. SKOV3 and Chinese hamster ovary (CHO) cells were obtained from the Maihle laboratory at Yale University. MDA-MB-435S, BT-549, MDA-MB-231, MDA-MB-468, MDA-MB-436, MCF-7, HT-29, A431, T-47D, MDA-MB-453, ZR-75-1, SK-BR-3, and BT-474 cell lines were purchased from the American Type Culture Collection (Manassas, Va). Negative controls for HER2 expression included MCF7 and MDA-MB-175, which express low to normal levels of ERBB2 mRNA,23 as well as CHO cells, nontransfected and transfected to express full-length epidermal growth factor receptor (CHO-EGFR),24 whereas positive controls included the ERBB2-amplified cell lines MB453, ZR75-1, SKBR3, BT474, SKOV3, and CHO-HER2, CHO cells stably transfected to express full-length HER2 protein. Culture conditions and cell line TMA construction have been published in detail elsewhere.25,26 Our laboratory protocol for processing cell lines for paraffin embedding is also available online (http://www.tissuearray.org).
IHC-TMA for Conventional Analysis
Immunohistochemistry methods for HER2 have been described previously in this cohort.16 Briefly, the rabbit polyclonal c-Erb-B2 antibody (A0485, Dako Corp, Carpinteria, Calif) was used to detect HER2/neu expression at a dilution of 1:500 after heat-induced epitope retrieval in Dako Target Retrieval Solution. This is the same polyclonal antibody supplied by Dako in the HercepTest. Patient samples were excluded for insufficient tumor sampling, and the highest score of 2 samples was recorded as IHC-TMA in 141 of 152 patients. Of 90 patients with available IHC-WS data, 83 were evaluable by IHC-TMA.
IHC for Automated Quantitative Analysis (AQUA-TMA)
Slides were stained by a modified indirect immunofluorescence method as described previously.9 The arrays were deparaffinized with xylene and alcohol and rehydrated in water. Antigen retrieval was performed with a 0.15mM sodium citrate buffer at pH 6.0 in a pressure cooker for 10 minutes. After washing in 0.1M Tris-buffered saline (TBS), endogenous peroxidases were quenched by incubation for 30 minutes in a 2.5% hydrogen peroxide/methanol solution. Nonspecific binding was reduced by preincubation for 30 minutes in a 0.3% bovine serum albumin/0.1M TBS blocking solution.
Primary antibodies included mouse monoclonal cytokeratin AE1/AE3, used to define the tumor compartment of each histospot, and rabbit polyclonal c-Erb-B2 antibody (A0485, Dako) to detect HER2/neu expression. This is the same polyclonal antibody supplied by Dako in the Food and Drug Administration-approved HercepTest. The antibodies were diluted 1:100 and 1:8000, respectively, in blocking solution and applied to tissue arrays for overnight incubation in a humidity tray at 4°C. Slides were washed twice for 25 minutes in 0.05% Tween 20/0.1M TBS, followed by 5 minutes in 0.1M TBS, and incubated for 1 hour with secondary antibodies, which included an Alexa 488-conjugated goat anti-mouse antibody diluted 1:100 in 0.1M TBS, and prediluted goat anti-rabbit antibody conjugated to a horseradish peroxidase-decorated dextran-polymer backbone (EnVision, Dako). Slides were then incubated for 10 minutes with Cy5-tyramide, which is activated by horseradish peroxidase, to visualize HER2 expression. 4',6-Diamidino-2-phenylindole (Molecular Probes, Eugene, Ore) was used to stain the nuclear compartment.
Positive and negative controls were included in a specialized "boutique" array stained simultaneously and containing 40 cases from a previously described breast carcinoma TMA,25 as well as 19 formalin-fixed, paraffin-embedded cancer cell lines exhibiting variable levels of genomic amplification and protein expression of HER2. In addition, a breast cancer test slide was stained with each experiment without primary antibody.
AQUA of TMAs
A complete and detailed discussion of the AQUA method has been published previously.9,27 Wide-spectrum cytokeratin is expressed in varying degrees in solid tumors of epithelial origin and was used to identify tumor cells and normal epithelial cells for automated, compartmentalized analysis. 4',6-Diamidino-2-phenylindole was used to visualize nuclei within each histospot. Monochromatic images of each histospot were acquired on an Olympus AX-51 epifluorescence microscope using a motor-driven stage and automated custom software, and high-resolution (1024 × 1024 pixels; 0.5-µm) digital images were analyzed using AQUA. Briefly, a binary image (epithelial mask) was created from the cytokeratin image of each histospot, representing areas of epithelium. Histospots were excluded if the epithelial mask represented less than 5% of the total histospot area. 4',6-Diamidino-2-phenylindole staining defined the nuclear compartment.
The membrane compartment was defined by perimembranous coalescence of cytokeratin immunoreactivity with specific exclusion of the nuclear compartment. HER2 expression was quantified by calculating Cy5 fluorescent signal intensity on a scale of 0 to 255 within each image pixel. An AQUA score was generated by dividing the sum of HER2 signals within the epithelial mask by the area of the membrane compartment. Percentages of cytokeratin and HER2 signal area within the histospot and within the tumor mask were also calculated. After validation of images to ensure adequate tumor sampling, AQUA scores were averaged from 2 nonoverlapping samples in 115 cases, from 1 in 20 cases, and no AQUA measurement was available for 17 cases.
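As a rough illustration of the calculation described in this section, the following sketch computes a score of the same form from small pixel grids. It is a simplified stand-in, not the AQUA software: the function name, the nested-list image format, and the assumption that the epithelial and membrane masks are already available as binary grids are all hypothetical. Only the 5% mask threshold and the "summed signal divided by membrane area" definition come from the text.

```python
def aqua_like_score(her2, epithelial_mask, membrane_mask, min_mask_fraction=0.05):
    """Sum of HER2 signal inside the epithelial mask divided by membrane area.

    her2: grid of Cy5 intensities (0 to 255), one value per pixel.
    epithelial_mask, membrane_mask: grids of 0/1 flags of the same shape.
    Returns None when the histospot would be excluded.
    """
    total_pixels = sum(len(row) for row in her2)
    mask_pixels = sum(1 for row in epithelial_mask for flag in row if flag)
    # Exclude histospots whose epithelial mask covers less than 5% of the spot.
    if total_pixels == 0 or mask_pixels / total_pixels < min_mask_fraction:
        return None
    membrane_area = sum(1 for row in membrane_mask for flag in row if flag)
    if membrane_area == 0:
        return None  # no membrane compartment to normalize by
    signal = sum(
        value
        for her2_row, mask_row in zip(her2, epithelial_mask)
        for value, flag in zip(her2_row, mask_row)
        if flag
    )
    return signal / membrane_area
```

For a 2 × 2 toy image with intensities 100, 50, 0, and 200, an epithelial mask covering the three nonzero pixels, and a membrane compartment of 2 pixels, this sketch returns 350 / 2 = 175.0. Real scores depend on exposure, normalization, and compartment assignment steps that are not modeled here.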
A recent analysis of AQUA for HER2 measurement showed a strong correlation between AQUA scores, quantitative enzyme-linked immunosorbent assay protein measurements, and HER2/neu gene amplification for a standard set of breast cancer cell line controls.26 We repeated both cell line and breast tumor samples used in this study as a reference for HER2 positivity.
Statistical Methods
Using the software JMP, Version 5.0, reproducibility of AQUA scores between tumor samples was determined by linear regression with Spearman rank correlation after natural log transformation to account for nonnormality of the data. Statistical calculations for assay comparison included Pearson χ² analysis to analyze correlations between continuous and ordinal variables (AQUA to IHC) and Spearman rank correlation to compare continuous variables (AQUA to FISH). Statistical calculations for comparison of trastuzumab response by assay included Pearson χ² analysis for ordinal variables (IHC) and Wilcoxon rank sum χ² for continuous results (AQUA and FISH).
Logistic models, true- and false-positive rates, bootstrap sampling, and cross-validation were performed in the statistical package R.28 The corresponding code is available from the authors. Univariate logistic models were constructed with the available data from each assay, and all models controlled for concurrent treatment categorized as "none," "paclitaxel," "vinorelbine," or "other." When noted, the cohort was limited to patients treated concurrently with paclitaxel. Patient outcome of interest was categoric response to combined chemotherapy and trastuzumab. Patient response categories were binarized as complete/partial response versus stable/progressive disease. Current guidelines from the American Society of Clinical Oncology and the College of American Pathologists (ASCO/CAP) were used to dichotomize clinical assays: IHC (HER2+, IHC 3+) and FISH (HER2+, ERBB2/CEP17 > 2.2). After normalization to tumor and cell line standards, we determined an AQUA cutoff of 20 in order to compare diagnostic accuracy with pathologist scoring of IHC and FISH. This was chosen based on the distribution of AQUA scores in the cohort, showing a trough around 20, and confirmed by cell line control AQUA score, in which a score of 20 separates cell lines with the highest levels of HER2 expression.
The resulting models' prediction performance was assessed with leave-one-out cross-validation, and the corresponding intervals with bootstrap resampling (see below).29,30 In a recent comparison of different resampling methods, leave-one-out cross-validation was found to be superior in minimizing bias and assessing predictor performance in comparison to other forms of cross-validation, bootstrap methods, and training-test set split methods.29 Confidence intervals for the prediction error estimates were constructed by nonparametric bootstrap resampling.31 In this study, the statistic of interest is the leave-one-out cross-validation prediction error estimate. Reported 95% confidence intervals are based on B = 1000 bootstrap samples.
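The resampling scheme can be sketched in a few lines. This is an illustrative simplification rather than the authors' R code: it handles a single dichotomized assay with no adjustment for concurrent treatment, and it uses the fact that a logistic model with one binary predictor reproduces the observed response proportion in each assay group, so group proportions stand in for a fitted model. The function names, the 0.5 classification threshold, the percentile interval, and the fixed random seed are assumptions.

```python
import random


def loocv_error(assay_pos, responded):
    """Leave-one-out misclassification rate for one binary assay (needs n >= 2)."""
    n = len(responded)
    errors = 0
    for i in range(n):
        # Training data: every patient except i, split by assay result.
        same_group = [responded[j] for j in range(n)
                      if j != i and assay_pos[j] == assay_pos[i]]
        everyone = [responded[j] for j in range(n) if j != i]
        pool = same_group if same_group else everyone  # fallback for an empty group
        predicted = 1 if sum(pool) / len(pool) > 0.5 else 0
        errors += predicted != responded[i]
    return errors / n


def bootstrap_ci(assay_pos, responded, b=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the leave-one-out error estimate."""
    rng = random.Random(seed)
    n = len(responded)
    estimates = []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        estimates.append(loocv_error([assay_pos[k] for k in idx],
                                     [responded[k] for k in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * b)]
    upper = estimates[min(b - 1, int((1 - alpha / 2) * b))]
    return lower, upper
```

With 10 assay-positive patients of whom 8 respond and 10 assay-negative patients of whom 2 respond, `loocv_error` returns 0.2, because the 4 patients whose outcome disagrees with their group majority are misclassified. Each bootstrap replicate reruns the full leave-one-out loop, so the cost grows with the square of the cohort size times B.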
RESULTS
The goal of this study was to compare 5 methods of assessment of HER2 expression, including IHC-WS, IHC-TMA, FISH-WS, FISH-TMA, and AQUA-TMA. AQUA assessment of whole slides can be done,32 but whole slides from this cohort were not available for this study. We performed FISH for ERBB2 gene amplification, and IHC for HER2 protein expression in TMA format, and we compared the results to clinical HER2 testing on tumor sections (Figure 1). We observed that of the 83 patients who received trastuzumab based on an IHC score of 2+ or 3+, 28 were scored as 0 or 1+ on repeat testing in TMA format. This rate of irreproducibility by IHC (33.7%) is similar to the 30% rate widely cited from large-scale analysis of local versus central testing for IHC (Figure 1, C).7,33 When FISH was repeated in a TMA format, scores were linearly correlated (R = 0.450) but globally lower than those from clinical testing (Figure 1, F). Of the 52 patients enrolled by FISH scores (ratio > 2), 20 (38.4%) had ratios of 2.0 or less in the TMA format.
We repeated HER2 testing using AQUA as described above.9,34 AQUA scores were measured only in the membrane compartment, and Figures 2 and 3 show representative images from immunofluorescent labeling of HER2 protein expression in the breast cancer samples studied. For quality control of the AQUA immunofluorescent assay, we used a control TMA with 40 well-characterized tumor samples from the Yale Pathology archive as well as 19 paraffin-embedded cell lines with a wide range of HER2 protein expression. We compared AQUA scores of the cell line controls in this study to previously published results25,26 of HER2 protein expression by enzyme-linked immunosorbent assay (ELISA) and ERBB2 gene amplification by FISH (Figure 4, A). At the dilution used in this assay, there was a linear relationship between AQUA and ELISA scores, as well as AQUA and FISH among the control cell lines that demonstrated gene amplification (SKBR3, BT474, and MB453; Figure 4, A and B).
The frequency distribution of HER2 AQUA scores in the trastuzumab-treated cohort is reflective of the cohort's enrichment for HER2 expression using clinical pharmacodiagnostic assays. The distribution of AQUA scores is bimodal, and a significant number of tumor samples had very low HER2 expression by the AQUA assay (Figure 4, C). In 115 cases, HER2 AQUA scores were averaged between 2 patient samples, and agreement between replicate samples was high (R² = 0.893, Spearman ρ = 0.873; data were log-transformed to account for nonnormality; Figure 4, D). Of 15 patients with average AQUA scores of less than 5, all individual measurements were less than 10 (Figure 4, C and D). This score is similar to that observed in MCF-7, a breast cancer cell line with low HER2 protein expression and lacking HER2/neu gene amplification (Figure 4, B).35
Quantitative AQUA scores were compared to IHC and FISH results from assays performed on both whole sections and in TMA format, and AQUA scores are directly associated with both IHC scores and gene amplification as measured by FISH (Figure 5). Although all patients were enrolled using IHC testing guidelines that were current at the time of enrollment, a wide range of HER2 expression was measured by quantitative AQUA measure regardless of the IHC-WS score (Figure 5, A). The distributions are more tightly correlated when IHC was repeated for our study on TMAs (IHC-TMA; Figure 5, B).
Fluorescence in situ hybridization scores are also directly associated with AQUA scores, but a wide range of AQUA scores can be observed for any level of genomic amplification (Figure 5, C and D). For example, an AQUA score of 80 is associated with FISH-WS signal ratios ranging from less than 1.8 to more than 18 (Figure 5, C). Fluorescence in situ hybridization signal ratios on TMAs are overall lower than on whole sections but maintain direct correlation with AQUA scores (Figure 5, D).
Next, in order to investigate the relationship between HER2 testing methods and response to trastuzumab, we compared the 3 assays across 4 objective response categories determined by retrospective chart review. Treatment response was not available for 30 patients. Complete response was observed in 13 patients, partial response in 45 patients, stable disease in 19 patients, and progressive disease in 45 patients. In Figure 6, assay scores are compared to response categories by χ² analysis using the Pearson statistic for ordinal assays (IHC-WS and IHC-TMA) and Wilcoxon rank sum for assays with available continuous data (FISH-WS, FISH-TMA, and AQUA-TMA). In order to limit the effects of heterogeneous concurrent chemotherapy on this analysis, we limited the cohort to 83 patients treated concurrently with paclitaxel, which was the most common chemotherapy given with trastuzumab in this cohort. In this analysis, we found no association between response and FISH-WS (P = .96), FISH-TMA (P = .55), or IHC-WS scores (P = .75). Immunohistochemistry-TMA approached significance (P = .06), and a significant relationship between AQUA-TMA score (P = .01) and categoric response was observed. Fluorescence in situ hybridization-negative tumors (≤1.8) were absent from the complete response category (Figure 6, A and B). There is a clear separation of AQUA scores between the 4 categories of treatment response (Figure 6, E).
Logistic regression models were built on the data for all clinical assays by current ASCO/CAP guidelines (Table 2) and were adjusted for each patient's concurrent chemotherapy.17 A cutpoint of 20 was chosen to dichotomize AQUA scores, based on the score distribution in the population and control cell line data (see "Materials and Methods"). We classified patients with partial or complete response (47.5%) as responders, and patients with stable disease or progressive disease (52.5%) as resistant.15 The odds ratio and corresponding 95% confidence intervals for each assay's prediction of response (partial or complete response) are charted in Table 2, with and without adjustment for concurrent treatment. The IHC-WS and both FISH assays failed to reach significance in this cohort. The IHC-TMA and AQUA-TMA were directly associated with improved prediction of response to trastuzumab, and the effect of both protein expression assays was independent of concurrent chemotherapy regimens.
The value of each clinical assay to classify patients as HER2+ and predict targeted therapy response can additionally be assessed by relative biostatistic measures, including sensitivity and specificity, but the absence of HER2-negative patients represents a statistical challenge. The true-positive fraction (TPF) is the proportion of patients who scored HER2+ by a particular assay given that the patient responded to therapy. The false-positive fraction (FPF) measures the proportion of patients who scored HER2+ by a particular assay given that the patient did not respond to therapy. When restricted to only those patients who screened HER2+, the absolute values for the TPF and FPF cannot be estimated,36 since we cannot assess true negatives (they were not offered the drug). However, we can make inferences about the relative accuracies, that is, relative true-positive fraction (rTPF) and relative false-positive fraction (rFPF).37 The rTPF is the ratio of the TPF of one test compared with the TPF of a second test. Thus, tests can be compared, even if the TPF cannot be individually evaluated. Furthermore, that comparison can be tested for significance by assessment of 95% confidence intervals. Table 3 charts pairwise comparisons of rTPF and rFPF for all assays. Due to the impact of missing data on the FISH-TMA slide, we compared it only to FISH-WS.
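The relative measures defined above reduce to simple count ratios when two assays are scored on the same patients, because the number of responders (or nonresponders) appears in both fractions and cancels. The sketch below illustrates only that arithmetic; the function name and the 0/1 list inputs are assumptions, and it does not reproduce the confidence interval calculations reported in Table 3.

```python
def relative_fractions(assay_a_pos, assay_b_pos, responded):
    """Return (rTPF, rFPF) of assay A relative to assay B on paired patients.

    Each argument is a list of 0/1 flags with one entry per patient.
    A ratio is None when assay B has no positives in that response group.
    """
    rows = list(zip(assay_a_pos, assay_b_pos, responded))
    # Positives among responders: numerators of the two true-positive fractions.
    tp_a = sum(1 for a, _, r in rows if a and r)
    tp_b = sum(1 for _, b, r in rows if b and r)
    # Positives among nonresponders: numerators of the false-positive fractions.
    fp_a = sum(1 for a, _, r in rows if a and not r)
    fp_b = sum(1 for _, b, r in rows if b and not r)
    rtpf = tp_a / tp_b if tp_b else None
    rfpf = fp_a / fp_b if fp_b else None
    return rtpf, rfpf
```

For example, if assay A is positive in 4 responders and 1 nonresponder while assay B is positive in 2 responders and 2 nonresponders, the sketch gives rTPF = 2.0 and rFPF = 0.5, meaning A detects twice as many responders and half as many nonresponders as B.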
The FISH-WS was the most sensitive assay, and it had a significantly higher TPF than either FISH-TMA or IHC-TMA. Only 2 responders would have been excluded if FISH-WS was used alone for treatment qualification. Of note, the rTPF (95% confidence interval) of FISH-WS versus AQUA-TMA was not significant (1.045; 0.958-1.141). The AQUA-TMA was also very sensitive, with an rTPF equal to 1.6, signifying a greater TPF than IHC-TMA. The IHC-TMA was the most specific assay, and it had a significantly lower FPF than FISH-WS, IHC-WS, and AQUA-TMA. Only 25.9% of the nonresponders were scored as HER2+ in the IHC-TMA assay. The FISH-WS was the least specific, with the highest FPF relative to any other assay, although not significantly different from IHC-WS or AQUA-TMA.
The application of a predictive model to an independent cohort is the best test of an assay's clinical value. However, at the time of this submission, a similar cohort was not available for analysis. Instead, the performance of each assay in a future cohort was estimated using misclassification rates, based on the logistic models of paclitaxel-treated patients. For each conventional classifier as well as AQUA-TMA, rates were estimated via leave-one-out cross-validation, and bootstrap resampling was used to generate confidence intervals (Table 4). Prediction error estimates surpassing chance (0.50) were attained by FISH-WS, IHC-TMA, and AQUA-TMA (Table 4), and AQUA-TMA achieved the lowest misclassification rate (0.30; 95% confidence interval, 0.211-0.389). Misclassification rates of pairwise combinations of the conventional classifiers (eg, IHC-WS+ and FISH-WS+) were also calculated but did not improve misclassification over univariate analysis.
COMMENT
Many investigators have sought to improve current HER2 testing methods by comparing modalities of testing where the gold standard for assay comparisons was considered gene amplification as assessed by FISH or by Southern blotting.38-40 However, the ASCO/CAP expert panel, in Appendix C of its guideline published in this journal, states, "the optimal outcome of interest should be clinical benefit from anti-HER2 therapy."17 Although a patient's HER2 status has implications beyond trastuzumab treatment, in this study we used response to trastuzumab (and concurrent chemotherapy) as the criterion standard for a true-positive HER2 assay. When addressed in this manner, the optimal assays for prediction of response to trastuzumab in this cohort are IHC-TMA and AQUA-TMA.
Although the current study shows interesting comparisons between techniques, its limitations must also be considered. The assay results from whole sections were performed in local laboratories by different technicians during a 6-year period. Because we only had access to tumor samples in the TMA format, we could repeat the clinical assays along with the AQUA measurements, but we were unable to perform AQUA measurements on whole slides. In addition, at the time of this study, only patients with metastatic disease were offered trastuzumab, limiting the size of the cohort. The retrospective nature of the data collection and the possible influence of variable concurrent chemotherapies are also significant limitations. In order to limit the influence of response to concurrent chemotherapies, we adjusted models for concurrent therapy when possible, or we limited the cohort to the large subset treated with paclitaxel. Despite these limitations, we assert that the current study is valuable as a comparison of the current clinical testing modalities to a reproducible, quantitative measure of HER2 protein with the use of patient outcome as the gold standard for predictive accuracy.
When we repeated IHC and FISH, we found high rates (33% and 38%, respectively) of cases that were scored as positive on the tumor section but negative on TMA. The initial testing was performed in local laboratories on whole sections of tumor, whereas our testing was performed centrally on two 0.6-mm cores of tumor. There are some data suggesting that central testing is more accurate than local testing,7 although recent preliminary studies assessing response in large trastuzumab trials may overturn this observation.8 Tumor sampling issues associated with TMA may also contribute to the observations of this work, although previous studies validating TMAs for HER2 scoring in breast cancer found that 2 to 3 cores were associated with excellent agreement in HER2 status results by IHC.41-43 Furthermore, the tumors arrayed in the TMA block were previously selected clinically by IHC or FISH on whole sections, enriching the cohort for true-positive cases.
Rates of positivity by FISH were globally lower when tumors were examined simultaneously in TMA format (Figure 1, D through F) versus results from tumor sections. Others have assessed this issue and found 99% concordance between assessable TMAs and whole slides,44 but these studies were done prior to the ASCO/CAP guidelines. We believe the effect we observed is related to testing factors inherent to the scoring guidelines of the Vysis assay, which stipulate that ratios of FISH signals are calculated from at least 20 nonoverlapping nuclei. When quantifying FISH signals in our TMA study, we counted 30 nonoverlapping nuclei to improve reproducibility. Nuclei with ERBB2 or CEP17 signals that coalesced were also excluded from scoring, and this decreased the numbers of nuclei that could be counted from areas of tumor displaying very high amplification (in which many signals overlap). In our TMA study, adherence to these guidelines led to a high rate of missing data, as only 80 of 152 tumors met all the criteria for scoring. The FISH scoring in local laboratories on the whole section may have been less conservative, given that the assays were performed with the prior knowledge that any area of amplification greater than 2 would be scored as positive, thus qualifying the patient for treatment. It is also possible that signals were quantified from areas of the tumor with higher amplification in the local laboratories, because a pathologist or technologist may scan a stained FISH assay at low power to select bright areas for scoring.
Immunohistochemistry, performed in the TMA format by the traditional brown-stain method or by AQUA, outperformed FISH by most measures. This outcome is not measurable in the many studies that have compared IHC to amplification as the gold standard. Furthermore, only AQUA-TMA was strongly associated with categoric response by χ² analysis (Figure 6). When analyzed by logistic regression, both IHC-TMA and AQUA-TMA were associated with improved response, and this effect was independent of concurrent treatment (Table 2). Although FISH-WS was the most sensitive assay, it was not significantly different in this measure from AQUA-TMA, and it was also the least specific. Immunohistochemistry-TMA was the most specific, but was not significantly different from AQUA-TMA. This result suggests that HER2 protein measurement is a better predictor of trastuzumab response than measurement of gene amplification by FISH. This conclusion is in contrast to a previous study by Mass et al,11 which concluded that assessment of ERBB2 amplification by FISH was superior to IHC to predict trastuzumab response in a retrospective analysis of 3 clinical trials.
Translation of the Mass et al11 study conclusions to current testing practices is problematic. First, the IHC pharmacodiagnostic assay (the clinical trials assay) used for clinical trial enrollment relied on the monoclonal parent antibody of trastuzumab, 4D5, whereas the antibody in the widely used HercepTest kit, A0485, is polyclonal, directed against an intracellular epitope, and shows excellent agreement with gene amplification in well-controlled investigations.45,46 Second, IHC scores of both 2+ and 3+ were considered positive, reflecting the scoring standards at the time of the clinical trials, and no subset analysis of only 3+ cases was performed. Finally, all patients with positive IHC scores were accepted from local laboratory results without central confirmation, whereas sample processing and scoring for FISH were performed in a central laboratory by the authors. We propose that the IHC+/FISH- cases in the Mass et al study (107/451; 23.7%) likely represent false-positive IHC scores, and that they underscore the need for IHC standardization rather than the superiority of the FISH assay. The bimodal distribution of FISH signal ratios observed in the Mass et al study supports this conclusion and mirrors the bimodal AQUA score distribution observed in the current study. In previously published work, automated scoring of IHC showed high concordance with FISH positivity,40 and different methods for automated analysis of protein expression have recently been reviewed.47
This study was designed to test the performance of a well-controlled, automated quantitative technique for HER2 testing against current clinical pharmacodiagnostic assays for HER2. In this cohort, the AQUA-TMA logistic model for response prediction was superior to FISH and IHC-WS and comparable to IHC-TMA. The similar strength of correlation between IHC-TMA and AQUA-TMA scores and outcome further emphasizes the need for accurate score interpretation, either in a central, high-volume laboratory or by a quantitative method. AQUA scores were more sensitive but less specific than IHC and comparable to FISH in both sensitivity and specificity. In addition, the misclassification rate of AQUA was 0.30, lower than that of all the other HER2 testing methods. Using the AQUA assay, 54% of the women who scored positive (AQUA > 20) responded to therapy, whereas 75% of the women who scored negative (AQUA ≤ 20) did not respond to therapy. In fact, even pairwise combinations of FISH and IHC (a virtual "reflex" assay) did not improve misclassification rates compared with AQUA (data not shown). An advantage of AQUA over both IHC and FISH is the objectivity of the automated scoring, which minimizes user error.
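For readers less familiar with the performance measures compared in this paragraph, the sketch below shows how each is derived from a 2 × 2 table of assay result versus treatment response. The function name and the counts are hypothetical and chosen only for easy arithmetic; they are not the data from this cohort. The percentage of assay-positive women who responded corresponds to the positive predictive value, and the percentage of assay-negative women who did not respond corresponds to the negative predictive value.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute standard measures from a 2 x 2 table.

    tp: assay positive, responded      fp: assay positive, did not respond
    fn: assay negative, responded      tn: assay negative, did not respond
    """
    if min(tp + fn, fp + tn, tp + fp, fn + tn) == 0:
        raise ValueError("each row and column of the table must be nonempty")
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),          # responders the assay detects
        "specificity": tn / (tn + fp),          # nonresponders the assay excludes
        "ppv": tp / (tp + fp),                  # assay positives who respond
        "npv": tn / (tn + fn),                  # assay negatives who do not respond
        "misclassification": (fp + fn) / total  # all wrong calls over all cases
    }


# Hypothetical counts for illustration only (100 patients).
metrics = diagnostic_metrics(tp=30, fp=20, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts the misclassification rate is (20 + 10)/100 = 0.30. An assay can trade sensitivity against specificity by moving its cutoff, which is why the misclassification rate is a useful single summary when comparing methods with different cutoffs.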
Immunofluorescent staining for measurement of AQUA scores is derived from the HercepTest IHC technique, and therefore shares many of its limitations, such as variability introduced by differences in fixation, reagents, and antigen retrieval methods. However, results that are skewed due to staining variability can be quickly identified using a set of carefully chosen paraffin-embedded cell line extrinsic controls.26 The quantitative measurement of such controls allows for the detection and normalization of data variability.
The problem of accurately selecting patients who will respond to therapy is at the heart of the concept of companion diagnostics. This work updates outdated comparisons of IHC and FISH to trastuzumab outcome by adhering to testing practices recently published in this journal as the ASCO/CAP guidelines.17 The use of current assays has increased the drug response rate into the 30% to 50% range. Although that is a dramatic improvement over the historic approach without companion diagnostics, new assays are likely to substantially improve this paradigm. We envision translation of these or similar biologically based pathway assessment tests into the clinical sphere to better select patients for targeted therapies.
This work was supported by an Avon-National Cancer Institute (NCI) Progress for Patients grant and NCI grants R33 CA 106709 and R33 CA 110511 (Dr Rimm), National Institutes of Health MSTP TG 5T32GM07205 grant (Ms Giltnane), and NCI grant K22CA123146 (Dr Molinaro). Thanks to Robert Camp, MD, PhD, for assistance with AQUA, and Nita Maihle, PhD, for the generous contribution of transfected cell line controls. Thanks also to the Yale University Life Sciences High Performance Computing Center, supported by the National Center for Research Resources High End Shared Instrumentation grant S10 RR19895.
© 2008 College of American Pathologists. All rights reserved.

Copyright 2008 Archives of Pathology & Laboratory Medicine