Text Summarization as Controlled Search

Terry COPECK
School of Information Technology & Engineering
University of Ottawa
150 Louis Pasteur, P.O. Box 450 Stn. A
Ottawa, Ontario, Canada K1N 6N5
Nathalie JAPKOWICZ
School of Information Technology & Engineering
University of Ottawa
150 Louis Pasteur, P.O. Box 450 Stn. A
Ottawa, Ontario, Canada K1N 6N5

Stan SZPAKOWICZ
School of Information Technology & Engineering
University of Ottawa
150 Louis Pasteur, P.O. Box 450 Stn. A
Ottawa, Ontario, Canada K1N 6N5

Paper ID: NAACL-2001-XXXX

Keywords: machine learning, automatic evaluation

Contact Author: Terry Copeck

Under consideration for other conferences (specify)? no

Abstract

We present a framework for text summarization based on the generate-and-test model. A large set of summaries, generated by enumerating all plausible values of six parameters, was assessed against the abstract included in each document. The large number of summaries involved dictated the use of automated means, namely supervised machine learning, to make this assessment. Results suggest the method of key phrase extraction is the most, and possibly the only, important variable.

Theme Area: Evaluation and Text/Training Corpora

Wordcount: 3,800

Text Summarization as Controlled Search

Abstract

We present a framework for text summarization based on the generate-and-test model. A large set of summaries was generated by enumerating all plausible values of six parameters. Quality was assessed by measuring them against the abstract included in the summarized document. The large number of summaries involved dictated the use of automated means to make this assessment; we chose to use supervised machine learning. Results suggest that the method of key phrase extraction is the most, and possibly the only, important variable. Machine learning identified parameters and ranges of values within which the summary generator might be used with diminished reliability to summarize new texts for which no author's abstract exists.

Introduction

Text summarization consists of compressing a document into a smaller précis whose goal is to retain the most important facts. Summarization can be achieved by constructing a new document from elements not necessarily present in the original. Alternatively, it can be seen as a search for the most appropriate elements, usually sentences, to extract for inclusion in the summary.


This paper takes the second approach. It describes a partially automated search engine for summary building. The engine has two components. The Summary Generator takes as input a text and a number of parameters that tune generation in a variety of ways, and produces a single summary of the text. While each set of parameter values is unique, it is possible that two or more sets will produce the same summary. The Generation Controller evaluates the summary output in order to determine the best parameters for use in generation.


Summary generation is achieved by segmenting the text, extracting keyphrases from it, and selecting those text segments (each made up of several contiguous sentences) that contain a certain number of keyphrases. The selected segments are then compiled, in their order of extraction, into a summary. We decided to segment the text into fragments rather than simply picking the single best sentences because we believed that a summary made of juxtaposed segments would flow better than one made of juxtaposed, non-contiguous sentences. The ability to choose, as parameters to the generator, among several segmenters and keyphrase extractors, as well as among conditions on the minimum number of keyphrases needed to select a given segment, the maximum number of sentences included in the summary, and so on, is what allows the generator to output different summaries when instantiated in different ways.
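The three-stage process just described can be sketched as follows. This is a minimal illustration, not the actual system: `segment` and `extract_keyphrases` are naive stand-ins for the real segmenters and keyphrase extractors, and all function names here are hypothetical.

```python
# Sketch of the three-stage pipeline: segment the text, extract keyphrases,
# then select sentences from the segments with the most keyphrase matches.
from collections import Counter

def segment(sentences, size=2):
    """Break a sentence list into contiguous fixed-size chunks.
    (Real segmenters find topic boundaries instead of fixed sizes.)"""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]

def extract_keyphrases(sentences, keycount=2):
    """Stand-in extractor: simply the keycount most frequent words."""
    words = Counter(w for s in sentences for w in s.lower().split())
    return [w for w, _ in words.most_common(keycount)]

def summarize(sentences, sentcount=2, keycount=2):
    keys = extract_keyphrases(sentences, keycount)
    def hits(seg):
        # Total keyphrase matches across all sentences of a segment.
        return sum(s.lower().split().count(k) for s in seg for k in keys)
    ranked = sorted(segment(sentences), key=hits, reverse=True)
    # Take sentences from the best segments, then restore document order.
    chosen = set()
    for seg in ranked:
        for s in seg:
            if len(chosen) < sentcount:
                chosen.add(s)
    return [s for s in sentences if s in chosen]
```

With a toy document, `summarize` picks the two adjacent sentences from the segment richest in keyphrases, in document order, rather than two isolated sentences.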

Summary generation has been designed to combine a number of publicly-available text processing programs to perform each of three main stages in its operation. Many parameter settings are possible. Just selecting among the available programs produces 18 possible input combinations; given that many of these programs accept or require additional parameterization themselves, evaluating the possible set of inputs manually becomes impossible.
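For illustration, the 18 input combinations can be enumerated as the product of the program choices involved: three segmenters, three keyphrase extractors and two keyphrase matching modes (3 x 3 x 2 = 18), the programs being those described in the next section.

```python
from itertools import product

# The 18 program combinations: each summary run picks one segmenter,
# one keyphrase extractor and one matching mode.
segmenters = ["Segmenter", "TextTiling", "C99"]
extractors = ["Kea", "Extractor", "NPSeek"]
matchtypes = ["exact", "stem"]

combinations = list(product(segmenters, extractors, matchtypes))
print(len(combinations))  # 18
```

Because each program may also take its own parameters, the full space of inputs grows multiplicatively beyond these 18, which is why manual evaluation is infeasible.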


Generation control is achieved by supervised machine learning that associates the parameters of the summary generation process with expressions of the generated summary’s quality. The training data consist of the parameter settings used to generate each summary, along with a rating of that summary. The rating is obtained by comparing automatically generated summaries against the author’s abstract, which we treat (with some justification) as the “gold standard” for the document, on the grounds that no one is better able to summarize a document than its author. The control process is iterative: once bad parameter values have been identified, the controller is applied to the remaining good values in a further attempt to refine the choice of parameters.
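The shape of this training data can be pictured as follows. The records below are invented examples, and the per-value tally is a deliberately simple stand-in for the actual learner; it only shows how parameter settings are paired with ratings and how per-value quality can then be read off.

```python
from collections import defaultdict

# One training record per generated summary: the parameter setting used,
# paired with the rating the Summary Assessor gave the result.
# These records are invented for illustration.
records = [
    ({"segmenter": "C99", "keyphrase": "Kea", "sentcount": 5}, "good"),
    ({"segmenter": "C99", "keyphrase": "NPSeek", "sentcount": 5}, "bad"),
    ({"segmenter": "TextTiling", "keyphrase": "Kea", "sentcount": 10}, "good"),
    ({"segmenter": "TextTiling", "keyphrase": "NPSeek", "sentcount": 10}, "bad"),
]

def value_scores(records):
    """Fraction of 'good' summaries for each (parameter, value) pair."""
    counts = defaultdict(lambda: [0, 0])  # (name, value) -> [good, total]
    for params, rating in records:
        for name, value in params.items():
            counts[(name, value)][1] += 1
            if rating == "good":
                counts[(name, value)][0] += 1
    return {k: good / total for k, (good, total) in counts.items()}

scores = value_scores(records)
```

On these invented records the keyphrase extractor separates good from bad summaries perfectly while the segmenter does not, mirroring the finding that the extraction method is the dominant variable.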

While the summary generator is fully automated, the controller was at this time applied in a semi-manual way. We plan to automate the entire system in the near future.

The remainder of this paper is organized in five sections. Section 1 describes the overall architecture of our system. Sections 2 and 3 present its two main components, the Summary Generator and the Generation Controller, in greater detail. Section 4 presents our results, discussing which parameters and combinations of parameters the controller identified as good or bad. Section 5 describes the refinements we plan to bring to this system in the future.

1  Overall Architecture

The overall architecture of the system is shown in Figure 1. Its two main components are the Summary Generator and the Generation Controller. The Summary Generator takes as input the text to be summarized and a certain number of parameter values. It outputs one or more summaries. These are then fed into the Generation Controller along with the parameter values used to produce them. The Generation Controller has two subcomponents. The Summary Assessor takes the input summary along with a “gold standard” summary, the abstract written by the document’s author, and issues a rating of the summary with respect to the abstract. The Parameter Assessor takes an accumulated set of parameter values used to generate summaries, together with their corresponding ratings, and outputs the set of best and worst combinations of parameters and values with respect to those ratings. At present this is the point at which automation stops. We plan to implement an iterative feedback loop that would feed the output of the Parameter Assessor back into the parameter values input to both the Summary Generator and the Generation Controller several times, in order to refine the best possible parameter values to control summarization.
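Since the feedback loop is planned rather than implemented, the following is only a hypothetical sketch of how such a loop might behave: each round, every parameter value is scored by the average rating of the settings that use it, and the worst value is pruned from the grid before the next round. All names and the pruning policy are assumptions, not the system's design.

```python
from itertools import product

def settings(grid):
    """Expand a parameter grid (name -> list of values) into all settings."""
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))

def refine(grid, rate, rounds=2):
    """Hypothetical feedback loop: score each parameter value by the mean
    rating of the settings using it, then drop the worst-scoring value
    (unless it is the last value its parameter has)."""
    for _ in range(rounds):
        scores = {}
        for s in settings(grid):
            r = rate(s)
            for name, value in s.items():
                scores.setdefault((name, value), []).append(r)
        avg = {k: sum(v) / len(v) for k, v in scores.items()}
        (name, value), _ = min(avg.items(), key=lambda kv: kv[1])
        if len(grid[name]) > 1:
            grid[name] = [v for v in grid[name] if v != value]
    return grid

# Demo with an invented rating function that favours one extractor.
grid = {"keyphrase": ["Kea", "NPSeek"], "segmenter": ["C99", "TextTiling"]}
refined = refine(grid, lambda s: 1.0 if s["keyphrase"] == "Kea" else 0.0)
# The consistently worst value (NPSeek) is pruned; the segmenters,
# which the rating cannot distinguish, both survive.
```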

2 The Summary Generator

The Summary Generator is based on the assumption that a summary composed of adjacent, thematically related sentences will convey more information than one in which each sentence is independently chosen from the document as a whole and, as a result, likely bears no relation to its neighbours. This assumption dictated both an abstract methodology and a design architecture for this component; one of our objectives is to validate this intuitive but unproven restriction of summary extraction.

Let us first discuss the methodology. A text to be summarized is initially a) broken down into a series of segments, sequences of sentences that talk about the same topic. The set of segments is then rated. We do this by b) identifying the most important key phrases in the document and c) ranking each segment on its overall number of matches for these key phrases. A requested number of sentences is then selected from one or more of the top-ranked segments and shown to the user, in document order, as the summary.

Because one of our objectives was to investigate the policy of taking a number of sentences from within one or more segments, rather than taking the sentences individually rated best, we wanted to control for other factors in the summarization process. To accomplish this we designed the component as a configurable testbed in which various programs can be substituted for one another to perform each of the three main tasks identified above, using where possible programs well known to the research community because they are in the public domain or are made freely available by their authors. In this way we hoped to evaluate our assumption effectively.

Three text segmenters were adopted: the original Segmenter from Columbia University (Kan et al. 1998), Marti Hearst's (1997) TextTiling program, and the C99 program (Choi 2000). Three more programs were used to pick out keywords and key phrases from the body of a text: Waikato University's Kea (Witten et al. 1999), Extractor from the Canadian NRC's Institute for Information Technology (Turney 2000), and NPSeek, a program developed locally at the University of Ottawa.

Once a list of key phrases has been assembled, the next step is to count the number of matches found in each segment. The obvious way is to find exact matches for a key in the text. However there are other ways, and arguments for using them. To illustrate, consider the word total. Instances in text include total up the bill, the total weight and compute a grand total. While these homonyms are different parts of speech, each is an exact match for the key. What about totally, totals and totalled? Should they be considered to match total? If the answer is yes, what then should be done when the key rather than the match contains a suffix? Counting hits based on agreement between the root of a keyword and the root of a match is called stem matching. The Kea program suite contains modules which perform both exact and stem matching, and each of these modules was repackaged as a standalone program to perform this task in our system. The architecture of the summarization component appears in Figure 2, which shows graphically that, independent of other parameter settings, eighteen different combinations of programs can be invoked to summarize a text.
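The exact/stem distinction can be illustrated with a deliberately crude suffix stripper. The real system reuses Kea's stemming module, so the `stem` function below is only a stand-in written for this example.

```python
def stem(word):
    """Crude suffix stripper standing in for Kea's real stemming module."""
    word = word.lower()
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    # Undo consonant doubling, e.g. "totall" (from "totalled") -> "total".
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def matches(key, word, matchtype="exact"):
    """Test a key against a word under either matching policy."""
    key, word = key.lower(), word.lower()
    if matchtype == "exact":
        return key == word
    return stem(key) == stem(word)
```

Under stem matching, totally, totals and totalled all count as hits for the key total; under exact matching none of them do.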

The particular segmenter, keyphrase extractor (keyphrase) and type of matching (matchtype) to use are three of the parameters required to generate a summary. Three others must also be specified: the number of sentences to include in the summary (sentcount), the number of keyphrases to use in ranking sentences and segments (keycount), and the minimum number of matches in a segment (hitcount). The last parameter is used to test the hypothesis that a summary composed of adjacent, thematically related sentences will be better than one made from individually selected sentences. The Summary Generator constructs a summary by taking the most highly rated sentences from the segment with the greatest average number of keyphrases, so long as each sentence has at least the number of instances of keyphrases specified by hitcount. When this is no longer true the generator takes sentences from the next most highly rated segment, until the required number of sentences has been accumulated.
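The selection step just described can be sketched as follows, with the same caveat as before: this is a simplified stand-in for the actual generator, showing only how sentcount and hitcount interact with the segment ranking.

```python
def select_sentences(segments, keys, sentcount, hitcount):
    """Rank segments by average keyphrase hits, then take qualifying
    sentences (at least hitcount matches) from the best segments until
    sentcount sentences have been accumulated."""
    def hits(sentence):
        words = sentence.lower().split()
        return sum(words.count(k) for k in keys)

    ranked = sorted(segments,
                    key=lambda seg: sum(hits(s) for s in seg) / len(seg),
                    reverse=True)
    chosen = []
    for seg in ranked:
        for s in sorted(seg, key=hits, reverse=True):
            if hits(s) >= hitcount and len(chosen) < sentcount:
                chosen.append(s)
    return chosen
```

With hitcount set above zero, sentences with no keyphrase matches can never enter the summary even when sentcount has not yet been reached, which is exactly the lever used to test the adjacency hypothesis.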

3 The Generation Controller

3.1 The Summary Assessment Component

Summary assessment requires establishing how effectively a summary represents the original document. This task is generally considered hard (Goldstein et al. 1999). First of all, assessment is highly subjective: different people are very likely to rate a summary in different ways. Summary assessment is also goal-dependent: how will the summary be used? Much more knowledge of a domain can be presumed in a specialized audience than in the general public, and an effective summary must reflect this. Finally, and perhaps most important when many summaries are produced automatically, assessment is very labour-intensive: the assessor needs to read the full document and all summaries. It is difficult to recruit volunteers for such pursuits, and very costly to hire assessors.

Taking these points into consideration, we judged automatic summary assessment to be of paramount importance. And while we succeeded in our goal of automating the procedure, certain necessary assumptions are debatable (is an abstract really the gold standard for a summary?) and there is more subjectivity in the design than we would like. We plan to study the automated procedure carefully and improve it in the future.