Fuzzy Set Theoretic Framework for Handling of Subjectivity and Associated Uncertainty in Knowledge Representation

Dr. Azene Zenebe and Dr. David Anyiwo

Department of Management Information Systems

Bowie State University

Bowie, Maryland 20715

Abstract

In cybernetics, reality is viewed as an interactive conception between the observer and the observed. Consequently, subjectivity is omnipresent in any attempt to construct ‘meaning and understanding’. This paper introduces the relationships among cybernetics, fuzzy sets and uncertainty. It then presents a fuzzy set theoretic framework for representing subjectivity and the associated uncertainty in knowledge modeling and representation. The framework is applied to the representation of, and reasoning about, movie objects in movie recommender systems. Related recent research and development efforts are also discussed.

1.  Introduction: Cybernetics and Fuzzy Set Theory

Originally, cybernetics was defined in 1947 by Wiener as ‘the science of communication and control, and grew out of Shannon's information theory, which was designed to optimize the transfer of information through communication channels, and the feedback concept used in engineering control systems.’ With the advancement of computer technology, various other fields have become related to cybernetics, including knowledge elicitation and representation. Knowledge representation tools for intelligence and modeling use computer programs that reorganize models in order to make knowledge representation more adequate, correct, and semantically rich.

The cybernetic approach is centrally concerned with this unavoidable limitation of what we can know: our own subjectivity. That is, cybernetics' epistemological stand is that all human knowing is constrained by our perceptions and our beliefs, and hence is subjective. As a result, cybernetics has directly affected software for intelligent training, knowledge representation, cognitive modeling, computer-supported cooperative work, and neural modeling.

Support for cybernetics' engagement with subjectivity comes from fuzzy set theory. In fuzzy set theory, a set is a collection of objects in which each object has an associated membership degree, usually between zero and one, interpreted as a subjective evaluation or a degree of truth. As a natural extension of fuzzy set theory, fuzzy logic provides a means to model partial truth by extending the Aristotelian two-valued logic to infinite-valued logic. Moreover, fuzzy set theory accommodates the uncertainty associated with subjectivity, imprecision and vagueness in the observer's view of reality. Negoita (2002) stated:

“Cybernetics is defined by control. If physics is the science of understanding the physical environment, then control should be viewed as the science of modifying the environment, in physical, biological, or even social sense. Bringing cybernetics to the streets, the fuzzy sets wittingly helped to accelerate the shift from the Enlightenment ideal of the two-valued logic, to postmodern preoccupation with many degrees of truth.” (Negoita 2002).

Pascal and Fermat began to tackle uncertainty through probability theory in the 17th century. However, this theory neither allows subjective belief to be dealt with nor allows the problem of imprecise and uncertain knowledge to be solved. In the 1980s, it was realized that uncertainty is a multidimensional concept. The dimensions are related to the different categories of uncertainty that exist in a system model, including uncertainty due to randomness, uncertainty due to ambiguity, and uncertainty due to vagueness and imprecision. Hence, uncertainty is obscured when it is conceived solely in terms of probability theory, which covers only one of its dimensions.

The different dimensions of uncertainty call for appropriate mathematical formalisms, including evidence theory, fuzzy set theory (Zadeh 1965) and possibility theory (Zadeh 1978), that complement probability theory. One of these dimensions is uncertainty due to imprecision and vagueness, which is identified with a lack of sharp or precise distinctions in the world. It is also referred to as fuzziness.

Uncertainty has a significant role in improving the usefulness of system models. Ignoring uncertainty in modeling and during inference by a computational system has consequences, since decisions are made in situations where uncertainty exists. Klir and Wierman (1999) explain the significant role of uncertainty as follows:

“ … Uncertainty becomes very valuable when considered in connection to the other characteristics of systems models: a slight increase in uncertainty may often significantly reduce complexity and; at the same time it increases, credibility of the model. Uncertainty is thus an important commodity in the modeling business, a commodity which can be traded for gains in the other essential characteristics of models.” (Klir and Wierman 1999, p. 4).

This paper is organized into five sections. Section 1 introduces the relationships among cybernetics, fuzzy set and uncertainty. Section 2 presents the fuzzy set formal definitions and the framework developed for handling subjectivity and associated uncertainty in knowledge representation. Section 3 presents the application of the framework to Ontology and the Semantic Web; and Section 4 presents the utility of the knowledge model for personalized recommendation for movies. Finally, Section 5 presents the conclusion and future research directions.

2.  Fuzzy Set Theoretic Framework for Representing Subjectivity and Associated Uncertainty

2.1.  Fuzzy Set Theory

Fuzzy set theory consists of mathematical approaches that are flexible and well-suited for handling incomplete information, the un-sharpness of classes of objects or situations, or the gradualness of preference profile (Dubois and Prade 2000). Its building blocks are fuzzy sets (membership functions), aggregation operators, techniques of measurement of membership, similarity and fuzzy orderings, fuzzy relations equations, fuzzy number and intervals, fuzzy interval-valued analysis, and approximate reasoning and fuzzy rules (Zimmermann 1996; Pedrycz and Gomide 1998; Dubois and Prade 2000).

A fuzzy set B in X is characterized by its membership function, denoted by $\mu_B$, and defined as (Zadeh 1965):

$\mu_B : X \rightarrow [0, 1]$ (1)

where X is a domain space or universe of discourse. Also, B can be characterized by the set of pairs

$B = \{(x, \mu_B(x)) : x \in X\}$

where $\mu_B(x)$ is the grade of membership of x in B, whose interpretation depends on the context in which X is used and the concept to be represented. Dubois et al. (Dubois, Ostasiewicz et al. 2000) and Bilgiç and Turksen (Bilgiç and Turksen 2000) review various interpretations of the fuzzy membership function, together with techniques for eliciting a membership function. The two relevant interpretations are:

·  degree of similarity - represents the proximity between pieces of information. For example, membership grade of a user's movie interest to the fuzzy set of "Drama movies lover" can be estimated by degree of similarity.

·  degree of uncertainty or truth - can be viewed as the degree of plausibility that X has value x, given that all that is known about it is that "X is A", where A is a fuzzy set (Zadeh 1978).

Furthermore, the suitable type of membership function can only be determined in the application context; in certain cases, however, the meaning captured by fuzzy sets is not too sensitive to variations in the shape (Pedrycz and Gomide 1998). In practice, the triangular, trapezoidal, Gaussian, S-shaped, and exponential-like functions are the most commonly used membership functions.
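As an illustration of the shapes named above, the following minimal Python sketch implements the triangular, trapezoidal and Gaussian membership functions. The function names and all parameter values (a, b, c, d, center, sigma) are assumptions chosen for illustration only; they do not come from this paper.

```python
import math


def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: 1 on the plateau [b, c], linear on both slopes."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, center, sigma):
    """Gaussian membership: 1 at the center, decaying smoothly on both sides."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


# Illustrative parameters only (hypothetical universe of discourse 0..10).
print(triangular(2.5, 0, 5, 10))
print(trapezoidal(5, 0, 2, 4, 6))
print(gaussian(0, 0, 1))
```

All three functions return grades in [0, 1]; which shape is appropriate remains an application-level choice, as noted above.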

There are various fuzzy set operators that substitute for or extend the crisp set operators. For fuzzy sets A and B in X, the triangular norm (t-norm) and the triangular co-norm (t-conorm or s-norm) are the general classes of intersection and union operators, respectively (Pedrycz and Gomide 1998). Writing $\mu_A(x)$ and $\mu_B(x)$ for the membership grades of x in A and B, the max operator, used for union, is defined as $\mu_{A \cup B}(x) = \max\{\mu_A(x), \mu_B(x)\}$, and the min operator, used for intersection, is defined as $\mu_{A \cap B}(x) = \min\{\mu_A(x), \mu_B(x)\}$, for x in X.
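The max and min operators can be sketched in a few lines of Python. In this sketch a fuzzy set is stored as a dictionary from elements to membership grades, and an element missing from a dictionary is treated as having grade 0; this storage choice, the function names and the sample grades are assumptions made for illustration.

```python
def fuzzy_union(a, b):
    """Union by the max operator: the larger of the two membership grades."""
    elements = set(a) | set(b)
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in elements}


def fuzzy_intersection(a, b):
    """Intersection by the min operator: the smaller of the two membership grades."""
    elements = set(a) | set(b)
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in elements}


# Hypothetical grades of three movies in two fuzzy sets.
A = {"m1": 0.8, "m2": 0.3}
B = {"m1": 0.5, "m3": 0.6}

print(fuzzy_union(A, B))
print(fuzzy_intersection(A, B))
```

Max and min are only one t-conorm/t-norm pair; other pairs could be substituted without changing the dictionary representation.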

Fuzzy sets and fuzzy logic provide a way to quantify the uncertainty due to vagueness and imprecision; allow computers to process abstract or subjective concepts that are represented with linguistic variables such as very rich or very expensive; and can universally model a complex system without the need to know the underlying governing mathematical equations (Zadeh 1994). For a symbolic variable (a variable with symbolic values), uncertainty can be represented in terms of qualitative expressions or by using fuzzy sets with a corresponding membership function. Examples of symbolic variables are preference, the genre content of a movie, and the degree of role of an actress in a movie. Therefore, fuzzy systems are capable of processing incomplete and imprecise data, and provide approximate, but acceptable, solutions to problems that are difficult for traditional methods to solve (O’Brien 2002).

2.2.  Knowledge Representation using Fuzzy Set for Movies

Movie genres describe the content of movies, and movies typically belong to multiple genres (Altman 1999). It is inappropriate to treat all genres equally, as some genres may be more significant than others. An analysis of the descriptions of the main film genres shows that movies of genre g1 (e.g. action) and movies of genre g2 (e.g. adventure) share common subject matter and other movie attributes (Staiger 1997). Hence, it is sometimes difficult to judge whether a movie belongs completely to a genre or not, which induces uncertainty in determining the genre distribution of a movie. Fuzzy sets allow us to represent this uncertainty.

With the definition of a movie in the space of genres (Table 1), a movie has one major genre, denoted by g1, and other minor genres g2, g3, etc., in decreasing order of the degree of genre presence in the movie. For a given vector G = {gk, k = 1 … N}, where N is the total number of genres in a movie, the degree of membership of a movie mj in a genre gk in G is denoted by gjk = $\mu_{g_k}(m_j)$. Hence, for a movie mj, Gj = {(gk, gjk), k = 1 … N}, where gjk can be obtained either heuristically from domain experts or empirically from the data.

Table 1: A user’s ratings of m movies, and the genre distribution of a movie

To determine the degree of genres presence in movies, the following two steps are followed:

Step 1: Arrange the genres gk in descending order of their significance to the movie under consideration. In IMDB, a movie’s genres are presented in their order of significance [IMDB sites/ documentation]. For example, the movie ‘BOOTMEN’ has Comedy as its major genre and Drama as a minor genre, which is stated as: Comedy / Drama.

Step 2: Assign higher degrees of membership or compatibility value to more important genres of a movie. For instance,

If mj has only one genre, then gj1 = 1 and gjk = 0 for all k = 2 to N.

If mj has two genres, then gj1 = 0.8, gj2 = 0.2 and gjk = 0 for all k = 3 to N.

If mj has three genres, then gj1 = 0.70, gj2 = 0.30, gj3 = 0.10 and gjk = 0 for all k = 4 to N.

and so on
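The heuristic rules of Step 2 can be sketched as a small lookup table in Python. Only the three cases listed above are encoded, using the values stated in the text; the names HEURISTIC_DEGREES and genre_memberships, and the three-genre universe in the usage line, are hypothetical.

```python
# Degrees by rank position for movies with 1, 2 or 3 genres (values from Step 2).
HEURISTIC_DEGREES = {
    1: [1.0],
    2: [0.8, 0.2],
    3: [0.70, 0.30, 0.10],
}


def genre_memberships(ranked_genres, all_genres):
    """Map every genre in all_genres to its degree of presence in one movie.

    ranked_genres lists the movie's genres from most to least significant
    (Step 1). Genres the movie does not have receive degree 0.
    """
    degrees = HEURISTIC_DEGREES.get(len(ranked_genres))
    if degrees is None:
        raise ValueError("no heuristic rule listed for this number of genres")
    memberships = {genre: 0.0 for genre in all_genres}
    for genre, degree in zip(ranked_genres, degrees):
        memberships[genre] = degree
    return memberships


# 'BOOTMEN' is listed as Comedy / Drama; the genre universe here is illustrative.
print(genre_memberships(["Comedy", "Drama"], ["Action", "Comedy", "Drama"]))
```

A table like this does not extend to movies with more genres than it lists, which is one motivation for replacing it with a single membership function.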

We propose to generalize heuristic rules of this type for gjk using a fuzzy set membership function. In particular, gjk is represented as a function of the number of genres (|Lj|) in a movie mj and the rank position (p) of a genre, using a decreasing and smoothing exponential function (Figure 1), defined as:

gjk = (2)

for p between 1 and |Lj|, where the threshold parameter, which is greater than 1, is used to differentiate/optimize the difference between consecutive genres in a movie. After a number of trials, the threshold was assigned a value of 1.2.

Two examples that use the exponential membership function are presented next.

(i) For movie ‘BOOTMEN’: Comedy / Drama: |Lj| = 2 and G = {(Comedy, 1), (Drama, 0.68)}.

(ii) For movie ‘Muppet Treasure Island (1996)’: Family / Action / Adventure / Comedy / Musical / Thriller: |Lj| = 6 and G = {(Family, 1), (Action, 0.31), (Adventure, 0.22), (Comedy, 0.16), (Musical, 0.12), (Thriller, 0.09)}.

The membership function in equation (2) takes into account the total number of distinct genres in a movie, so the same rank position receives different membership values in movies with different numbers of genres. It also results in a normalized fuzzy set representation of a movie in the genre space, where the maximum membership value is 1. This means that gjk represents the degree of similarity of a movie mj to a hypothetical or prototype pure gk type movie.
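The two properties just described can be checked directly on the membership values quoted in examples (i) and (ii). The sketch below does not implement equation (2); it only inspects the published values, and the helper names height, is_normalized and degree_at_rank are hypothetical.

```python
# Membership values copied from examples (i) and (ii).
bootmen = {"Comedy": 1.0, "Drama": 0.68}
muppet = {
    "Family": 1.0, "Action": 0.31, "Adventure": 0.22,
    "Comedy": 0.16, "Musical": 0.12, "Thriller": 0.09,
}


def height(fuzzy_set):
    """Largest membership grade in the fuzzy set (0 for an empty set)."""
    return max(fuzzy_set.values(), default=0.0)


def is_normalized(fuzzy_set):
    """True when the maximum membership value is 1."""
    return height(fuzzy_set) == 1.0


def degree_at_rank(fuzzy_set, rank):
    """Membership grade at a 1-based rank position, grades sorted descending."""
    return sorted(fuzzy_set.values(), reverse=True)[rank - 1]


print(is_normalized(bootmen), is_normalized(muppet))
# The same rank position (2) carries different degrees in the two movies.
print(degree_at_rank(bootmen, 2), degree_at_rank(muppet, 2))
```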

Figure 1: Possibility distribution of genres in a movie

Similarly, the actors in a movie can be represented by a vector A = {a1, a2, …, aK} for K actors. The degree of role or importance of an actor ak in a movie mj can be represented by the degree of membership in the fuzzy set ‘degree of role or importance’. That is, Aj = {(ak, akj), for k = 1 to K}. Similar to the membership function defined for genres, akj can be defined as

akj = (3)

where akj is represented as a function of the number of actors (|Aj|) in a movie mj and the rank position/role (p) of an actor, for p between 1 and K = |Aj|, and the threshold parameter, which is greater than 1, is used to differentiate/optimize the degree of role among actors in a movie.

The representation scheme can be generalized and applied to an item (I) with a multi-valued feature X whose possible values overlap or are not mutually exclusive, such as a movie having one or more genres, actors/actresses, etc. That is, for X = {x1, x2, x3, …, xk}, $\mu_{x_k}(I_j)$ represents the membership degree of an item Ij in the hypothetical pure item with value type xk of feature X. For example, for a book the features can be topic and author; and for music the features can be music genre and band members.
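A minimal Python sketch of this generalization follows. The rank-based membership function is passed in as a parameter, so any function of rank position and number of values can be plugged in. The function linear_decay used here is a hypothetical stand-in chosen only so the example runs; it is not the exponential function of equations (2) and (3). The names fuzzy_feature and linear_decay and the sample book and band data are likewise assumptions.

```python
def fuzzy_feature(ranked_values, membership):
    """Build the fuzzy set of one multi-valued feature of an item.

    ranked_values: the feature's values, most significant first.
    membership: callable (rank position p, number of values n) -> degree in [0, 1].
    """
    n = len(ranked_values)
    return {value: membership(p, n) for p, value in enumerate(ranked_values, start=1)}


def linear_decay(p, n):
    """Hypothetical stand-in: degree falls linearly with rank, 1 at rank 1."""
    return 1.0 - (p - 1) / n


# Illustrative items: a book with two topics and a band with four members.
book_topics = fuzzy_feature(["Databases", "Statistics"], linear_decay)
band_members = fuzzy_feature(["vocals", "guitar", "bass", "drums"], linear_decay)

print(book_topics)
print(band_members)
```

Keeping the membership function separate from the representation means the same structure can hold genres, actors, topics or authors.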

3.  Application in Ontology and the Semantic Web

An ontology is ‘a controlled vocabulary that describes objects and the relations between them in a formal way, and has a grammar for using the vocabulary terms to express something meaningful within a specified domain of interest.’ It uses classes to represent concepts, and supports taxonomy and non-taxonomy relations between classes. Current ontologies for movies do not support the representation of subjectivity and the associated uncertainty in movie classes and attributes. For instance, it is inappropriate to treat all genres, actors/actresses, etc. equally, as some genres and actors/actresses may be more significant than others. Also, it is sometimes difficult to judge whether a movie belongs to a genre completely, to some extent, or not at all.

There are various taxonomy and non-taxonomy relations in the movie application domain. For instance, Figure 2 shows the generic labels employed by film reviewers in the British television listings magazine What’s On TV over several months in 1993.