AN ENHANCED RECOMMENDER SYSTEM FOR QUERIED DATA USING LOCATION-BASED KNN AND COLLABORATIVE FILTERING METHOD

ABSTRACT

Comparing one item with another based on recommended ratings is a typical part of the human decision-making process. However, it is not always easy to know what to recommend, how to recommend it, and what the nearby alternatives are. To address this difficulty, this work presents a novel way to automatically mine comparable entities by filtering the recommendations that users post online. High-quality filtering and the most preferable recommendations are achieved by combining the k-nearest neighbour method with collaborative filtering; these algorithms extract the recommendations and quality items together with their details. The experimental results show that the method achieves a preference ranking for the searched item and performs location-based extraction of the preferred entity. The proposed approach is evaluated and simulated with Java-based recommendation queries and an e-rate scenario. The filtering process, which extracts data from different database servers, can be easily integrated and delivers highly preferred items for relevant searches, further improved by a Top-k method of item classification.

1.1 INTRODUCTION

A recommender system is a software agent that captures the choices, interests and preferences of individual users and makes recommendations accordingly. During an online search, the proposed system provides users with an easier way to make decisions based on those recommendations. Most existing recommender approaches focus on factors such as an individual's choice of item or the opinions of a group community, and do not consider contextual information such as the user's location drawn from big data. This is achieved using a technique called collaborative filtering (CF) [3], which is based on past group community opinions for users and items and correlates them to answer the user's questions and queries. Most community opinions take the form of a tuple, i.e. (user, rating, item), combining a user and a numeric rating for an item. Content-based recommendation recommends items to users that are similar to those they preferred previously; the similarity analysis is based on the items' attributes. Collaborative recommendation recommends items to users according to the item ratings of other people with characteristics similar to their own; the similarity analysis is based on the users' tastes and preferences. Hybrid recommendation is a combination of content-based and collaborative recommendation. ERQD* is an enhanced recommender system for queried data that gives ratings based on location. It categorises the required item and maps the location for that item, which can be evaluated in three ways: (a) spatial ratings for non-spatial items, (b) non-spatial ratings for spatial items and (c) spatial ratings for spatial items, depicted by the corresponding user location, item location, or both in the same mapping expressed by the tuple.

Filling these simple search boxes with different but related terms may return the same set of documents from the database. Posing all of them would not be a good retrieval strategy, as it yields the same set over and over again. Intuitively, posing an optimal set of terms that retrieves a diversified document set covering the entire hidden database is more desirable than posing the entire set of terms and retrieving the same documents repeatedly. Thus, it becomes necessary not only to rank the terms but also to select an optimal subset of the ranked terms, as sketched below. Our work focuses on how to formulate appropriate query terms for text-box-based simple search interfaces so that a maximum number of documents can be retrieved from the database by posing a minimum number of such terms sequentially. This paper illustrates a recommendation system with better quality and accuracy compared to existing recommendation systems.
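A minimal sketch of one way to select such an optimal subset, assuming a precomputed mapping from candidate terms to the document identifiers they retrieve (a hypothetical structure, not a component of the proposed system), is the following greedy procedure that repeatedly picks the term covering the most not-yet-retrieved documents:

```java
import java.util.*;

/**
 * Greedy query-term selection sketch: at each step pick the term whose
 * result set adds the most not-yet-retrieved documents, stopping when no
 * term adds anything new. The term-to-document map is assumed to come from
 * sampled query results; all names here are illustrative only.
 */
public class GreedyTermSelector {

    public static List<String> selectTerms(Map<String, Set<Integer>> termToDocs) {
        Set<Integer> covered = new HashSet<>();
        List<String> chosen = new ArrayList<>();
        Set<String> remaining = new HashSet<>(termToDocs.keySet());

        while (!remaining.isEmpty()) {
            String best = null;
            int bestGain = 0;
            for (String term : remaining) {
                Set<Integer> newDocs = new HashSet<>(termToDocs.get(term));
                newDocs.removeAll(covered);      // documents this term would newly retrieve
                if (newDocs.size() > bestGain) {
                    bestGain = newDocs.size();
                    best = term;
                }
            }
            if (best == null) break;             // no remaining term adds new documents
            covered.addAll(termToDocs.get(best));
            chosen.add(best);
            remaining.remove(best);
        }
        return chosen;
    }
}
```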

1.1.1  Data Mining

Data

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

·  operational or transactional data, such as sales, cost, inventory, payroll, and accounting

·  non-operational data, such as industry sales, forecast data, and macroeconomic data

·  metadata - data about the data itself, such as logical database design or data dictionary definitions

Information

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.

Knowledge

Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

Data Warehouses

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

Data mining

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection and data preparation, nor result interpretation and reporting are part of the data mining step, but do belong to the overall KDD process as additional steps.

Figure No. 1.1 Knowledge Discovery in Database

1.1.2  Techniques used under data mining

1.1.2.1  Collaborative Filtering (CF)

Collaborative filtering (CF) is a technique used by some recommender systems. Collaborative filtering has two senses, a narrow one and a more general one. In general, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering typically involve very large data sets. Collaborative filtering methods have been applied to many different kinds of data including: sensing and monitoring data, such as in mineral exploration, environmental sensing over large areas or multiple sensors; financial data, such as financial service institutions that integrate many financial sources; or in electronic commerce and web applications where the focus is on user data, etc.

In the newer, narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue x than to have the opinion on x of a person chosen randomly. For example, a collaborative filtering recommendation system for television tastes could make predictions about which television show a user should like given a partial list of that user's tastes (likes or dislikes). Note that these predictions are specific to the user, but use information gleaned from many users. This differs from the simpler approach of giving an average (non-specific) score for each item of interest, for example based on its number of votes.

Collaborative filtering systems have many forms, but many common systems can be reduced to two steps:

1.  Look for users who share the same rating patterns with the active user (the user whom the prediction is for).

2.  Use the ratings from those like-minded users found in step 1 to calculate a prediction for the active user, as in the sketch below.
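As a minimal illustration of these two steps, the following Java sketch scores every other user by the cosine similarity of their co-rated items and predicts the active user's rating for an item as the similarity-weighted average of those users' ratings. The data layout (nested maps of user, item and rating) and all names are illustrative assumptions, not part of the proposed system.

```java
import java.util.*;

/** Minimal user-based collaborative filtering sketch following the two steps above. */
public class UserBasedCF {

    // ratings.get(user).get(item) -> rating value
    public static double predict(Map<String, Map<String, Double>> ratings,
                                 String activeUser, String targetItem) {
        Map<String, Double> active = ratings.get(activeUser);
        double weightedSum = 0.0, simSum = 0.0;

        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            String other = e.getKey();
            Map<String, Double> otherRatings = e.getValue();
            if (other.equals(activeUser) || !otherRatings.containsKey(targetItem)) continue;

            double sim = cosine(active, otherRatings);   // step 1: find like-minded users
            if (sim <= 0) continue;
            weightedSum += sim * otherRatings.get(targetItem);
            simSum += sim;                               // step 2: similarity-weighted average
        }
        return simSum == 0 ? Double.NaN : weightedSum / simSum;
    }

    // Cosine similarity over the items both users have rated.
    private static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                dot += e.getValue() * rb;
                normA += e.getValue() * e.getValue();
                normB += rb * rb;
            }
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```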

Figure No. 1.2 Collaborative filtering recommendation

1.1.2.2  K-Nearest Neighbors (k-NN)

In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

·  In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k=1, then the object is simply assigned to the class of that single nearest neighbor.

·  In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

Both for classification and regression, it can be useful to weight the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm has nothing to do with and is not to be confused with k-means, another popular machine learning technique.

Figure No. 1.3 k-NN based location finding

A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors measured by a distance function. If K = 1, then the case is simply assigned to the class of its nearest neighbor.
Some common distance functions are the Euclidean distance $\sqrt{\sum_{i=1}^{k}(x_i - y_i)^2}$ and the Manhattan distance $\sum_{i=1}^{k} \lvert x_i - y_i \rvert$.
It should be noted that both distance measures are valid only for continuous variables. For categorical variables, the Hamming distance must be used. This also raises the issue of standardizing the numerical variables between 0 and 1 when the dataset contains a mixture of numerical and categorical variables.
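To make the distance computation and the distance-weighted vote concrete, the Java sketch below classifies a query point by Euclidean distance to labelled training points and weights each of the k nearest neighbors by 1/d, so that closer neighbors count more; the Manhattan distance is included as an alternative. The class and method names are illustrative only, not part of the proposed system.

```java
import java.util.*;

/** Hedged k-NN classification sketch with distance-weighted (1/d) voting. */
public class KnnClassifier {

    public static double euclidean(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) sum += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.sqrt(sum);
    }

    public static double manhattan(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) sum += Math.abs(x[i] - y[i]);
        return sum;
    }

    public static String classify(List<double[]> points, List<String> labels,
                                  double[] query, int k) {
        // Sort training indices by Euclidean distance to the query point.
        Integer[] idx = new Integer[points.size()];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> euclidean(points.get(i), query)));

        // Distance-weighted vote (weight 1/d) among the k nearest neighbors.
        Map<String, Double> votes = new HashMap<>();
        for (int n = 0; n < Math.min(k, idx.length); n++) {
            int i = idx[n];
            double d = euclidean(points.get(i), query);
            double w = d == 0 ? Double.MAX_VALUE : 1.0 / d;   // exact match dominates
            votes.merge(labels.get(i), w, Double::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```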

1.1.3  Overview

In terms of discovering related items with preferred attributes, the system is similar to research on comparison systems, which compare items for the user. Recommender systems mainly rely on similarities between items and/or their statistical correlations in user log data. For example, Amazon recommends products to its customers based on their own purchase histories, similar customers' purchase histories, and similarity between products. However, recommending an item is not equivalent to finding a comparable item. In the case of Amazon, the purpose of recommendation is to entice customers to add more items to their shopping carts by suggesting similar or related items. With a comparison-based suggestion, the system not only helps users find comparable items but, using the preferences attached to the suggestion, also lets the customer see the quality of the items and their location value. For example, it is reasonable to recommend "iPhone" or "Micromax" if a user is interested in the "iPhone": the user clearly gets the preference rating of that product and also receives recommendations from other users based on location.

1.1.4  Objective

The system will conduct a simulation experiment and evaluate the proposed approach with Java-based recommendation queries and an e-rate scenario. The filtering process, which extracts data from different database servers, can be easily integrated and delivers highly preferred items for relevant searches, refined by the Top-k method of item classification. The rating and research mechanism further includes 1) improving the proposed evaluation approach through collaborative filtering and 2) applying the proposed approach to support location-based recommendations.

1.2 Literature Review

1.2.1 Amazon.com recommendations: Item-to-item collaborative filtering

Authors: G. Linden, B. Smith, and J. York

Amazon makes heavy use of an item-to-item collaborative filtering approach. This essentially means that for each item X, Amazon builds a neighborhood of related items S(X); whenever a user buys or looks at an item, Amazon recommends items from that item's neighborhood. That is why, when a user signs in to Amazon and looks at the front page, the recommendations are mostly of the form "Customers who viewed this also viewed...". This item-to-item approach can be contrasted with two other approaches used to implement filtering, namely the user-to-user collaborative filtering approach and the global factorization approach.
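As a rough, hedged sketch of how such an item-to-item neighborhood S(X) could be computed (this is not Amazon's actual implementation), the Java snippet below represents each item by the set of users who interacted with it, scores item pairs by the cosine of those sets, and returns the top-n most similar items; all names and the data layout are illustrative assumptions.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Illustrative item-to-item collaborative filtering neighborhood sketch. */
public class ItemToItemCF {

    // itemToUsers.get(item) -> set of user ids who bought/viewed that item
    public static List<String> neighborhood(Map<String, Set<String>> itemToUsers,
                                            String itemX, int n) {
        Set<String> usersX = itemToUsers.get(itemX);
        Map<String, Double> sims = new HashMap<>();

        for (Map.Entry<String, Set<String>> e : itemToUsers.entrySet()) {
            if (e.getKey().equals(itemX)) continue;
            Set<String> usersY = e.getValue();
            long common = usersX.stream().filter(usersY::contains).count();
            if (common == 0) continue;
            // Cosine similarity between the two items' binary user vectors.
            double sim = common / (Math.sqrt(usersX.size()) * Math.sqrt(usersY.size()));
            sims.put(e.getKey(), sim);
        }

        // Keep the n most similar items as the neighborhood S(X).
        return sims.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```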