In customer relationship management (CRM), Web mining is the integration of information gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web. (Mining means extracting something useful or valuable from a baser substance, such as mining gold from the earth.) Web mining is used to understand customer behavior, evaluate the effectiveness of a particular Web site, and help quantify the success of a marketing campaign.
Web mining allows you to look for patterns in data through content mining, structure mining, and usage mining. Content mining is used to examine data collected by search engines and Web spiders. Structure mining is used to examine data related to the structure of a particular Web site. Usage mining is used to examine data related to a particular user's browser, as well as data gathered by forms the user may have submitted during Web transactions.
The information gathered through Web mining is evaluated (sometimes with the aid of software graphing applications) using traditional data mining techniques such as clustering and classification, association, and examination of sequential patterns.
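Of the techniques named above, association analysis is the easiest to illustrate on usage data. The sketch below is a minimal example, assuming each user session has already been reduced to the set of pages it visited; the helper name `page_association_rules` and the support and confidence thresholds are hypothetical, and only page pairs are mined (a full Apriori implementation would go on to grow larger itemsets).

```python
from collections import Counter
from itertools import combinations


def page_association_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return sorted rules (lhs, rhs, support, confidence): 'visitors of lhs also visit rhs'.

    Hypothetical helper: only single-page antecedents and consequents are considered.
    """
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        unique = set(pages)
        item_counts.update(unique)
        # Count each unordered pair of pages at most once per session.
        pair_counts.update(combinations(sorted(unique), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields a candidate rule in each direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]
for lhs, rhs, sup, conf in page_association_rules(sessions):
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

Support measures how often the pair occurs across all sessions; confidence measures how often the consequent appears given the antecedent, which is why the same pair can produce rules of different strength in each direction.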
Web Mining
The Web is a collection of inter-related files on one or more Web servers.
Web mining is the application of data mining techniques to extract knowledge from Web data.
Web data includes:
- Web content: text, images, records, etc.
- Web structure: hyperlinks, tags, etc.
- Web usage: HTTP logs, application server logs, etc.
Web Mining: History
- Term first used in [E1996], defined in a task-oriented manner
- Alternate ‘data oriented’ definition
- 1st panel discussion at ICTAI 1997 [SM1997]
- Continuing forum:
  - WebKDD workshops with ACM SIGKDD, 1999, 2000, 2001, 2002, …; 60–90 attendees
  - SIAM Web analytics workshop 2001, 2002, …
  - Special issues of DMKD journal, SIGKDD Explorations
  - Papers in various data mining conferences & journals
- Surveys [MBNL 1999, BL 1999, KB2000]
Pre-processing Web Data
- Web Content: extracting “snippets” from a Web document that represent the document
- Web Structure: identifying interesting graph patterns, or pre-processing the whole Web graph to compute metrics such as PageRank
- Web Usage: user identification, session creation, robot detection and filtering, and extraction of usage path patterns
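The usage pre-processing steps above can be sketched as follows, assuming log entries are already parsed into `(host, user_agent, timestamp, url)` tuples. The specific heuristics are assumptions chosen for illustration, not prescriptions from the source: a user is approximated by host plus user agent, robots are detected by user-agent keywords or a request for `/robots.txt`, and a session ends after 30 minutes of inactivity. All names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical heuristics chosen for illustration.
ROBOT_MARKERS = ("bot", "crawler", "spider")
SESSION_TIMEOUT = timedelta(minutes=30)


def build_sessions(entries):
    """Group (host, user_agent, timestamp, url) entries into per-user sessions."""
    # Robot detection: users whose agent string contains a robot keyword,
    # or who requested /robots.txt at any point.
    robots = {
        (host, agent)
        for host, agent, _, url in entries
        if url == "/robots.txt" or any(m in agent.lower() for m in ROBOT_MARKERS)
    }

    by_user = defaultdict(list)
    for host, agent, ts, url in entries:
        user = (host, agent)  # user identification heuristic
        if user not in robots:  # robot filtering
            by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # chronological order
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:
                # Inactivity gap: close this session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


t0 = datetime(2024, 1, 1, 9, 0)
log = [
    ("10.0.0.1", "Mozilla/5.0", t0, "/home"),
    ("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=5), "/products"),
    ("10.0.0.1", "Mozilla/5.0", t0 + timedelta(hours=2), "/home"),
    ("10.0.0.9", "ExampleBot/1.0", t0, "/home"),
]
for user, pages in build_sessions(log):
    print(user, pages)
```

Here the two-hour gap splits the human visitor's requests into two sessions, and the bot's request is dropped. The resulting page sequences are the input from which usage path patterns can then be extracted.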
Common Mining Techniques
The most basic and popular data mining techniques include:
Classification
Clustering
Associations
Other significant ideas include:
Topic identification, tracking, and drift analysis
Concept hierarchy creation
Relevance of content
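As an illustration of classification applied to Web content (and of topic identification), the sketch below trains a multinomial Naive Bayes classifier with add-one (Laplace) smoothing on page text. Naive Bayes is swapped in here as one common choice; the source does not name an algorithm. The class name, the whitespace tokenizer, and the training pages and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class NaiveBayesPageClassifier:
    """Hypothetical multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(labels)
        return self

    def predict(self, doc):
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n_docs / self.total)
            for word in tokenize(doc):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


pages = [
    "football match score league",
    "league goal football team",
    "stock market shares price",
    "market price interest bank",
]
topics = ["sports", "sports", "finance", "finance"]
clf = NaiveBayesPageClassifier().fit(pages, topics)
print(clf.predict("football team goal"))  # sports
print(clf.predict("bank shares price"))  # finance
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen word from driving a class probability to zero.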
Web Content Mining Applications
Identify the topics represented by Web documents
Categorize Web Documents
Find similar Web pages across different servers
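One way to find similar pages across servers is to compare sets of word shingles (overlapping k-word sequences) using the Jaccard coefficient. The following is a minimal sketch, assuming the page text has already been extracted. The shingle length `k=3`, the `0.5` threshold, and the function names are hypothetical. Large collections would normally approximate this with MinHash rather than comparing every pair.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pages(pages, k=3, threshold=0.5):
    """Yield (url1, url2, similarity) for page pairs at or above the threshold."""
    sets = {url: shingles(text, k) for url, text in pages.items()}
    for u, v in combinations(sorted(sets), 2):
        sim = jaccard(sets[u], sets[v])
        if sim >= threshold:
            yield u, v, sim


pages = {
    "http://a.example/faq": "how do I reset my password for the account",
    "http://b.example/help": "how do I reset my password for the web account",
    "http://c.example/news": "quarterly results were announced today",
}
for u, v, sim in similar_pages(pages):
    print(u, v, round(sim, 2))
```

The two help pages share 6 of 9 distinct shingles (similarity about 0.67), so only that pair is reported. Raising the threshold toward 1.0 turns the same code into a near-duplicate detector.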
Applications related to relevance
Queries: enhance standard query relevance with user-, role-, and/or task-based relevance
Recommendations: list the top “n” relevant documents in a collection or a portion of a collection
Filters: show or hide documents based on their relevance score
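The recommendation and filter applications both reduce to scoring documents against a query. The sketch below assumes TF-IDF cosine similarity as the relevance score, a common choice that the source does not specify. It uses a smoothed idf (`log(n/df) + 1`), and `rank_documents`, `top_n`, and `min_score` are hypothetical names.

```python
import heapq
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return ({doc_id: {term: tf-idf weight}}, {term: idf}) for a dict of texts."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n = len(tokenized)
    df = Counter(term for words in tokenized.values() for term in set(words))
    # Smoothed idf (+1) so terms present in every document keep some weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(words).items()}
        for doc_id, words in tokenized.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_documents(query, docs, top_n=2, min_score=0.1):
    """Return up to top_n (doc_id, score) pairs whose score is at least min_score."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(query.lower().split()).items() if t in idf}
    scored = [(doc_id, cosine(q, vec)) for doc_id, vec in vectors.items()]
    visible = [(d, s) for d, s in scored if s >= min_score]  # filter: hide low scores
    return heapq.nlargest(top_n, visible, key=lambda pair: pair[1])  # top "n"


docs = {
    "d1": "web usage mining of server logs",
    "d2": "web structure mining with pagerank",
    "d3": "recipes for chocolate cake",
}
print(rank_documents("web mining", docs))  # d2 ranks above d1; d3 is hidden
```

`d2` outranks `d1` because it matches the same query terms with fewer other terms diluting its vector. The `min_score` cut-off plays the role of the show/hide filter.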
What is Web Usage Mining?
The Web is a collection of inter-related files on one or more Web servers.
Web usage mining is the discovery of meaningful patterns from data generated by client-server transactions on one or more Web localities.
Typical sources of data:
- automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
- user profiles
- metadata: page attributes, content attributes, usage data
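Server access logs are the most common of these sources. The sketch below parses one line in the NCSA Common Log Format with a regular expression. The pattern and field names are assumptions about the log layout. Servers configured for the Combined Log Format append referrer and user-agent fields, which this pattern leaves unparsed because `match` only anchors at the start of the line.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" size means no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


line = '192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_clf_line(line)
print(entry["host"], entry["url"], entry["status"], entry["time"].isoformat())
```

Converting the timestamp to a timezone-aware `datetime` at parse time makes later steps such as session creation a matter of subtracting timestamps.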
Conclusions
Web structure is a useful source for extracting information such as:
- Quality of a Web page: the authority of a page on a topic, and the ranking of Web pages
- Interesting Web structures: graph patterns such as co-citation, social choice, complete bipartite graphs, etc.
- Web page classification: classifying Web pages according to various topics
- Which pages to crawl: deciding which Web pages to add to the collection of Web pages
- Finding related pages: given one relevant page, finding all related pages
- Detection of duplicated pages: detecting near-mirror sites to eliminate duplication
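Page quality and ranking are usually computed from the link graph, and PageRank, named among the structure pre-processing steps, is the best-known example. The following is a minimal power-iteration sketch, assuming the graph fits in memory as a dict from each page to the pages it links to. The damping factor 0.85 is the conventional value, and the tolerance and iteration cap are arbitrary. Rank held by dangling pages (pages with no out-links) is spread evenly over all pages, which is one common convention among several.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed evenly to every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each out-link passes an equal share of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


web = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "blog": ["home"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

Because every page here has out-links and nothing links to `blog`, its score stays at the teleport floor `(1 - 0.85) / 4`, while `home`, which receives links from every other page, ends up with the highest rank. The scores always sum to 1.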