A Mechanism for Classifying and Preventing Phishing Websites

Master Project Report

PhishLurk:

A Mechanism for Classifying and Preventing Phishing Websites

By: Mohammed Alqahtani

1. Committee Members and Signatures:

Approved by Date

______

Advisor: Dr. Edward Chow

______

Committee member: Dr. Albert Glock

______

Committee member: Dr. Chuan Yue

Abstract

Phishing attackers keep refining their attempts, using a variety of methods to target users. At the same time, users access the internet in a variety of ways, on different platforms with different computational capabilities and varying levels of protection, which expands the attack surface for phishers and complicates the provisioning of security protection.

I propose PhishLurk, an anti-phishing search website that classifies and prevents phishing attacks. PhishLurk provides protection from the server side and uses a coloring scheme and warnings for classification, so that it consumes as little computation and screen space as possible on the client side. It can therefore work efficiently across a variety of devices with different capabilities. PhishLurk uses PhishTank as its blacklist provider and checks the list in real time to achieve the highest possible accuracy. The idea behind PhishLurk could be a useful enhancement if it is adopted by major search engines, e.g., Google and Yahoo. In addition, the mechanism can be optimized to work efficiently on smartphones.

1. Introduction

Phishing is a cybercrime in which an attacker tries to gather personal and financial information, such as usernames, passwords, and credit card numbers, from recipients by pretending to be a legitimate website. Most phishing attacks come in two forms: emails and webpages that spoof a trusted source or lure the user into entering sensitive information. In other words, phishing directs users to fraudulent websites in order to obtain sensitive information, which can be confidential information or financial data [22]. Figure 1 shows a sample phishing website. Phishers originally relied on emails to lure their targets into giving away information. More recently, phishers have started to use other methods to steal the targeted users’ information, such as fake websites, trojans, key-loggers, and screen captures [23].

Figure 1: Sample of a phishing website (source: www.phishtank.com)

1.1 Impact of phishing

Phishing has been a major concern in IT security. In the U.S., companies lose more than $2 billion every year as a result of phishing attacks [6]. About 1.2 million users in the U.S. were phished between May 2004 and May 2005, at a cost of approximately $929 million [6]. AOL-UK announced that one out of twenty users has lost money to phishing attacks [25]. A 2010 survey indicates that the annual loss from cybercrime, due to the loss of confidential banking information or corporate data, is between half a billion dollars and $1 trillion [25].

2. Background

Recently, users have gained a wider variety of ways to access the internet, for example notebooks, PCs, game consoles, handhelds, and smartphones. However, the use of a wider variety of devices with different capabilities and features makes it complicated to provide full protection, especially from phishing attacks. Currently there is no perfect protection. Smartphones are among the most used devices. According to a survey by comScore, Inc., the number of smartphone subscribers increased 60 percent in 2010 compared to 2009 [4]. Another report, by the Nielsen Company, indicates that by 2011 half of cell phone users would be using smartphones [5]. Users prefer these devices for their activities and tasks because of the advantages they provide; smartphones in particular are preferred for their ease of use, flexibility, and mobility. Some activities, such as online banking, paying bills, online shopping, emailing, and social networking [5], require users to enter sensitive information to complete the authentication and authorization process. Such sensitive information could be credit card numbers, passwords, and usernames. Having such a variety of ways to access the internet expands the attack surface for phishers and complicates protection.

2.1 History of Phishing

The idea of luring people into giving away their sensitive information started with phone calls. Phishers combined two techniques: making phone calls (“phreaking”) and luring the target (“fishing”). In the mid-1990s, the main target of phishing attackers was America Online (AOL). Phishers sent instant messages to users, using social engineering and look-alike domain names such as www.ao1.com, to lure them into revealing their passwords, and then used the victims’ accounts for free. Later, attackers started seeking more detailed information, such as credit card numbers and social security numbers. During the past ten years, phishing attackers have moved to a higher level, directly targeting users of financial services and online payment services, such as e-buyers, PayPal and eBay users, and bank customers. In addition to the earlier techniques, attackers have used more advanced techniques such as key-logging, browser vulnerabilities, and link obfuscation [27].

2.2 Most Targeted Industries

Because of their dense confidential content and financial use, financial services and online payment services are the industries most targeted by phishing attackers [22]. Figure 2 shows the distribution of phishing activity by targeted industry.

Figure 2. Phishing Activity Trends Report - 2nd Half 2010 - Anti-Phishing Working Group (APWG)

2.3 Why Phishing Works

Phishing works for many reasons. One of the most common is users’ carelessness and lack of knowledge about how to tell whether a website is legitimate or phishing [1]. Moreover, phishing attackers work hard, sending millions of messages, looking for vulnerabilities, and seeking sensitive information.

2.4 Existing Anti-Phishing Work

Many anti-phishing techniques have been proposed, using different methods of filtering and detection, such as blacklists, plug-ins, extensions, and toolbars for browsers [2]. Browser developers try hard to provide solid protection, for example by displaying a warning message box if a website is a potential phishing website or has an invalid or expired SSL certificate. Often a third party and blacklists are involved in identifying and flagging phishing websites [3].

3. Related Work

PhishTank is a nonprofit project that aims to build a dependable database of phishing URLs [7]. The project collects, verifies, tracks, and shares phishing data. To report a phishing link, a user has to be registered as a member, so that the administrators can assess each member’s contribution. Phishing websites can be reported and submitted via email or via PhishTank’s website. After the members submit the data, a committee verifies it. PhishTank’s database can be shared via an API. The links in the original database are classified only as “phishing” or “unknown”. We propose to classify phishing links based on the PhishTank database with a more precise classification. PhishTank has been effective in fighting phishing attacks: thousands of phishing links are detected and verified as valid phishing sites every month [9]. It uses the public’s effort and contributions to build a trustworthy and dependable database that is open for everyone to use and share. As a result, several well-known organizations and browsers have started using the PhishTank database, such as Yahoo Mail, Opera, McAfee, and Mozilla Firefox [10]. In my prototype, I use PhishTank as the phishing blacklist provider.
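The role of a blacklist provider can be illustrated with a minimal sketch. It assumes the blacklist has already been obtained as a plain collection of URLs; the helper names `normalize_url` and `is_blacklisted` and the sample entries are hypothetical and do not reflect PhishTank’s actual API or data format.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form so trivial variations still match.

    Assumption: scheme and host are compared case-insensitively, and the
    port, fragment, and any trailing slash are ignored.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{path}{query}"


# Hypothetical sample entries standing in for a downloaded blacklist.
BLACKLIST = {
    normalize_url(u)
    for u in [
        "http://phish.example.test/login",
        "http://bank-secure.example.test/verify?id=1",
    ]
}


def is_blacklisted(url: str) -> bool:
    """Return True if the normalized URL appears in the blacklist."""
    return normalize_url(url) in BLACKLIST
```

A set gives constant-time membership checks, which matters when every search result has to be checked against the list before it is shown.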

In the paper titled “Large-Scale Automatic Classification of Phishing Pages” [2], Colin Whittaker, Brian Ryner, and Marria Nazif proposed an automatic classifier to detect phishing websites. The classifier maintains Google’s phishing blacklist automatically and analyzes millions of pages a day, examining the URL and the contents of each page to determine whether it is phishing. The proposed classifier works automatically within a large-scale system, maintaining a false positive rate below 0.1% and reducing the lifetime of phishing pages. The authors used machine learning techniques to analyze web page content. In my project, the determination is based on PhishTank’s blacklist. My goal is not to determine whether a page is phishing, but to provide a new method of classifying phishing links that considers two factors: consuming as little memory and as little screen space as possible, which eventually improves the overall classification efficiency.

In the paper titled “PhishGuard: A Browser Plug-in for Protection from Phishing” [8], Y. Joshi, S. Saklikar, D. Das, and S. Saha proposed a mechanism that detects a forged website by submitting fake credentials before the actual credentials during the login process. The server’s responses to all of those submissions are then analyzed to determine whether the website is phishing. The mechanism was implemented on the browser side (the user side) as a Mozilla Firefox plug-in. However, the mechanism only detects phishing during a user’s login process. If another user logs in to the same phishing website, that user goes through the same detection process. In my project, once a website is reported as a phishing site, the reported link is blocked and no other user can access the reported website.
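The fake-credential idea can be sketched as follows. This is a deliberately simplified illustration, not the PhishGuard algorithm itself: it assumes a site can be modeled as a function that returns True when a login attempt is accepted, and it flags a site only if every random credential pair is accepted. The names `looks_like_phishing`, `legitimate_site`, and `phishing_site` are hypothetical.

```python
import secrets
from typing import Callable, Tuple

# Assumption: a site is modeled as a function (username, password) -> accepted?
LoginFn = Callable[[str, str], bool]


def random_credentials() -> Tuple[str, str]:
    """Generate a credential pair that no real account should match."""
    return secrets.token_hex(8), secrets.token_hex(12)


def looks_like_phishing(login: LoginFn, probes: int = 3) -> bool:
    """Flag the site if it accepts every randomly generated credential pair.

    A legitimate site is expected to reject unknown credentials, while a
    site that only harvests input tends to accept anything.
    """
    return all(login(*random_credentials()) for _ in range(probes))


def legitimate_site(user: str, password: str) -> bool:
    """Stand-in for a real site that verifies credentials."""
    return (user, password) == ("alice", "correct horse")


def phishing_site(user: str, password: str) -> bool:
    """Stand-in for a forged site that accepts whatever is submitted."""
    return True
```

Because the probe runs inside each user’s own login attempt, every user repeats the same work, which is the limitation noted above.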

In the paper titled “BogusBiter: A Transparent Protection Against Phishing Attacks” [17], Chuan Yue and Haining Wang proposed a client-side tool called BogusBiter that sends a large number of bogus credentials to suspected phishing sites and hides the real credentials from phishers. BogusBiter is unique in that it also helps legitimate websites detect stolen credentials in a timely manner, by forcing the phisher to verify the collected credentials at the legitimate website. BogusBiter was implemented as a Firefox 2 extension. My project is different in that it provides the protection from the server side.

In the paper titled “The Battle Against Phishing: Dynamic Security Skins” [18], Rachna Dhamija and J. D. Tygar proposed an anti-phishing tool that helps users distinguish whether they are interacting with a trusted site [1]. This approach uses a shared cryptographic image that remote web servers use to prove their identities to users, in a way that is easy for humans to verify and hard for attackers to spoof. It cannot protect users on public-access machines, because the approach requires support from both the client side and the server side. In my project, there is no dependency on the client side.

3.1 Blacklisting

Blacklisting is the idea of denying access to resources based on a list. A blacklist is built either automatically by a mechanism, e.g., Google’s blacklist [2], or from users’ feedback, as is the case with PhishTank [7], where users submit and report suspicious websites. The object of a blacklist can be a user, an IP address, a website, or a piece of software.

We can classify varieties of blacklists as follows:

· Content filter: A proxy server that filters content. The proxy server not only blocks banned URLs using a blacklist but also uses keywords, metadata, and pictures to filter the content. Examples of content filters include DansGuardian [28] and SquidGuard [Refs]. In SquidGuard, the proxy uses advanced web filtering policies to block content that is inappropriate for the organization or company. The filter blocks URLs using a blacklist and controls content by blocking on keywords inferred from the metadata and the page content. SquidGuard is used mostly in educational environments and for protecting children. The main goal of a content filter is to make access control management fast and efficient. In DansGuardian, the client requests URLs, and DansGuardian collects them and compares them against the blacklist and whitelist. If the request is clean, DansGuardian passes the URL request along. If the URL is not clean, DansGuardian blocks it [28].

· E-mail spam filter: It monitors and blocks spam and phishing emails using a blacklist of spam email sources, preventing them from reaching the client side. There are many anti-spam email blacklists, e.g., the GFI MailEssentials list, ATL Abuse Block List, Blacklist Master, Composite Blocking List (CBL), and SpamCop.

· Browser and vendor blacklists: Many web browsers and companies maintain their own blacklists against spam and phishing, e.g., IE, Google, and Norton.
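The pass-or-block decision that a list-based content filter makes for each request can be sketched as below. This is an illustration under stated assumptions rather than DansGuardian’s or SquidGuard’s actual logic: matching is done on the host name only, the whitelist takes precedence over the blacklist, unlisted hosts are passed, and the list entries and the `filter_request` name are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical list entries; a real filter would load these from its configuration.
WHITELIST = {"intranet.example.test"}
BLACKLIST = {"phish.example.test", "spam.example.test"}


def filter_request(url: str) -> str:
    """Return "pass" or "block" for a requested URL.

    Assumptions: only the host is matched, the whitelist is consulted
    first, and hosts on neither list are treated as clean.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host in WHITELIST:
        return "pass"
    if host in BLACKLIST:
        return "block"
    return "pass"
```

For example, `filter_request("http://phish.example.test/login")` yields `"block"`, while a request to a host on neither list is passed along unchanged.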

3.2 Current Browser’s Phishing Protection

Most popular browsers provide a phishing filter that warns users about malicious websites, including phishing websites. The filters mainly depend on lists to detect malicious websites. IE7 used the “Phishing Filter”, which was improved into the SmartScreen Filter in later versions of IE because of the weak protection the Phishing Filter provided [15]. In IE 8 and IE 9, the SmartScreen Filter verifies visited websites against a list of malicious websites that Microsoft created and continuously updates [11] [12]. Similar to IE, the Safari browser has filters that check websites against a list of phishing sites while the user is browsing. After PayPal warned its members that Safari was not safe for its service [13], Safari started to use Extended Validation certificates to support analyzing websites [14]. Earlier versions of Firefox relied on anti-phishing providers such as GeoTrust or PhishTank, using their lists to help identify malicious websites. The current version of Firefox has adopted Google’s anti-phishing program to support its phishing protection.

Many research projects have proposed mechanisms implemented as browser plug-ins or toolbars against phishing attacks. The main problem with plug-ins and toolbars is the need for users’ cooperation: users may not cooperate and install the tool. Some users occasionally prefer to turn their filter off to browse faster [16]. On some devices, such as smartphones, plug-ins and toolbars may not be as effective as in a desktop browser because of limited performance and screen space.

3.3 Classification of Phishing Defense

The different phishing defense approaches can be further classified based on where the alerts are generated:

• Browsers themselves: IE9, Firefox 5.

• Browser extensions or plug-ins: BogusBiter, PhishGuard.

• Anti-phishing search site: PhishLurk (my project).

• Proxy server: DansGuardian [20].

• Anti-phishing server: OpenDNS [19], GFI MailEssentials [21], and some browser extensions that partially use the server side, such as Dynamic Security Skins [18].

According to the official website [20], DansGuardian is an active web content filter that filters websites based on a number of criteria, including website URL, words and phrases included in the page, file type, MIME type, and more. DansGuardian is configured as a proxy server that controls, filters, and monitors all content; it therefore does more than anti-phishing. There is currently no project that uses a proxy server specifically for anti-phishing, but it could be an effective technique for classifying and preventing phishing websites.
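Filtering on several criteria at once, as described above, can be sketched as a function that reports which criteria a page violates. This is an illustrative assumption about how such checks might be combined, not DansGuardian’s implementation; the `Page` type, the rule lists, and `block_reasons` are hypothetical names, and the rules use simple substring matching.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """Hypothetical view of a fetched page as seen by the proxy."""
    url: str
    mime_type: str
    text: str


# Hypothetical rule lists; a real filter would read these from configuration.
BANNED_URL_PARTS = ("phish.example.test",)
BANNED_PHRASES = ("verify your account immediately",)
BANNED_MIME_TYPES = ("application/x-msdownload",)


def block_reasons(page: Page) -> List[str]:
    """Return the criteria that the page violates; an empty list means clean."""
    reasons = []
    if any(part in page.url.lower() for part in BANNED_URL_PARTS):
        reasons.append("url")
    if any(phrase in page.text.lower() for phrase in BANNED_PHRASES):
        reasons.append("phrase")
    if page.mime_type.lower() in BANNED_MIME_TYPES:
        reasons.append("mime_type")
    return reasons
```

Returning the list of reasons, rather than a single yes-or-no answer, would let a proxy used for anti-phishing show the user why a page was blocked.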