Using Machine Learning Techniques to Detect Malicious URLs


Student’s Name

Professor’s Name

Course

Date

Using Machine Learning Techniques to Detect Malicious URLs

Abstract

The use of malicious URLs to exploit websites for the execution of criminal activities is a growing cybersecurity issue in contemporary cyberspace. The aim of malicious URL developers is to lure unsuspecting victims into scams or fraudulent activities, most often for financial gain. Over the past years, these criminal activities have caused losses running into billions of dollars and the destruction of devices. Internet users have traditionally relied on conventional techniques to detect malicious URLs. However, new technological advancements have rendered these conventional detection methods ineffective, increasing users' susceptibility to online scams. As a result, telecommunication engineers and website developers have continued to invest in research and development of advanced techniques that can overcome the vulnerabilities of the traditional mechanisms. Notably, the use of anti-malware software to protect online users against exploitative and malicious URLs has proved ineffective against malicious injections, a failure attributed to the increasing complexity of cyberattacks and cybercrimes. However, the adoption of advanced and strategically dynamic machine learning techniques is increasingly proving effective in countering online criminal endeavors executed through malicious URLs. This study reviewed several works on machine learning techniques for detecting and identifying malicious URLs and established that most studies have found these techniques effective.

Table of Contents

Abstract

Introduction

Literature Review

Definition of URL

How URLs are used to Attack Users

Use of Blacklists and Heuristics and Their Corresponding Challenges

Blacklist Strategy

Heuristic Classification

Corresponding Challenges

Machine Learning Application

Stages of Detecting Malicious URLs using ML

Use of Artificial Intelligence in ML

Domain Name Systems (DNS) and WHOIS

Novel Classification Framework

UCI Dataset Phishing Website

Other ML Techniques

Conclusion

Literature Gap

Top-Level Domain (TLD)

Internet Protocol

Alexa Rank

Punycode

Letters and Numbers

The Technical Design

Part 1

Figure 1: Working of Whitelist System (Part 1)

Part 2

Figure 2: Working of Blacklist System

Part 3

Figure 3: Working of Multi-Check System (Part 3)

Works Cited

Introduction

URLs (Uniform Resource Locators) are primarily used to access websites. However, there exist malicious URLs that host unwanted content with the potential to harm host systems. Cyber-attackers mostly use these URLs to infiltrate host systems. Attackers send such URLs or web pages to unsuspecting users to exploit host systems, and the links also act as gateways for unsolicited activities such as ransomware installation on user devices. These sites likewise serve as a basis for facilitating online criminal activities. Occasionally, criminal perpetrators seeking illegal access to confidential data utilize malicious URLs by sending them to target users disguised as valid URLs. Common malicious strategies include spam URLs, adware, phishing URLs, and JavaScript malware scripts, among others. The danger posed by malicious URLs varies and can be disastrous. For instance, most malicious URLs have the potential to compromise confidential and sensitive information of both governments and private entities, thereby jeopardizing the activities of security agencies. Sometimes, the attacks manifest as rogue sites that online criminals use to sell counterfeits and abet financial crimes such as money laundering and blackmail for financial gain. Thus, given that online business presence is now commonplace, malicious attacks are a critical concern as the World Wide Web increasingly becomes an essential tool for executing business activities.

Currently, there are still challenges in detecting malicious sites because of the dynamic nature of cybercrime. Moreover, the majority of users are not familiar with the various techniques attackers employ to exploit them. According to the survey by Desai et al., online-related attacks have been increasing since 2012 (1432). For instance, about 33,000 phishing attacks were executed in 2012 alone, with an attributed financial cost of U.S. $687 million. Equally, reports on malicious attacks from Google, Inc. show that thousands of malicious websites are detected daily. The malicious sites are often embedded within legitimate sites, while some are disguised as genuine by cloning reputable websites. Further, attackers employ different attack strategies such as distributed denial of service (DDoS), man-in-the-middle, phishing, and drive-by downloads, among others. In addition, they employ SQL injection, watering-hole attacks, explicit hacking attempts, and social engineering to implement their malicious attacks. The growing online threat has thus driven the implementation of internet security measures and counter-strategies. Most traditional policies have failed to prevent exposure to malicious attacks. Here, machine learning strategies come in handy in combating online criminal endeavors by identifying threats engineered through malicious URLs. The aim of this study is to conduct a literature review of various studies that have assessed machine learning techniques for detecting and identifying malicious URLs. The ultimate objective is to ascertain existing research gaps in the area that require further research or additional information.

Literature Review

Definition of URL

URLs are used on the Internet as addresses to locate resources. According to Sahoo et al., a Uniform Resource Locator, or web address, comprises a protocol, a hostname (involving the primary and top-level domains), and the path to the resource (2). A hostname may also take the form of an I.P. address (Ferreira 115). Various types of protocols can be used depending on the level of security, the technology in use, and the function. Examples of protocols in use include Hypertext Transfer Protocol Secure (HTTPS), Hypertext Transfer Protocol (HTTP), and Internet Message Access Protocol (IMAP) (Ferreira 115). However, URLs can also be used to facilitate theft: malicious URLs launch various attacks that lead to loss of money and private information. Thus, URLs are addresses used to locate resources, and they can also be used for malice.
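These components can be illustrated with a minimal Python sketch using only the standard library; the URL shown is a hypothetical example, not one from the cited studies.

```python
from urllib.parse import urlparse

# A hypothetical URL used purely for illustration.
url = "https://www.example.com/path/to/page?query=1"

parsed = urlparse(url)
print(parsed.scheme)    # protocol, e.g. "https"
print(parsed.hostname)  # hostname, e.g. "www.example.com"
print(parsed.path)      # path to the resource, e.g. "/path/to/page"

# The top-level domain is the last label of the hostname.
tld = parsed.hostname.split(".")[-1]
print(tld)              # e.g. "com"
```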

How URLs are used to Attack Users

URLs can be used for various attacks, such as phishing, drive-by downloads, and social engineering. Phishing is mainly used to deceive naïve users into giving out personal or other private information to malicious persons. A typical phishing attack occurs when a person clicks on a link, accesses a page that resembles the original site's page, and enters login information (Ferreira 116). Similarly, a person can be misled into sharing personal information on such a page. Ferreira states that this type of phishing occurs when attackers buy domains that resemble the original ones (116). For example, twitter.com may be transformed into twitters.com. In the case of a drive-by download, users download malicious code that is embedded in plugins when they visit a malicious URL (Ferreira 116; Sahoo et al. 2). Social engineering attacks use methods such as honey traps, baiting, and scareware (Ferreira 117). Therefore, systems that can detect these URLs can help prevent a significant number of cyber-attacks, sparing users the pain of losing money or data.

Use of Blacklists and Heuristics and Their Corresponding Challenges

Blacklist Strategy

Blacklists can be used to prevent URL-related attacks. The strategy involves databases that store URLs that have been identified as malicious (Sahoo et al. 3). These databases are then used to detect such URLs on the World Wide Web by performing a database lookup (Naveen et al. 389). Some scholars argue that blacklists are effective; others dispute this, claiming they have since become ineffective. For example, Sahoo et al. state that blacklists have a simple query overhead that makes the database lookup fast (3). In contrast, Naveen et al. argue that they are time-consuming because of computational intensity (389). Other scholars add that the daily creation of URLs makes it impossible for blacklists to contain an exhaustive list of malicious URLs (Ferreira 117; Sahoo et al. 3). Consequently, some attackers have developed tactics to fool blacklists. One such tactic is obfuscation, in which attackers disguise URLs with altered I.P.s, domains, hostnames, and deliberate spelling errors so that the URL's intent is masked and attacks can proceed unnoticed (Sahoo et al. 3). Blacklists have also been rendered ineffective by URL shortening (Sahoo et al. 3). These limitations have made blacklists inefficient, necessitating the development of other detection methods.
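The lookup idea, and its core weakness against new URLs, can be sketched in a few lines of Python; the in-memory set here is a toy stand-in for the maintained databases the literature describes.

```python
# A toy blacklist; production systems query large, regularly updated databases.
BLACKLIST = {
    "http://malicious.example/login",
    "http://phish.example/verify-account",
}

def is_blacklisted(url: str) -> bool:
    """Simple query overhead: a single set-membership lookup."""
    return url in BLACKLIST

print(is_blacklisted("http://phish.example/verify-account"))  # True
print(is_blacklisted("http://brand-new-attack.example/"))     # False: new URLs evade blacklists
```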

Heuristic Classification

The heuristic classification strategy detects malicious URLs by matching them with the signatures of addresses that have already been identified as malicious (Naveen et al. 389). Naveen et al. state that the method is more efficient than the use of blacklists (389). It is useful in examining websites and identifying malicious URLs, thus helping detect and prevent potential attacks.

Corresponding Challenges

Despite its increased efficiency, heuristic classification faces challenges similar to those of blacklists. For example, neither method can efficiently prevent attacks from new URLs (Naveen et al. 389). Naveen et al. further argue that at least 25% of the annual growth in URL-related attacks and threats results from new URLs (389). Therefore, heuristic classification cannot prevent about 25% of malicious attacks. Obfuscation can also be used to weaken the heuristic classification method (Sahoo et al. 3). Sahoo et al. add that attackers who launch many attacks render heuristic classification inefficient by making the malicious URL's signature undetectable (3). In short, the approach cannot prevent attacks from new URLs, and it is challenged by methods used to make attacks undetectable.

Machine Learning Application

The inefficiency of blacklists and heuristic classification has prompted studies on the use of machine learning (ML) to prevent URL-related attacks. Ferreira argues that ML uses algorithms that receive data from the web address and apply statistical analysis to predict outputs, which classify URLs depending on whether they are malicious (117). The advantage of the machine learning approach is that it can generalize to new URLs (Sahoo et al. 3). It occurs in three stages. The first entails acquiring training data comprising a collection of URLs, each of which may be malicious or benign; benign URLs are those with no malicious code embedded (Sahoo et al. 3). Such data can also be obtained from blacklists (Ferreira 118). Therefore, researchers prefer the machine learning approach over the other methods because it can detect new URLs based on historical data, making it possible to overcome the challenges observed in the other techniques.

Stages of Detecting Malicious URLs using ML

The remaining two stages of the machine learning method entail the extraction of features and the creation of the prediction function. In the second stage, the non-volatile features of the URL, which may be lexical, content-based, or host-based, are extracted (Ferreira 118; Naveen et al. 390; Sahoo et al. 3). These features must describe the URL sufficiently and be interpretable by machine learning models (Sahoo et al. 3). Afterward, the features are converted into a numerical vector format usable by machine learning algorithms (Ferreira 119). In the last stage, the prediction model is built and trained to classify URLs using machine learning algorithms such as online learning and batch learning (Ferreira 120). Thus, the last two stages create a prediction function that detects and classifies malicious URLs.
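The three stages can be sketched end to end in Python. The feature set below (URL length, digit count, dot count, presence of an IP-style host) is a simplified stand-in for the lexical, content-based, and host-based features the cited studies describe, and the tiny labeled training set is invented for illustration; scikit-learn is assumed to be available.

```python
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

def lexical_features(url: str) -> list:
    """Stage 2: convert a URL into a numerical feature vector."""
    return [
        len(url),                                          # overall length
        sum(c.isdigit() for c in url),                     # digit count
        url.count("."),                                    # number of dots
        int(bool(re.search(r"\d+\.\d+\.\d+\.\d+", url))),  # IP-style host
    ]

# Stage 1: a toy labeled training set (1 = malicious, 0 = benign).
urls = ["http://192.168.0.1/free-prizes", "https://www.example.com/about",
        "http://paypa1-secure.example/verify", "https://docs.python.org/3/"]
labels = [1, 0, 1, 0]

X = np.array([lexical_features(u) for u in urls])

# Stage 3: train a prediction function and classify an unseen URL.
model = LogisticRegression().fit(X, labels)
print(model.predict([lexical_features("http://10.0.0.5/win-money")]))
```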

Machine learning applications employ various methods to detect malicious URLs. These methods are discussed in the following subsections.

Use of Artificial Intelligence in ML

Artificial intelligence has been used by researchers to identify malicious URLs. Sahoo et al. surveyed the detection of malicious URLs using machine learning approaches (18). The authors reviewed various machine learning algorithms that can be utilized to detect threats from URLs. The survey found that many learning algorithms can be adapted to create a predictive model in a relatively direct manner (14). The algorithms reviewed in the study comprise online algorithms, batch learning algorithms, and representation learning as the primary machine learning techniques (Sahoo et al. 14). The authors further included other methods such as unsupervised learning, similarity learning, and string pattern matching. All these mechanisms are used to build live systems that can offer malicious URL detection as a service. Sahoo et al. provide a systematic formulation of a machine learning technique for malicious URL detection (28). They further posit that the effectiveness of machine learning techniques in classifying URLs as either malicious or benign rests fundamentally on their ability to use a collection of URLs as training data and, from its statistical characteristics, learn a prediction function (4). The authors note that a training data set is a primary requirement for training a machine learning technique to identify malicious URLs. Therefore, the success of any machine learning approach is determined by the quality of the training data, which in turn relies on the quality of the feature representation.

Domain Name Systems (DNS) and WHOIS

In their study, Kuyama et al. extracted feature points from DNS and WHOIS records and used them in machine learning to create a training model (76). The model was then utilized to determine whether an accessed domain is benign or malignant. The authors found that detecting infections or attacks in terminals communicating with a command-and-control (C&C) server over the LAN requires prior identification of the C&C server. As a result, they developed a technique to assist in identifying new servers. The C&C servers were extracted by reviewing Emdivi, PoisonIvy, and PlugX, which qualify as primary malware families for targeted attacks. After extraction, the malware samples were analyzed comprehensively using LastLine, a sandbox analyzer. Kuyama et al. then created a training model with a support vector machine (SVM), which classifies between two categories by pattern recognition, alongside a neural network, a variety of supervised learning mechanism, as the machine learning algorithms (79). Another training model was developed using a neural network fed with e-mail addresses, WHOIS validity terms, and the numbers of NS and MX records from DNS. The training models were then assessed through cross-validation (Kuyama et al. 80). The results show the neural network achieving a superior detection rate of 98.5%, against 97.8% for the SVM (80). Overall, the proposed machine learning method of detecting malicious URLs achieved high accuracy, affirming the efficiency of machine learning in the detection of malicious attacks.
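A rough sketch of the kind of feature extraction Kuyama et al. describe is shown below, assuming the third-party dnspython (2.x) and python-whois packages. The feature choices (NS/MX record counts, registration details) mirror the paper's description, but the code itself is illustrative and is not the authors' implementation.

```python
# Assumes: pip install dnspython python-whois
import dns.resolver
import whois

def dns_whois_features(domain: str) -> dict:
    """Extract coarse DNS/WHOIS features of the kind fed to SVM/NN models."""
    features = {}
    for record_type in ("NS", "MX"):
        try:
            answers = dns.resolver.resolve(domain, record_type)
            features[f"num_{record_type.lower()}_records"] = len(answers)
        except Exception:
            features[f"num_{record_type.lower()}_records"] = 0
    try:
        record = whois.whois(domain)
        features["has_registrant_email"] = int(bool(record.emails))
        features["has_creation_date"] = int(bool(record.creation_date))
    except Exception:
        features["has_registrant_email"] = 0
        features["has_creation_date"] = 0
    return features

print(dns_whois_features("example.com"))  # e.g. {'num_ns_records': 2, ...}
```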

Novel Classification Framework

Conventional techniques cannot solve the ever-evolving contemporary URL problems and web-access methodologies. Moreover, these traditional methods fall short in identifying newer kinds of URLs, for instance, dark-web URLs and short URLs. For this reason, Naveen et al. developed a novel classification framework to address the challenges conventional techniques face in malicious URL detection. The proposed method is based on sophisticated machine learning mechanisms that address the syntactical nature of URLs as well as the semantic and lexical nature of dynamically changing URLs. Naveen et al. based their proposal on a two-step machine learning mechanism: obtaining an appropriate feature representation capable of providing the determining insights for detecting malicious URLs, and then employing that feature representation to train a learning-based detection technique (391). The authors used the analogy of blood and the heart, where the blood represents the features while the heart denotes the machine learning technique: when URL features pass through the machine learning engine, a classification is established based on prior learning. Naveen et al. applied lexical analysis features, which they used together with geo-ranking and third-party features (391). According to the authors, the proposed approach is useful because it extracts lexical features for detecting malicious URLs. Thus, the proposed machine learning method was able to overcome the challenges experienced by traditional methods of malicious URL detection.

UCI Dataset Phishing Website

An exhaustive collection of harmful website content requires continual tracking of the sites' URLs. Since this takes a lot of time, machine learning plays an essential role in ensuring the detection of malicious URLs. Desai et al. adopted machine learning to train a tool that classifies new content into particular categories based on its features so that corresponding action can be taken (1432). The authors employed the UCI Phishing Websites dataset to train the classifier so that it can test the extracted content features whenever a user enters a URL (Desai et al. 1443). To determine the safety of the URL, the authors considered three algorithms: K-Nearest Neighbours (KNN), Support Vector Machines (SVM), and Random Forest. A Google Chrome extension (coded using JavaScript, CSS, and HTML) was then used to examine whether a user has entered a phishing or benign URL. As soon as the user enters the URL, the extension captures it through a GET mechanism and passes it to the extension's JavaScript. The authors demonstrated how a Chrome extension can be utilized to detect phishing websites. The extension, as discussed previously, is trained using machine learning. Hence, machine learning is an integral part of malicious URL detection.
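A condensed sketch of the comparison Desai et al. describe follows: training the three candidate classifiers and checking held-out accuracy. The feature matrix here is random stand-in data; in the paper the features come from the UCI Phishing Websites dataset, which is not bundled with scikit-learn and would need to be downloaded separately.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: 200 samples with 30 features, mimicking the UCI layout.
rng = np.random.default_rng(0)
X = rng.choice([-1, 0, 1], size=(200, 30))   # UCI features are {-1, 0, 1}-coded
y = rng.choice([0, 1], size=200)             # 1 = phishing, 0 = benign

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit each of the three algorithms the authors considered and compare accuracy.
for model in (KNeighborsClassifier(), SVC(), RandomForestClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```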

The application of machine learning techniques to identify malicious webpages was one of the pioneering research directions in the field (Kazemian and Ahmed 1167). That work set aside the content of the webpage and scrutinized URLs based on a bag-of-words representation of tokens, with annotations denoting the positions of the tokens within the particular URL. The authors note that the research indicated lexical features alone could achieve up to 95% of the accuracy obtained with a webpage's content features.
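The bag-of-words representation mentioned above can be reproduced with a custom tokenizer that splits URLs on their delimiters; this sketch uses scikit-learn's CountVectorizer with illustrative URLs (position annotations, which the cited work also used, are omitted for brevity).

```python
import re
from sklearn.feature_extraction.text import CountVectorizer

def url_tokens(url: str) -> list:
    """Split a URL into tokens on the usual delimiters (/ . ? = - _ : &)."""
    return [t for t in re.split(r"[/\.\?=\-_:&]+", url) if t]

urls = ["http://paypa1-secure.example/verify?id=1",
        "https://www.example.com/about"]

vectorizer = CountVectorizer(tokenizer=url_tokens, token_pattern=None)
X = vectorizer.fit_transform(urls)
print(vectorizer.get_feature_names_out())  # the token vocabulary
print(X.toarray())                         # bag-of-words counts per URL
```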

Other ML Techniques

Various other machine learning techniques have been developed and successfully applied to the detection of malicious URLs. These schemes can be categorized into two groups: online learning techniques and regular batch machine learning schemes. A majority of the existing systems for identifying malicious URLs use batch machine learning procedures to learn a classifier from a particular training data set comprising labeled instances; the model is then used to classify a test instance (Zhao and Hoi 3). Batch machine learning techniques have also been successfully applied to identifying URLs used for phishing attacks. Tayyab and Masood argue that machine learning techniques designed to handle phishing typically rely on the support vector machine (SVM), an approach that is widely utilized for solving classification problems (348). According to Wei et al., traditional machine learning techniques for phishing detection first analyze the suspicious URL to perform feature selection (2). Next, the selected features, together with their labels, are used to build phishing detection models. Deep learning is also applicable in phishing detection.
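The distinction between the two categories can be made concrete with scikit-learn's SGDClassifier, which supports both a one-shot batch fit and incremental updates via partial_fit; the data here is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

# Batch learning: fit once on the full labeled training set.
X_batch, y_batch = rng.normal(size=(100, 4)), rng.choice([0, 1], size=100)
batch_model = SGDClassifier().fit(X_batch, y_batch)

# Online learning: update the model as labeled URLs stream in.
online_model = SGDClassifier()
for _ in range(10):  # each iteration simulates a new mini-batch of URLs
    X_new, y_new = rng.normal(size=(10, 4)), rng.choice([0, 1], size=10)
    online_model.partial_fit(X_new, y_new, classes=[0, 1])
```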

Similarly, Ferreira argues that the fundamental aim of machine learning is to create algorithms capable of receiving input data and using statistical analysis to forecast an outcome while updating outputs in a stepwise fashion as new data becomes available (117). Based on a set of URLs acting as training data, machine learning techniques learn a prediction function capable of classifying URLs as benign or malicious. They are thereby able to generalize to new URLs, giving them an advantage over blacklisting approaches. According to Hu et al., the confidence-weighted classification of URLs based on their lexical features is one way of identifying phishing URLs (2). Another workable approach is to identify features from suspicious URLs and use them to develop a logistic regression classifier that differentiates a phishing URL from a harmless one. According to various authors, the features used in this case include the occurrence of alarming key words in the URL under inspection, Google's quality guidelines for webpages, and features derived from Google's page ranking (Hu et al. 2; Kazemian and Ahmed 1167). Moreover, Hu et al. posit that extracting certain features from the content of web pages and using these properties to train machine learning techniques can aid the identification of malicious websites (2). The authors thereby demonstrate the use of machine learning to detect malicious software.

Akin to Hu et al.'s assertions (2), James et al. agree that confidence-weighted classification, as well as the creation of machine learning techniques based on the lexical features of URLs, is a workable approach for detecting phishing URLs (305). Further, the authors state that host-based as well as batch-based algorithms can help identify phishing URLs. They also posit that a combination of lexical features and host-based approaches produces the best results in identifying phishing URLs. Equally, the researchers suggest that online algorithms for detecting malicious URLs are more efficient than batch-based algorithms. The same argument is advanced by Patil and Patil, who affirm that machine learning can be used to detect a variety of malicious URLs, including spam and malware (143). The authors additionally posit that machine learning can be employed to identify the type of attack a malicious URL is attempting to launch. Notably, they single out the Support Vector Machine (SVM) as a vital algorithm for identifying harmful URLs, and the ML-kNN and RAkEL algorithms as ideal approaches for determining the attack type. Hence, the writers propose that a multi-class classification approach founded on discriminative features such as webpage source, domain name, and short URLs can be used to identify malicious URLs and their intended attacks.
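Patil and Patil's multi-class idea, identifying not just maliciousness but the attack type, can be approximated with any multi-class classifier. The sketch below uses a random forest rather than the ML-kNN/RAkEL algorithms the paper names, and all data and class labels are synthetic stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CLASSES = ["benign", "spam", "phishing", "malware"]  # illustrative attack types

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 12))                 # stand-in discriminative features
y = rng.integers(0, len(CLASSES), size=400)    # stand-in attack-type labels

# Train a multi-class model and predict the attack type of a new URL's features.
model = RandomForestClassifier().fit(X, y)
prediction = model.predict(rng.normal(size=(1, 12)))
print(CLASSES[prediction[0]])                  # predicted attack type
```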

Conclusion

The paper focused on a review of the literature on the use of machine learning techniques to detect malicious URLs. As shown in the studies, malicious websites are those utilized by attackers for the purpose of stealing users' information, scamming them, or damaging their devices. As a result, many individuals, governments, and organizations have lost billions of dollars, identities, or data. Initially, traditional mechanisms were used to detect malicious URLs or malware. However, with contemporary technological advancements in fields such as e-banking, e-commerce, and other online activities, the conventional malicious website detection methods have become ineffective. Current studies therefore continue to develop state-of-the-art mechanisms capable of overcoming the limitations of traditional methods, and machine learning has become synonymous with contemporary malicious URL detection. The review has discussed how machine learning algorithms can be employed to detect malicious websites. Clearly, machine learning techniques are efficient in the identification of malicious URLs. The approaches can also determine the type of attack a malicious URL is attempting to launch. All existing machine learning approaches for the detection of malicious URLs can be broadly categorized into two subgroups: regular batch and online learning schemes. Based on the fundamental principles of machine learning, scholars continue to suggest workable approaches for identifying malicious URLs. An approach that efficiently detects malicious URLs must be able to identify new addresses. Blacklists and heuristic classification are examples of inefficient approaches in this regard because they are unable to detect new threats without manual updates. Technological advancements that increase the risk of monetary losses create a need for approaches that can detect malicious URLs. Therefore, the machine learning approach is being explored because it can detect new URLs through generalization. Lastly, the drawbacks of using machine learning techniques to detect new URLs should be explored.

Literature Gap

The notable technology in the current global environment is the increasing use of mobile phones, not only for communication but for all other Internet-related services. Although the literature review has focused more on malicious attacks directed at personal computers and desktops, mobile phones present a critical challenge in countering the impacts of malicious URLs. The reason is that webpages directed to mobiles differ significantly from those directed to desktops and PCs in layout, content, and general functionality. Equally, most techniques employed for their desktop and PC counterparts are unable to detect or handle mobile-related malicious URLs effectively. As such, the research establishes that mobile-related malicious URLs are the new era of attacks, a gap it seeks to address. The study aims to develop detection of malicious URLs in a mobile application, since current solutions are available only for the web and are unable to detect malicious mobile URLs. This recognizes that most attackers currently target mobile phones, mainly because mobile users form the majority of consumers of Internet-related services in almost every part of the world. Hence, the designed application will be distributed to mobile users visiting websites to guide their activities so that they avoid opening malicious URLs and protect their privacy and content.

Secondly, the study seeks to develop other techniques, in addition to those suggested in the literature review, that will enhance security against malicious URLs visited on mobile phones. These additional techniques aim to avoid the cost of running machine learning techniques for every URL. As such, the study will develop a single Whitelist and a Blacklist populated with millions of URLs, against which the user can check and detect suspicious websites. If the results are not conclusive, the user proceeds to check the Blacklist or uses other techniques and machine learning to dig further. The system will run enough checks to assign the URL a score between 0% (malicious) and 100% (benign). Some of these techniques are discussed in the following parts.

Top-Level Domain (TLD)

The TLD is the highest level in the hierarchical Domain Name System of the Internet. The TLD names are installed in the root zone of the namespace. For all domains at lower levels, it is the last part of the domain name, that is, the last label of a fully qualified domain name. For example, in the domain name www.example.com, the top-level domain is com. Responsibility for the management of most top-level domains is delegated to specific organizations by the Internet Corporation for Assigned Names and Numbers (ICANN), which operates the Internet Assigned Numbers Authority (IANA) and is in charge of maintaining the DNS root zone. The system will award scores based on the popularity of the TLD. For instance, .com will receive the highest score, whereas a domain ending with .xxx will get a lower score. For effective implementation, a list of all known TLDs will be available in the system so that the checklist can be run once executed.
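A minimal sketch of this scoring rule follows; the score values and the small TLD table are arbitrary placeholders, and a real deployment would load the full IANA TLD list.

```python
from urllib.parse import urlparse

# Placeholder scores; a real system would cover the full IANA TLD list.
TLD_SCORES = {"com": 100, "org": 90, "edu": 95, "net": 85, "xxx": 10}

def tld_score(url: str, default: int = 50) -> int:
    """Score a URL by the popularity of its top-level domain."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_SCORES.get(tld, default)

print(tld_score("https://www.example.com/"))  # 100
print(tld_score("http://bad.example.xxx/"))   # 10
```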

Internet Protocol

In this part, the score depends on the type of protocol. The score will be higher when the URL starts with a secure protocol such as HTTPS, while a lower rating will be applied to URLs that begin with an insecure protocol such as HTTP. Sites that still rely on such insecure protocols will therefore not receive high scores.

Alexa Rank

The popular website "Alexa" will be used to implement this type of check. The website gives the rank of a site based on its number of visitors, where a low rank number implies the site is popular and likely secure. However, some secure sites display large rank numbers, so the system must establish the existence of such websites without relying solely on the rank value. For instance, twitter.com has a rank of 43; searching for twitter.net instead yields a completely different rank, thus raising suspicion.
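Alexa's data is retrieved through its own service; rather than invent an API call, the sketch below assumes the rank has already been fetched into a hypothetical local cache and shows only the scoring rule, where a low rank number (a popular site) earns a high score.

```python
# Hypothetical cache of previously retrieved Alexa ranks (lower = more popular).
RANK_CACHE = {"twitter.com": 43, "example.com": 12000}

def rank_score(domain: str) -> int:
    """Score a domain by Alexa rank; unknown domains are treated as suspicious."""
    rank = RANK_CACHE.get(domain)
    if rank is None:
        return 0        # no rank found: suspicious, verify by other checks
    if rank <= 1000:
        return 100      # very popular, likely legitimate
    if rank <= 100000:
        return 70
    return 40           # very large rank number: weak evidence either way

print(rank_score("twitter.com"))   # 100
print(rank_score("twitter.net"))   # 0, raising suspicion as described above
```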

Punycode

Punycode is a special encoding used to convert Unicode characters to ASCII, a smaller, restricted character set, and it is used to encode internationalized domain names (IDN). An attacker might use a URL that looks identical to a real URL to trick the victim. For example, www.goögle.com looks similar to www.google.com, but the first URL takes the user to another website, because it is not the real google.com but a look-alike built using Punycode. Therefore, the tool will help detect the presence of such look-alike characters by converting the URL to its Punycode (ASCII) form. For example, converting the two URLs yields: www.goögle.com == www.xn--gogle-kua.com, while www.google.com == www.google.com.
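Python's built-in idna codec performs exactly this conversion, so the look-alike check from the example above can be sketched in a few lines:

```python
def to_punycode(hostname: str) -> str:
    """Convert an internationalized hostname to its ASCII (Punycode) form."""
    return hostname.encode("idna").decode("ascii")

for host in ("www.goögle.com", "www.google.com"):
    ascii_form = to_punycode(host)
    # A mismatch between the displayed name and its ASCII form flags a look-alike.
    print(host, "==", ascii_form, "| look-alike:", ascii_form != host)
# www.goögle.com == www.xn--gogle-kua.com | look-alike: True
# www.google.com == www.google.com | look-alike: False
```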

Letters and Numbers

In this section, the study will create a small table for URLs containing numbers or symbols that might be used to deceive the user, convert each to its corresponding letter, and check whether the resulting website already exists in the Whitelist or Blacklist. For example, the attacker might apply substitutions such as:

Letter O can be replaced with Number 0

Letter L can be replaced with Number 1

Letter Z can be replaced with Number 2

Letter S can be replaced with Character $

Hence, if the URL is www.micr0soft.com, this check will convert the digits back to letters and then look up the result. If the converted name already exists as a legitimate site, the original URL might be malicious.
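The substitution table above translates directly into Python's str.translate; the whitelist here is a small stand-in for the system's full Whitelist.

```python
# Map look-alike digits/symbols back to the letters they imitate.
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "l", "2": "z", "$": "s"})

WHITELIST = {"www.microsoft.com", "www.google.com"}  # stand-in whitelist

def looks_like_spoof(url: str) -> bool:
    """Flag a URL whose de-obfuscated form matches a known legitimate site."""
    normalized = url.lower().translate(SUBSTITUTIONS)
    return normalized != url.lower() and normalized in WHITELIST

print(looks_like_spoof("www.micr0soft.com"))  # True: normalizes to a whitelisted name
print(looks_like_spoof("www.microsoft.com"))  # False: already the genuine name
```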

Finally, application users will be provided with information obtained from WHOIS, a query-and-response protocol widely used for querying databases that store the registered users or assignees of an Internet resource, such as a domain name, an IP address block, or an autonomous system; the query is also used for a broader range of other information. Other information the application will provide includes the date of creation, geographic location, and owner's name. Equally, users will have the option to see screenshots of the URL's webpage (depending on server response time) as well as the real destination URL whenever the one on display is shortened.

The Technical Design

This section discusses the functions of the project and demonstrates how it will help reduce the time taken to respond to a threat. The subsequent parts show how the system will work once the user enters a URL. It is implemented in three parts, the first two of which will be built on Elasticsearch. Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows users to store, search, and analyze significant volumes of data quickly and in near real time, and it is generally used as the underlying engine that powers applications with sophisticated search features and requirements.
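A sketch of how the first two parts might query Elasticsearch is shown below, assuming the official elasticsearch Python client (8.x API) and indices named whitelist and blacklist with a keyword-mapped url field; the index names and mapping are this design's own choices, not fixed by the engine.

```python
# Assumes: pip install elasticsearch, and a local Elasticsearch node.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def check_list(index: str, url: str) -> bool:
    """Return True if the URL appears in the given index (whitelist/blacklist)."""
    result = es.search(index=index, query={"term": {"url.keyword": url}})
    return result["hits"]["total"]["value"] > 0

url = "https://www.example.com/"
if check_list("whitelist", url):
    print("100% secure")              # Part 1
elif check_list("blacklist", url):
    print("100% malicious")           # Part 2
else:
    print("run multi-check scoring")  # Part 3
```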

Part 1

In this first part, the URL is checked against the Whitelist. If the URL is found there, the user is informed that it is 100% secure, so there is no need to go through the other checklists. The system implemented in this part also provides additional optional information such as the owner, screenshots of the website, and the date of creation, among others. Figure 1 below is a logical design of the interaction between the system and the user (mobile phone).

Figure 1: Working of Whitelist System (Part 1)

Part 2

If the answer in Part 1 is no, meaning the URL is not in the Whitelist, Part 2 is executed. In this part, the aim is to check the URL against the Blacklist. If the answer is positive (YES), the results are shown to the user, indicating that the URL is 100% malicious, so there is no need to run the rest of the system. The system implemented in this part also provides other optional information such as the owner, screenshots of the website, and the date of creation, among others.

Figure 2: Working of Blacklist System

Part 3

Part 3 will be executed if the checks under Part 1 and Part 2 both come up empty. It involves sending the URL through a web service to a server that runs multi-threaded checks to speed up all the checklists and returns a score for each part. The results show the user the estimated level of security of the link. Figure 3 below illustrates the working of the multi-check system (Part 3).

Figure 3: Working of Multi-Check System (Part 3)

Once all the lists are checked, the system returns the value of each. Calculations are then made to combine the scores of each part into a single percentage out of 100%. Other information available about the URL that would help the user make safety decisions includes the date of creation, geographical location, owner name, the real URL (especially if it was shortened), and, optionally, screenshots of the page. Equally, other advanced options can be utilized to facilitate system checks, including the ability to run more advanced tests depending on the time required to receive responses from the URL. In this case, the URL can be sent for a final check of its content, for example to assess whether it contains JavaScript.
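The final aggregation might look like the following weighted average; the weights and the per-check score names are illustrative assumptions, reusing the scoring sketches from the earlier sections.

```python
# Illustrative weights for each Part 3 check; tuning them is a design decision.
WEIGHTS = {"tld": 0.3, "protocol": 0.2, "rank": 0.3, "punycode": 0.2}

def aggregate(scores: dict) -> float:
    """Combine per-check scores (each 0-100) into one 0-100 safety score."""
    return sum(WEIGHTS[name] * score for name, score in scores.items())

# Example: results returned by the multi-threaded checks for one URL.
scores = {"tld": 100, "protocol": 100, "rank": 0, "punycode": 100}
print(f"{aggregate(scores):.0f}% likely benign")  # 70% likely benign
```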

Works Cited

Desai, Anand, et al. "Malicious Web Content Detection Using Machine Learning." 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). IEEE, 2017.

Ferreira, M. "Malicious URL Detection Using Machine Learning Algorithms." Digital Privacy and Security Conference, 2019, p. 114.

Friedrichs, Oliver, et al. "Method and Apparatus for Detecting Malicious Software Using Machine Learning Techniques." U.S. Patent No. 8,875,286. 28 Oct. 2014.

Hu, Zhongyi, et al. "Identifying Malicious Web Domains Using Machine Learning Techniques with Online Credibility and Performance Data." 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2016.

James, Joby, L. Sandhya, and Ciza Thomas. "Detection of Phishing URLs Using Machine Learning Techniques." 2013 International Conference on Control Communication and Computing (ICCC). IEEE, 2013.

Jeena, R., et al. "Malicious URL Detection Using Machine Learning Techniques." International Journal of Innovative Research in Science, Engineering and Technology, vol. 8.

Kazemian, Hassan B., and Shafi Ahmed. "Comparisons of Machine Learning Techniques for Detecting Malicious Webpages." Expert Systems with Applications, vol. 42, no. 3, 2015, pp. 1166-1177.

Kulkarni, Arun. "Phishing Websites Detection Using Machine Learning." 2019.

Kuyama, Masahiro, et al. "Method for Detecting a Malicious Domain by Using WHOIS and DNS Features." The Third International Conference on Digital Security and Forensics (DigitalSec2016). Vol. 74. 2016.

Naveen, Immadisetti Naga Venkata Durga, K. Manamohana, and Rohit Verma. "Detection of Malicious URLs Using Machine Learning Techniques." International Journal of Innovative Technology and Exploring Engineering, vol. 8, no. 4S2, 2019, pp. 389-393, https://www.ijitee.org/wp-content/uploads/papers/v8i4s2/D1S0085028419.pdf.

Patil, Dharmaraj R., and Jayantrao B. Patil. "Feature-Based Malicious URL and Attack Type Detection Using Multi-Class Classification." ISeCure, vol. 10, no. 2, 2018.

Sahoo, Doyen, Chenghao Liu, and Steven C. H. Hoi. "Malicious URL Detection Using Machine Learning: A Survey." arXiv preprint arXiv:1701.07179, 2017.

Tayyab, Saad, and Asad Masood. "A Review: Phishing Detection Using URLs and Hyperlinks Information by Machine Learning Approach." 2019.

Vanhoenshoven, Frank, et al. "Detecting Malicious URLs Using Machine Learning Techniques." 2016 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2016.

Wei, Bo, et al. "A Deep-Learning-Driven Light-Weight Phishing Detection Sensor." Sensors, vol. 19, no. 19, 2019, p. 4258.

Zhao, Peilin, and Steven C. H. Hoi. "Cost-Sensitive Online Active Learning with Application to Malicious URL Detection." Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013.
