A lexical approach for classifying malicious URLs

Michael Darling, Greg Heileman, Gilad Gressel, Aravind Ashok, Prabaharan Poornachandran

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

48 Scopus citations

Abstract

Given the continuous growth of malicious activities on the internet, there is a need for intelligent systems to identify malicious web pages. It has been shown that URL analysis is an effective tool for detecting phishing, malware, and other attacks. Previous studies have performed URL classification using a combination of lexical features, network traffic, hosting information, and other strategies. These approaches require time-intensive lookups that introduce significant delay in real-time systems. In this paper, we describe a lightweight approach for classifying malicious web pages using URL lexical analysis alone. Our goal is to explore the upper bound of the classification accuracy of a purely lexical approach. We also aim to develop a scalable approach that could be used in a real-time system. We develop a classification system based on lexical analysis of URLs. It correctly classifies URLs of malicious web pages with 99.1% accuracy, a 0.4% false positive rate, and an F1-score of 98.7, with an average classification time of 0.62 milliseconds. Our method also outperforms similar approaches when classifying out-of-sample data.
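The abstract does not list the features the authors use, so the following is only a minimal sketch of what "purely lexical" analysis can look like: every feature is computed from the URL string itself, with no DNS, WHOIS, or network lookups. The function names (`lexical_features`, `shannon_entropy`, `is_ipv4`) and the particular feature set are illustrative assumptions, not the paper's method.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def is_ipv4(host: str) -> bool:
    """True if the host is a dotted-quad IPv4 literal rather than a name."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Hypothetical feature set derived from the URL string alone.

    No network access is performed, which is what keeps this kind of
    feature extraction cheap enough for real-time use.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_query_params": len(parts.query.split("&")) if parts.query else 0,
        "has_at_symbol": "@" in url,
        "host_is_ipv4": is_ipv4(host),
        "host_entropy": shannon_entropy(host),
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login.php?user=a&id=2"))
```

A feature dictionary like this could be fed to any standard classifier; which model and which features achieve the reported accuracy is described in the paper itself, not here.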

Original language: English (US)
Title of host publication: Proceedings of the 2015 International Conference on High Performance Computing and Simulation, HPCS 2015
Editors: Waleed W. Smari, Vesna Zeljkovic
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 195-202
Number of pages: 8
ISBN (Electronic): 9781467378123
State: Published - Sep 2 2015
Externally published: Yes
Event: 13th International Conference on High Performance Computing and Simulation, HPCS 2015 - Amsterdam, Netherlands
Duration: Jul 20 2015 - Jul 24 2015

Publication series

Name: Proceedings of the 2015 International Conference on High Performance Computing and Simulation, HPCS 2015

Conference

Conference: 13th International Conference on High Performance Computing and Simulation, HPCS 2015
Country/Territory: Netherlands
City: Amsterdam
Period: 7/20/15 - 7/24/15

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Modeling and Simulation
