Logistic regression to detect malicious URLs
We will be using logistic regression to detect malicious URLs. Before we build the model, let's look at the dataset.
Dataset
The data is stored in a comma-separated values (CSV) file with a header row. The first column contains the URL, and the second column holds the label, which marks the URL as either good or bad. The first few rows look as follows:
url,label
diaryofagameaddict.com,bad
espdesign.com.au,bad
iamagameaddict.com,bad
kalantzis.net,bad
slightlyoffcenter.net,bad
toddscarwash.com,bad
tubemoviez.com,bad
ipl.hk,bad
crackspider.us/toolbar/install.php?pack=exe,bad
pos-kupang.com/,bad
rupor.info,bad
svision-online.de/mgfi/administrator/components/com_babackup/classes/fx29id1.txt,bad
officeon.ch.ma/office.js?google_ad_format...
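Before any model can be trained, each row has to be split into a URL and a numeric target. The following is a minimal sketch using only Python's standard library, assuming the file has exactly the two-column url,label layout shown above. The 1/0 encoding for bad/good and the helper name load_urls are illustrative choices, not part of the original dataset. The inline sample reuses a few rows from above, all of which happen to be labelled bad.

```python
import csv
import io
from collections import Counter

# A few rows copied from the sample above. The real file is assumed to use
# the same two-column "url,label" layout with a header row.
SAMPLE = """url,label
diaryofagameaddict.com,bad
espdesign.com.au,bad
crackspider.us/toolbar/install.php?pack=exe,bad
pos-kupang.com/,bad
"""

# Hypothetical encoding: 1 for malicious ("bad"), 0 for benign ("good").
LABEL_TO_INT = {"bad": 1, "good": 0}


def load_urls(handle):
    """Read a url,label CSV and return parallel lists of URLs and 0/1 labels."""
    urls, labels = [], []
    # DictReader uses the header row as keys and handles quoted fields,
    # so a URL containing a comma is fine as long as it is quoted.
    for row in csv.DictReader(handle):
        label = row["label"].strip().lower()
        if label not in LABEL_TO_INT:
            raise ValueError(f"unexpected label {label!r} for URL {row['url']!r}")
        urls.append(row["url"])
        labels.append(LABEL_TO_INT[label])
    return urls, labels


urls, labels = load_urls(io.StringIO(SAMPLE))
print(len(urls), Counter(labels))
```

For the full dataset, the same function can be pointed at the file on disk, for example with open("data.csv", newline="") as f: urls, labels = load_urls(f), where data.csv is a placeholder name. Failing loudly on an unknown label is deliberate: a silent default would quietly corrupt the training targets.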