TY - GEN
T1 - Wide-AdGraph
T2 - 13th ACM Web Science Conference, WebSci 2021
AU - Kargaran, Amir Hossein
AU - Akhondzadeh, Mohammad Sadegh
AU - Heidarpour, Mohammad Reza
AU - Manshaei, Mohammad Hossein
AU - Salamatian, Kave
AU - Nejad Sattary, Masoud
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/6/21
Y1 - 2021/6/21
AB - Websites use third-party ad and tracking services to deliver targeted ads and collect information about the users who visit them. These services put users' privacy at risk, which is why user demand for blocking them is growing. Most blocking solutions rely on crowd-sourced filter lists that are manually maintained by a large community of users. In this work, we seek to simplify the updating of these filter lists by combining information from different websites in a large-scale graph that connects all resource requests made over a large set of sites. Features extracted from this graph are used to train a machine learning algorithm to detect ad and tracking resources. Because our approach combines different information sources, it is more robust to evasion techniques that rely on obfuscation or on changing usage patterns. We evaluate our work on the Alexa top-10K websites and find its accuracy to be 96.1% biased and 90.9% unbiased, with high precision and recall. It can also block new ad and tracking services that would otherwise have to be blocked through further updates to the existing crowd-sourced filter lists. Moreover, the approach followed in this paper sheds light on the ecosystem of third-party tracking and advertising.
KW - Tracking
KW - ad blocking
KW - crowdsource
KW - data privacy
KW - filter lists
UR - https://www.scopus.com/pages/publications/85109048076
U2 - 10.1145/3447535.3462549
DO - 10.1145/3447535.3462549
M3 - Conference contribution
AN - SCOPUS:85109048076
T3 - ACM International Conference Proceeding Series
SP - 253
EP - 261
BT - WebSci 2021 - Proceedings of the 13th ACM Web Science Conference
PB - Association for Computing Machinery
Y2 - 21 June 2021 through 25 June 2021
ER -