NLP AND COMPUTER VISION FOR DETECTING ILLICIT ACTIVITIES ACROSS TWITTER AND EXTERNAL LINKS
Abstract
Human trafficking is a worldwide problem that dehumanizes millions of people. Trafficking networks now operate on the web, using coded messages to promote their illegal businesses. Because law enforcement resources are limited, it is paramount to automatically detect messages that may be related to this crime and that could lead to further investigation. Using natural language processing, this work groups tweets that could promote these illegal services and exploit minors. Images and URLs contained in such suspicious messages are then processed and sorted by gender and age group, which makes it possible to detect photographs of persons under 14 years of age. The first step is to mine tweets in real time that contain hashtags related to minors. The key step is to preprocess the tweets to remove background noise and misspelled words, after which each tweet is classified as suspicious or non-suspicious. Geometric features of the face and torso are then selected using the Haar model. SVM and CNN classifiers allow this identification from the torso and its proportion relative to the head, even when facial details are undetectable. When only torso features are used, the SVM performs better than the CNN.
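To illustrate the preprocessing step described above, the following is a minimal sketch of tweet noise removal, not the pipeline used in this work. The function name `preprocess_tweet`, the regular expressions, and the choice of cleaning rules (extract URLs for later processing, drop mentions, keep hashtag words, collapse elongated characters) are all assumptions made for illustration; misspelling handling beyond collapsing repeated characters is not covered.

```python
import re

# Hypothetical cleaning rules; the abstract does not specify the exact ones.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated three or more times


def preprocess_tweet(text):
    """Return (tokens, urls) for one raw tweet.

    URLs are extracted and returned separately so that linked content
    can be analyzed in a later stage; the remaining text is normalized
    into lowercase alphanumeric tokens.
    """
    urls = URL_RE.findall(text)
    text = URL_RE.sub(" ", text)         # remove URLs from the text body
    text = MENTION_RE.sub(" ", text)     # drop @mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the '#'
    text = text.lower()
    text = REPEAT_RE.sub(r"\1\1", text)  # collapse runs of 3+ characters to 2
    tokens = re.findall(r"[a-z0-9]+", text)
    return tokens, urls


if __name__ == "__main__":
    sample = "Sooooo coool!!! @user check #NewPost https://t.co/abc123 now"
    print(preprocess_tweet(sample))
```

The resulting tokens could then be fed to any text classifier that labels a tweet as suspicious or non-suspicious, while the extracted URLs would be passed to the image-processing stage.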