TY - GEN
T1 - Thesaurus-based named entity recognition system for detecting spatio-temporal crime events in Spanish language from Twitter
AU - Sotomayor, Marco
AU - Veloz, Freddy
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2018/1/4
Y1 - 2018/1/4
N2 - Social networks offer an invaluable amount of data from which useful information can be obtained on the major issues in society, among which crime stands out. Research on extracting information about criminal events from social networks has been done primarily in English, while in Spanish the problem has not been addressed. This paper proposes a system for extracting spatio-temporally tagged tweets about crime events in Spanish. To do so, it uses a thesaurus of criminality terms and a named entity recognition (NER) system to process the tweets and extract the relevant information. The NER system is based on the OSU Twitter NLP Tools implementation, which has been enhanced for Spanish. Our results indicate improved performance relative to the most relevant tools, such as Stanford NER and OSU Twitter NLP Tools, achieving 80.95% precision, 59.65% recall and 68.69% F-measure. The end result presents the crime information broken down by place, date and crime committed through a web service.
AB - Social networks offer an invaluable amount of data from which useful information can be obtained on the major issues in society, among which crime stands out. Research on extracting information about criminal events from social networks has been done primarily in English, while in Spanish the problem has not been addressed. This paper proposes a system for extracting spatio-temporally tagged tweets about crime events in Spanish. To do so, it uses a thesaurus of criminality terms and a named entity recognition (NER) system to process the tweets and extract the relevant information. The NER system is based on the OSU Twitter NLP Tools implementation, which has been enhanced for Spanish. Our results indicate improved performance relative to the most relevant tools, such as Stanford NER and OSU Twitter NLP Tools, achieving 80.95% precision, 59.65% recall and 68.69% F-measure. The end result presents the crime information broken down by place, date and crime committed through a web service.
KW - crime
KW - data extraction
KW - NER
KW - Twitter
UR - https://www.scopus.com/pages/publications/85045771655
U2 - 10.1109/ETCM.2017.8247537
DO - 10.1109/ETCM.2017.8247537
M3 - Conference contribution
AN - SCOPUS:85045771655
T3 - 2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017
SP - 1
EP - 5
BT - 2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd IEEE Ecuador Technical Chapters Meeting, ETCM 2017
Y2 - 16 October 2017 through 20 October 2017
ER -