Arabic named entity recognition using boosting method

Abstract: In Natural Language Processing (NLP), developing resources and tools extends the reach and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers. While most of this work targets Modern Standard Arabic (MSA), this paper focuses on Classical Arabic (CA) literature. We present NoorCorp, a corpus of 200k labeled words, manually annotated by human experts for research purposes. We also collected about 18k proper names from old Hadith books into a gazetteer called NoorGazet. Using ensemble learning, we develop a new approach for extracting named entities (NEs), including persons, locations, and organizations. The AdaBoost.M2 algorithm, an implementation of the multiclass boosting method, is applied to train the prediction model. Results show that the method outperforms the decision tree used as its base classifier. We use tokenization, part-of-speech (POS) tagging, and base phrase chunking (BPC) to overcome linguistic obstacles in Arabic. An overall F-measure of 96.04 is obtained. In addition, we study the effect of preprocessing and external resources on the system's results. Finally, the proposed approach is applied to ANERCorp, an MSA corpus, and the results are compared with those obtained on NoorCorp.

Keywords:

Named entity recognition (NER), Ensemble learning, Boosting method, Classical Arabic Language
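
The F-measure quoted in the abstract combines precision and recall over the recognized entities. The sketch below illustrates how such a score is commonly computed. It is not the authors' evaluation code: the abstract does not say whether scoring is token-level or entity-level, so exact-match entity spans and the balanced F1 (beta = 1) are assumptions here, and the `Span` type, the `f_measure` function, and the toy data are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start_token, end_token_exclusive, entity_type)
Span = Tuple[int, int, str]


def f_measure(gold: List[Span], predicted: List[Span],
              beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) under exact-match span scoring.

    Assumption: a predicted entity counts as correct only if its
    boundaries and its type both match a gold entity exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Toy data, invented for illustration only.
gold = [(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG"), (14, 15, "PER")]
predicted = [(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "ORG")]  # ORG boundary is wrong

precision, recall, f1 = f_measure(gold, predicted)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this toy case two of three predictions are correct and two of four gold entities are found, so precision is 2/3, recall is 1/2, and F1 is 4/7. A score such as 96.04 would correspond to this F value expressed as a percentage.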
