Purpose – This article presents an aggregated methodology for constructing a stop word list for the Farsi language and uses it to generate a generic Farsi stop word list.
Design/methodology/approach – The stop word list is extracted on the basis of four sources: syntactic classes, domain dependence, corpus statistics and expert judgment. Some of the main challenges that arise in automatic Farsi text processing are also outlined.
Findings – The results of these techniques are aggregated into a general Farsi stop word list containing 927 words.
Practical implications – The resulting stop word list can affect the efficiency and effectiveness of retrieval and indexing in Farsi information retrieval systems; it can also play an important role in Farsi text segmentation.
Originality/value – The stop word extraction algorithm is a promising technique that could be applied to other languages with ambiguities in automatic text segmentation.
Keywords: Languages, Information retrieval
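
As a rough illustration of the aggregation idea described in the abstract, the following Python sketch combines candidate stop words from several sources. The abstract does not specify how the sources are combined, so the union-then-expert-filter rule, the function names `frequency_candidates` and `aggregate_stop_words`, the whitespace tokenization and the toy corpus are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter


def frequency_candidates(documents, top_k):
    """Return the top_k most frequent tokens in the corpus.

    Assumption: tokens are separated by whitespace. Real Farsi text
    needs a proper segmenter, since word boundaries can be ambiguous.
    """
    counts = Counter(token for doc in documents for token in doc.split())
    return {token for token, _ in counts.most_common(top_k)}


def aggregate_stop_words(syntactic, domain, statistical, expert_rejected=frozenset()):
    """Union the candidate sets, then drop words rejected by an expert.

    Assumption: a simple union; the paper may weight sources differently.
    """
    return (set(syntactic) | set(domain) | set(statistical)) - set(expert_rejected)


# Toy corpus (hypothetical sentences, not from the paper).
corpus = [
    "این کتاب از من است",
    "من از این شهر آمدم",
    "کتاب در این شهر است",
]

statistical = frequency_candidates(corpus, top_k=1)  # most frequent token
syntactic = {"از", "در"}       # hypothetical prepositions (syntactic class)
domain = {"کتاب"}              # hypothetical domain-dependent candidate
expert_rejected = {"کتاب"}     # hypothetical expert decision to keep this word

stop_words = aggregate_stop_words(syntactic, domain, statistical, expert_rejected)
print(sorted(stop_words))
```

A set union keeps the sketch simple and order-independent; a voting or weighting scheme across the sources would be an equally plausible reading of "aggregated".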