Novosibirsk State University Journal of Information Technologies
Scientific Journal

ISSN 2410-0420 (Online), ISSN 1818-7900 (Print)

Thematic Analysis of User Queries Based on a Subject-Oriented Dictionary (Tematichesky analiz zaprosov polzovatelei na osnove predmetno-oriyentirovannogo slovarya)
E. A. Sidorova, S. V. Anokhin, I. S. Kononenko, N. V. Salomatina

Novosibirsk State University
Institute of Informatics Systems
Institute of Mathematics SB RAS

UDC code: 004.912

Abstract
The paper describes an approach to the thematic categorization of short texts based on subject dictionaries of domain-specific vocabulary. It proposes a procedure for building a subject dictionary, including a technique for learning the dictionary when no adequate training sample of target texts is available. The approach is illustrated by experiments in which thematic categories are assigned to Internet user queries according to the subject headings of a rubricator of Internet activities. The categorization algorithm is presented, and the results of the experimental study are discussed.
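
As a rough illustration of what dictionary-based categorization of short texts can look like in general, the sketch below scores a query against a small subject dictionary and returns the categories whose score reaches a threshold. It is not the algorithm presented in the paper: the dictionary entries, category names, weights, the `categorize` function, and the threshold value are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical subject dictionary: term -> {category: weight}.
# Terms, categories, and weights are invented for illustration only.
SUBJECT_DICTIONARY = {
    "flight": {"travel": 1.0},
    "hotel": {"travel": 0.8},
    "ticket": {"travel": 0.6, "entertainment": 0.4},
    "movie": {"entertainment": 1.0},
    "laptop": {"electronics": 1.0},
}


def categorize(query, dictionary, threshold=0.5):
    """Return categories whose summed term weights reach the threshold, best first."""
    scores = defaultdict(float)
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    for token in query.lower().split():
        for category, weight in dictionary.get(token, {}).items():
            scores[category] += weight
    # Sort by descending score, breaking ties alphabetically by category name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, score in ranked if score >= threshold]


print(categorize("cheap flight and hotel", SUBJECT_DICTIONARY))
print(categorize("movie ticket", SUBJECT_DICTIONARY))
```

A query with no dictionary terms yields an empty list, which here stands for "no category assigned". A practical system for Russian-language queries would presumably need to normalize word forms before lookup. This sketch splits on whitespace only.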

Key Words
relevance, dictionary learning, subject dictionary, short text categorization, thematic analysis, query analysis

How to cite:
Sidorova E. A., Anokhin S. V., Kononenko I. S., Salomatina N. V. Tematichesky analiz zaprosov polzovatelei na osnove predmetno-oriyentirovannogo slovarya // Vestnik NSU Series: Information Technologies. - 2014. - Volume 12, Issue No 4. - P. 83-95. - ISSN 1818-7900. (in Russian).

Full Text in Russian

Available in PDF

Publication information
Main title: Vestnik NSU Series: Information Technologies, Volume 12, Issue No 4 (2014).
Parallel title: Novosibirsk State University Journal of Information Technologies Volume 12, Issue No 4 (2014).

Key title: Vestnik Novosibirskogo gosudarstvennogo universiteta. Seriâ: Informacionnye tehnologii
Abbreviated key title: Vestn. Novosib. Gos. Univ., Ser.: Inf. Tehnol.
Variant title: Vestnik NGU. Seriâ: Informacionnye tehnologii

Year of Publication: 2014
ISSN: 1818-7900 (Print), ISSN 2410-0420 (Online)
Publisher: Novosibirsk State University Press

inftech@vestnik.nsu.ru
© 2006-2017, Novosibirsk State University.