Novosibirsk State University Journal of Information Technologies
Scientific Journal

ISSN 2410-0420 (Online), ISSN 1818-7900 (Print)


Volume 11, Issue No 4 (2013)

Approach to forming thematic text collections on the basis of web-resources
Irina Ravilyevna Akhmadeyeva, Yury Alekseyevich Zagorulko, Natalya Vasilyevna Salomatina, Aleksei Sergeyevich Sery, Elena Anatolyevna Sidorova, Vladimir Konstantinovich Shestakov

Novosibirsk State University
A. P. Ershov Institute of Informatics Systems SB RAS
S. L. Sobolev Institute of Mathematics SB RAS

UDC code: 002.513.5:004.912

The paper considers the problem of automatically forming text collections on given topics from web resources. An approach to solving this problem is proposed, and a text-collecting system is developed that uses a metasearch technique together with specialized facilities for working with wiki resources. Experiments with the system have demonstrated the effectiveness of the proposed approach.
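The abstract does not say how the metasearch component combines the results it collects. Purely as an illustration, the Python sketch below shows one common way a metasearch engine can merge ranked result lists returned by several search engines: URLs are normalized, de-duplicated, and scored by their summed reciprocal rank. The function names `normalize` and `merge_results`, the normalization rules, and the scoring scheme are all assumptions made for this example, not the authors' method.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonicalize a URL so that trivially different forms compare equal.

    Hypothetical rules: lowercase the scheme and host, drop a trailing slash
    from the path, and discard the fragment (#...). Query strings are kept.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(result_lists: list[list[str]]) -> list[str]:
    """Merge ranked URL lists from several engines into one ranked list.

    Hypothetical scoring: each engine adds 1/rank (rank starts at 1) for every
    distinct URL it returns; URLs are ordered by total score, ties by URL.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        seen: set[str] = set()
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            if key in seen:  # ignore duplicates within one engine's list
                continue
            seen.add(key)
            scores[key] += 1.0 / rank
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    # Made-up result lists from two hypothetical engines.
    engine_a = ["https://example.org/a", "https://example.org/b/", "https://example.org/c"]
    engine_b = ["https://Example.org/b", "https://example.org/d"]
    print(merge_results([engine_a, engine_b]))
```

Here the page `/b` is returned by both engines, so it collects 1/2 + 1/1 and moves to the top. Breaking ties by URL keeps the output deterministic. A real collecting system would still have to download the pages and filter them by topic, which this sketch leaves out.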

Key Words
metasearch, web search query, wiki-resources, web-resources, text collections

How to cite:
Akhmadeyeva I. R., Zagorulko Y. A., Salomatina N. V., Sery A. S., Sidorova E. A., Shestakov V. K. Approach to forming thematic text collections on the basis of web-resources // Vestnik NSU Series: Information Technologies. - 2013. - Volume 11, Issue No 4. - P. 5-15. - ISSN 1818-7900. (in Russian).

Full text is available in Russian (PDF).


Publication information
Main title: Vestnik NSU Series: Information Technologies, Volume 11, Issue No 4 (2013).
Parallel title: Novosibirsk State University Journal of Information Technologies Volume 11, Issue No 4 (2013).

Key title: Vestnik Novosibirskogo gosudarstvennogo universiteta. Seriâ: Informacionnye tehnologii
Abbreviated key title: Vestn. Novosib. Gos. Univ., Ser.: Inf. Tehnol.
Variant title: Vestnik NGU. Seriâ: Informacionnye tehnologii

Year of Publication: 2013
ISSN: 1818-7900 (Print), ISSN 2410-0420 (Online)
Publisher: Novosibirsk State University Press

© 2006-2018, Novosibirsk State University.