Second Cycle, Degree Programme in Computer Science
Prof.: Paolo Ferragina
The student who successfully completes the course will have the ability to design a simple search engine or one of the numerous text mining tools which are at the core of modern Web applications.
Study, design and analysis of information retrieval (IR) systems that efficiently and effectively process, mine, search, cluster and classify documents from textual, HTML or XML data collections. In particular, we will:
- describe the main components of a modern search engine: Crawler, Parser, Compressor, Indexer, Query resolver, Results Ranker, Results Classifier/Clusterer;
- present and use in the Lab some open-source tools for IR applications, such as Lucene and Web graph;
- introduce some basic algorithmic techniques which are now ubiquitous in any IR application for data classification, compression, clustering, projection, and sketching.
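As a rough illustration of how three of the components listed above (Parser, Indexer, Query resolver) fit together, here is a minimal Python sketch. It is not taken from the course material: the function names `tokenize`, `build_index`, `intersect` and `search`, the toy documents, and the choice of a plain conjunctive (AND) query over an uncompressed inverted index are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Parser step (assumed): lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexer step (assumed): map each term to a sorted list of document IDs."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Query resolver step (assumed): documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = index.get(terms[0], [])
    for term in terms[1:]:
        result = intersect(result, index.get(term, []))
    return result


# Toy collection, invented for this example.
docs = [
    "Web crawlers fetch pages",
    "The indexer builds posting lists from parsed pages",
    "Query resolvers intersect posting lists",
]
index = build_index(docs)
print(search(index, "posting lists"))  # prints [1, 2]
```

A real engine would add the remaining components (crawling, compression of the posting lists, ranking, clustering); the sketch only shows the data flow from raw text to a resolved query.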
For this course the prerequisite/s is/are
face to face
C.D. Manning, P. Raghavan, H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
I.H. Witten, A. Moffat, T.C. Bell. Managing Gigabytes, Chapter 2 “Text compression”. Morgan Kaufmann, second edition, 1999.
The student will be assessed on their demonstrated ability to discuss the main course contents using the appropriate terminology.
For information, write to email@example.com.