a document and used as document descriptors. Again, the search query consists of keywords. These systems work best when the keywords used are highly selective, that is, they occur infrequently in documents in general but are common in the documents that the searcher wishes to retrieve. To be effective, keyword-based searches require enough relevant keywords in each document descriptor.

Another simple search is based on creation and modification time. All file systems keep track of the time when a file is created or modified, so it is possible, for example, to search for files that were created today or yesterday. However, the usefulness of timestamps depends directly on how well clocks are kept synchronized (as discussed above).

An extension to keyword searching is concept searching, in which a thesaurus is used to augment the original query with more potential keywords. Grammatical and morphological knowledge, such as plural forms, may also be applied in the search. Additional extensions include Boolean searching (using the logical operators AND, OR, and NOT) and weighted searching (assigning a weight to each keyword).

Full-text searching differs from keyword-based searching in that, instead of indexing only a set of pre-defined keywords, the whole text of the document is indexed. The searcher therefore does not have to worry about whether a certain keyword is indexed or not. Full-text indexing also covers numbers, dates, and so on.

Relevance feedback uses documents themselves, instead of keywords, as query terms. Hence, relevance feedback may be referred to as query by example. Once a relevant document is found, the user may ask the search engine to search for more documents like it. In this search, the search engine may use any of the search techniques described above.
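The search techniques described so far can be illustrated with a minimal sketch in Python. It assumes a tiny in-memory collection; the class FullTextIndex, its method names, the sample documents and the use of term frequency as a ranking measure are all hypothetical choices made for illustration, not a description of any particular search engine.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FullTextIndex:
    """Hypothetical in-memory index over the whole text of each document."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.term_counts = {}             # document id -> Counter of its terms

    def add(self, doc_id, text):
        # Full-text indexing: every token is indexed, not a pre-defined keyword set.
        counts = Counter(tokenize(text))
        self.term_counts[doc_id] = counts
        for term in counts:
            self.postings[term].add(doc_id)

    def docs_with(self, term):
        """Return the set of documents that contain the given term."""
        return set(self.postings.get(term.lower(), set()))

    def weighted_search(self, weights):
        """Rank documents by the sum of (keyword weight * term frequency)."""
        scores = Counter()
        for term, weight in weights.items():
            for doc_id in self.docs_with(term):
                scores[doc_id] += weight * self.term_counts[doc_id][term.lower()]
        return [doc_id for doc_id, _ in scores.most_common()]

    def more_like(self, doc_id, top_terms=3):
        """Query by example: reuse the most frequent terms of a known document.

        Assumption: term frequency decides which characteristics are used.
        """
        query = dict(self.term_counts[doc_id].most_common(top_terms))
        return [d for d in self.weighted_search(query) if d != doc_id]


index = FullTextIndex()
index.add("trip", "Beach holiday photos from the summer trip, beach sunset")
index.add("work", "Meeting notes about the summer budget")
index.add("album", "Photos of the beach and the harbour")

# Boolean search: beach AND photos NOT sunset, expressed with set operations.
boolean_hits = (index.docs_with("beach") & index.docs_with("photos")) - index.docs_with("sunset")

# Weighted search: "beach" counts twice as much as "summer".
ranked = index.weighted_search({"beach": 2.0, "summer": 1.0})

# Relevance feedback: ask for more documents like "trip".
similar = index.more_like("trip")
```

In this sketch, more_like derives its query from the most frequent terms of the example document; a real search engine could just as well rely on other characteristics.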
The greatest disadvantage of relevance feedback is that the searcher may not know which characteristics of the current document are used in the subsequent searches. This technique is best when the user is unable to find more relevant documents (Raymond 1998).

The discussion above concerns only textual information. How, then, should one search for content items that are stored in binary formats, such as music and photos? The content itself should in one way or another be used in searching.

Searching for features in multimedia files is referred to as content-based retrieval (CBR) (Hirata and Kato 1992; Tseng 1999). In CBR, the idea is to analyse the content, for instance by identifying shapes in a photo or rhythm in a song. These features can then be searched for, either by providing examples of the patterns or features that are