Sailing the semantic seas

To kick off with a banal observation: for anyone attempting to provide the world with a means of finding relevant information in the immense vault of the Web, a core technical challenge is the problem of navigation.

To start with the familiar: query terms are fine and often do the job, but they demand a hefty amount of prior knowledge from the user. Users are often really searching around in conceptual space, yet instead of the concepts themselves they have to manipulate the words those concepts are mapped to, and only in the natural languages they are fluent in.

As a nautical chart, query terms take you very quickly to a particular archipelago in the semantic sea, or rather to a set of remote archipelagoes simultaneously if you cannot disambiguate perfectly when selecting your query terms. These archipelagoes might be vast or tiny, but to zoom in or out and move about, one has to come up with more query terms, which takes more prior knowledge. And here we run into a problem of presentation (and in that vein, I will now drop my attempts at halting similes).

Sometimes you might actually be looking for a quote, but more often your information need is originally formulated in the realm of semantics, not in the realm of syntax and morphology. This raises an important point in our seemingly (at least culturally) ever-increasing globalisation. Semantically similar information is spread across a multitude of languages, e.g. news reports of the same incident, consumer opinions about products, etc. Very few people can, unaided, define queries that cover more than a portion of that spectrum.

You are probably now thinking of query expansion, modelling co-occurrence of words, cross-lingual information retrieval, machine translation, and the like. Yes, they provide one way to navigate, if you know where you want to go and how to map it accurately to at least one language. But, assuming you really wish to find your way in the semantic space, you have no smooth means of exploring, zooming or performing relative movement in a controlled fashion. Nor do the currently existing methods and interfaces really offer good means of semantic cross-lingual comparison.

An alternative way, and one that M-Brain is currently exploring, is to operate directly in the concept space. The query is now a distribution over the conceptual space. It can be formulated directly in that space, or indirectly, either via documents matching a query term pattern or via example text whose semantic mapping one wishes to steer towards or avoid. Of course the representation is still by means of language (i.e. the names of the concepts), but at a meta level. The name of the concept need not occur at all in a text whose semantics match the concept. Nor do the text and the concept name need to be in the same language.
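To make the idea a little more concrete, here is a minimal sketch in Python of a query expressed as a distribution over concepts. It is not M-Brain's implementation. Everything in it is an assumption made up for illustration: the three concept names, the toy documents and their concept weights (which a real system would obtain from some trained model), and the helper names `query_from_concepts`, `query_from_term`, `steer` and `rank`. Cosine similarity is used for matching simply because it is a common choice.

```python
from math import sqrt

# Hypothetical concept space (illustrative names only).
CONCEPTS = ["sailing", "finance", "weather"]

# Hypothetical documents: id -> (language, text, concept weights).
# The weights are invented; a real system would infer them with a model.
DOCS = {
    "en-1": ("en", "Yacht race delayed by storm", [0.6, 0.0, 0.4]),
    "fi-1": ("fi", "Purjehduskilpailu alkoi Helsingissä", [0.9, 0.0, 0.1]),
    "en-2": ("en", "Bank shares fall sharply", [0.0, 1.0, 0.0]),
}


def normalise(weights):
    """Clip negatives to zero and rescale so the weights sum to one."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(clipped)] * len(clipped)  # fall back to uniform
    return [w / total for w in clipped]


def query_from_concepts(named_weights):
    """Direct formulation: the user weights concepts by name."""
    return normalise([named_weights.get(c, 0.0) for c in CONCEPTS])


def query_from_term(term):
    """Indirect formulation: pool the documents whose text contains the term."""
    matches = [dist for _, text, dist in DOCS.values()
               if term.lower() in text.lower()]
    if not matches:
        return normalise([0.0] * len(CONCEPTS))
    return normalise([sum(column) for column in zip(*matches)])


def steer(query, example, strength):
    """Move the query towards (strength > 0) or away from (strength < 0)
    the concept distribution of an example text."""
    return normalise([q + strength * e for q, e in zip(query, example)])


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query):
    """Document ids ordered from best to worst match, regardless of language."""
    return sorted(DOCS, key=lambda d: cosine(query, DOCS[d][2]), reverse=True)


if __name__ == "__main__":
    print(rank(query_from_concepts({"sailing": 1.0})))
    stormy = query_from_term("storm")
    print(rank(steer(stormy, [0.0, 0.0, 1.0], -0.3)))  # steer away from weather
```

With these made-up numbers, a query placed entirely on the concept "sailing" ranks the Finnish document first, even though the word "sailing" appears nowhere in its text. Steering the "storm" query away from weather likewise moves that Finnish document ahead of the English one that matched the term in the first place.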

It is a tall order, and with years of academic machine learning experience under our belts, we at M-Brain know it. Yet it is a goal that, if reached at an adequate level multilingually, would provide a very useful exploratory navigational tool. We are not reinventing the wheel: this has been a target of research for some time now, and it remains an area of keen interest. There are also numerous business intelligence applications and interesting under-investigated minor research topics, but more on that later.

Meanwhile, we have put forth a sketch of what a multilingual semantic nautical chart might look like in our Issue of the Month section. Stay tuned, and feel free to suggest other kinds of peeks you would like to see.

 

 

Info

02.04.2009 - 09:06