studies
Visualizing the Search Tail
A talk to initiate discussion on the direction in which Search in general should be headed.
Abstract
The internet as it stands today is a chaotic universe of scattered documents and links. The most useful resources are locked away somewhere out there, buried behind heaps of unwanted junk. There is no discernible organization of the data and no signs of the situation improving in the future.
A real breakthrough can only be made when search engines take the challenge to the next level by organizing the web as data cubes with multiple dimensions. In this sense, there is no single “tail”. Rather, there are limitless dimensions through which users can explore the data from…
Cantonese Spell Checker with Metaphones
Abstract:
Given this complicated cultural and historical context, the “quality” of Chinese queries in Hong Kong ranged from murky to just awful. In a sample of top queries from Hong Kong’s web search, it is common to find a myriad of lazy and poorly formed Chinese queries with the occasional English characters thrown in, structured in no consistent or discernible pattern, so that anyone apart from the user would find them impossible to decipher.
Hot Keywords Suite (co-author)
Abstract:
Hot keywords, originating from the names of newly emerged celebrities, products, breaking news, buzzwords and the like, appear every day in our lives, online and offline. Detecting hot keywords and discovering their underlying relationships is an effective means for applications to serve more appealing and time-centric stories to their users.
In this paper, we define a way to model hot keyword detection from user-generated content using a variation of approaches to the association rule mining problem, a textbook data mining scenario with a rich background of research work and optimization techniques already in place.
A prototype implementation of the algorithm was built and shown to be capable of mining hot keywords and topics from the search query logs of Yahoo Hong Kong’s Forum Search and News Search services within seconds. With an accumulated data sample spanning over 6 months, the mining results demonstrated the power and effectiveness of the idea, revealing many newly coined keywords previously unheard of.
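To make the association rule framing concrete, here is a minimal sketch in Python. It is not the paper's algorithm, whose details the abstract does not give. It assumes each query is a list of terms treated as one transaction, mines frequent single terms and term pairs with Apriori-style pruning, derives rules by confidence, and flags an itemset as hot when its support in a recent window has grown relative to a baseline window. The function names, thresholds, the growth heuristic and the sample queries are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for 1- and 2-term sets meeting min_support."""
    n = len(transactions)
    # Pass 1: count each term once per query.
    singles = Counter(term for t in transactions for term in set(t))
    frequent = {frozenset([term]): c / n
                for term, c in singles.items() if c / n >= min_support}
    # Pass 2 (Apriori pruning): only pair terms that are frequent on their own.
    keep = {next(iter(s)) for s in frequent}
    pairs = Counter()
    for t in transactions:
        for a, b in combinations(sorted(set(t) & keep), 2):
            pairs[frozenset([a, b])] += 1
    frequent.update({p: c / n for p, c in pairs.items() if c / n >= min_support})
    return frequent


def association_rules(frequent, min_confidence):
    """Return sorted (antecedent, consequent, confidence) rules from frequent pairs."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) != 2:
            continue
        for a in itemset:
            (b,) = itemset - {a}
            confidence = support / frequent[frozenset([a])]
            if confidence >= min_confidence:
                rules.append((a, b, confidence))
    return sorted(rules)


def hot_keywords(baseline, recent, min_support, min_growth):
    """Return {itemset: growth} for itemsets whose support rose sharply.

    Growth is recent support divided by baseline support; the baseline is
    floored so that unseen itemsets do not divide by zero (an assumption
    of this sketch, not something taken from the paper).
    """
    base = frequent_itemsets(baseline, 0.0)
    floor = 1 / (len(baseline) + 1)
    hot = {}
    for itemset, support in frequent_itemsets(recent, min_support).items():
        growth = support / max(base.get(itemset, 0.0), floor)
        if growth >= min_growth:
            hot[itemset] = growth
    return hot


# Hypothetical query logs: an older window and a recent one.
baseline = [["weather", "hong", "kong"], ["mtr", "map"],
            ["weather", "today"], ["mtr", "fare"]]
recent = [["newband", "concert"], ["newband", "tickets"],
          ["newband", "concert", "tickets"], ["weather", "today"]]

for itemset, growth in hot_keywords(baseline, recent, 0.25, 2.0).items():
    print(sorted(itemset), round(growth, 2))
print(association_rules(frequent_itemsets(recent, 0.5), 0.9))
```

In this toy data the made-up term "newband" and its pairings with "concert" and "tickets" surface as hot, while "weather" does not, because it was already common in the baseline window. A production system would need proper tokenization of Chinese queries and larger itemsets, which this sketch leaves out.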
Authors:
- Eric Ka Ka Ng, Kenny Siu Ming Tang, Wah Kwok, Winnie Wen Yee Chan, Alex Long Xiao Wang, Clement Lee Wing Wah
