Picking Cotton and Picking Presidents

15 11 2008

Remarkable data presented by strangemaps.com shows an interesting correlation between US cotton production areas in 1860 and areas that favored Barack Obama in the 2008 presidential election. Check out the maps yourself. To quote Jeff Shrager:

Correlation does not prove causality, but they are highly correlated.

Impressions from CIKM08 and SSM08 in Napa Valley

2 11 2008

CIKM’08 and SSM’08 took place this week in Napa Valley. The conference was well attended, and the papers were exciting. On Sunday, I attended Christos Faloutsos’ tutorial on Large Graph Mining (tutorial material), which gave a great overview of current research on networks and network algorithms. I had a chance to talk briefly to Christos about the similarities between vector-based and graph-based research, and whether it would be worth looking into synergies (as both can be understood as different perspectives on affiliation matrices). We agreed that it would be interesting to see whether certain problems in one domain (e.g. vector space) are easier to solve in the other (e.g. networks). Overall, I really enjoyed the tutorial.
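To make the affiliation-matrix remark above a bit more concrete, here is a minimal sketch in Python. The toy matrix, the user and tag names, and the helper functions are all made up for illustration; they are not taken from the tutorial. The same rows are read once as vectors (cosine similarity) and once as a graph (a one-mode projection whose edge weights count shared tags).

```python
import math

# Hypothetical toy affiliation matrix: rows are users, columns are tags.
# A 1 means "this user has used this tag". All values are invented.
AFFILIATION = {
    "alice": [1, 1, 0, 0],
    "bob":   [1, 1, 1, 0],
    "carol": [0, 0, 1, 1],
}

def dot(u, v):
    """Inner product of two equally long rows."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Vector-space view: cosine similarity between two rows."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def project(matrix):
    """Graph view: one-mode projection of the affiliation matrix.

    Two users are connected if they share at least one tag; the edge
    weight is the number of shared tags (an entry of A times A transposed).
    """
    names = sorted(matrix)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = dot(matrix[a], matrix[b])
            if weight > 0:
                edges[(a, b)] = weight
    return edges

edges = project(AFFILIATION)
similarity = cosine(AFFILIATION["alice"], AFFILIATION["bob"])
```

In this sketch the cosine similarity is just the projected edge weight divided by the square root of the product of the two row sums, which is one simple way the two perspectives line up.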

The program (program.pdf) of the main conference was pretty packed. The sessions I found most interesting were the ones on Web Search, Social Search, and Query Analysis. Of the keynotes, I found Bruce Croft’s the most interesting, as it related most closely to the research of my group. He talked about the difficulty IR has in dealing with “long queries” (cf. the intent descriptions of the TREC datasets), as IR in the past has often focused on short ones. One particularly interesting chart he presented illustrated that click-through ratios decline significantly with query length (using the MS click log). Bruce interpreted that as evidence of how difficult such queries are for existing search algorithms. Many contributions emphasized the importance of understanding search intent (something I am interested in), and most contributions had a strong evaluation background. Mechanical Turk seemed to create some buzz as a presumably easy and reliable (if you manage to avoid pitfalls such as spam) form of large-scale, fast evaluation for search (and other problems).

The social interactions at the conference were also great: I had the opportunity to share dinner with Greg (whose blog I have been reading for a while now), and another very pleasant dinner with Ron and Peter, who work at Me.dium, a social search startup.

The SSM08 workshop was well attended, with 40 registered participants. I gave a presentation on “Purpose Tagging – Capturing User Intent to Assist Goal-Oriented Social Search” (paper.pdf) and I felt that people appreciated my ideas and research results. There were a couple of interesting questions at the end of my presentation, such as “are there dominant purposes for given resources?” (for example, “eating food” would probably dominate restaurant websites; maybe this question could be answered by looking at how frequently each purpose is assigned to a resource), and some follow-up discussions. Some of the people I had only known through their papers were in the audience, including M. Smith, A. Chowdury, E. Agichtein, M. Hurst, M. Hearst, E. Chi, L. Getoor, S. Dumais and many others. The workshop was a great chance for me to meet them and discuss issues of social search. Thanks to Ian and Eugene for organizing such an excellent event. I would really like to see a successor to this event.

At the workshop, I was particularly excited to see a preview of MrTaggy (currently not public), a social search engine developed by Ed Chi at PARC. It allows people to purposefully re-arrange search results and share them with friends. M. Hurst also talked about uRank, a Microsoft social search product (currently available only in the US). Tusavvy and me.dium.com are further contenders. The discussion was lively, and focused on privacy, different notions of search (search vs. browsing), and different notions of social search (on/offline, synchronous/asynchronous, etc.). Abdur demoed Twitter search, a rather powerful tool and a rather unusual search problem.

Overall, the conference was an excellent opportunity to get in touch with KM and IR researchers and hear about their current research, as well as with people from industry (Yahoo, Google, Live Labs, Ask, etc.). Next year’s conference is in Beijing.