Liveblogging Wednesday @ Hypertext’09

1 07 2009

I’m sharing my live notes from the second Hypertext keynote, “Relating Content by Web Usage” by Ricardo Baeza-Yates, at Hypertext ’09.

In case you have any additions, comments or links that would make my notes more complete / more useful, please leave a comment and fill in the blanks.

On the nature of search and intent:

Ricardo starts by stating that Search is not about document retrieval anymore. Given Ricardo’s history in document retrieval, this is an interesting thing to hear.

Search is rather about mediating user goals, in particular:

  1. identifying a user’s task
  2. providing means for task completion

For search to be successful, the intent of searchers needs to be related to the content available on the web. Ricardo argues that rather than focusing on content, search engines need to focus on objects, such as people, places, businesses, restaurants etc. Search intent can then be satisfied by exploiting and mapping the characteristics of such objects and their corresponding attributes.

On the nature of content:

So how can we learn about objects and attributes? One approach is to look into metadata, where Ricardo distinguishes between explicit (Metadata, Y! Answers, Flickr, etc) and implicit (anchor text, queries and clickthrough, etc) metadata. Ricardo points out that some of this metadata is private, making usage more complicated.

A key question in this context is “What is the quality of different kinds of metadata?”. Ricardo mentions that although user-generated metadata is noisy, on an aggregate level, he believes that it outperforms metadata generated by experts.

Search in Social Media:

Ricardo introduces TagExplorer, a Yahoo research prototype for tag-based, faceted navigation/search of Flickr. The supported facets are locations, subjects, activities, time, names and others. I didn’t fully understand how these facets are identified or determined, but it seems the selection is based on / informed by previous empirical Yahoo research on different types of tags in Flickr.

Another prototype Ricardo demonstrates is the Correlator.

Web Usage:

Ricardo starts with the assumption that “when users use the web, they think”, and he suggests that we can/should tap into the outcome of these cognitive processes and exploit them for search. One example is query logs, where users actively make relevance judgements and engage in search query formulation / reformulation strategies.

Ricardo gives a number of examples where this might be useful. For example, it might help in learning about relationships between queries, sessions and documents.
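To make the idea of relating queries through usage a bit more concrete, here is a minimal sketch of my own (not something shown in the talk). It assumes a click log of (session, query, clicked URL) records and treats two queries as related when users clicked the same documents for them, measured by Jaccard overlap. The log entries and the function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical click log: (session_id, query, clicked_url).
# These records are invented for illustration; they are not data from the talk.
click_log = [
    ("s1", "cheap flights", "flights.example.com"),
    ("s1", "cheap flights", "travel.example.org"),
    ("s2", "low cost airlines", "flights.example.com"),
    ("s2", "low cost airlines", "travel.example.org"),
    ("s3", "pasta recipe", "cooking.example.net"),
]


def clicked_urls_by_query(log):
    """Collect the set of clicked URLs for each distinct query."""
    urls = defaultdict(set)
    for _session, query, url in log:
        urls[query].add(url)
    return urls


def query_similarity(log):
    """Jaccard overlap of clicked URLs for every pair of queries."""
    urls = clicked_urls_by_query(log)
    similarities = {}
    for q1, q2 in combinations(sorted(urls), 2):
        shared = urls[q1] & urls[q2]
        union = urls[q1] | urls[q2]
        similarities[(q1, q2)] = len(shared) / len(union)
    return similarities


similarities = query_similarity(click_log)
for pair, score in similarities.items():
    print(pair, round(score, 2))
```

In this toy log, “cheap flights” and “low cost airlines” share no words but lead to the same clicks, which is the kind of relationship that usage data can reveal and content alone cannot.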

Open Issues:

Ricardo concludes his talk by discussing a number of issues he feels are important for future research. He discusses the interesting research question of studying explicit social networks (where links between users are made explicit) versus implicit social networks (where links between users are inferred). Related to this problem is the problem of implicit and explicit metadata. Ricardo refers to that problem as the virtuous cycle, where both implicit and explicit metadata can be used/should be used to inform search.

Another problem Ricardo mentions is the question of when it is necessary to acquire more data vs. when we need to tweak our algorithms. As researchers, I guess we tend to have a bias towards working on the algorithmic rather than the data aspect.

My impressions:

I think Ricardo’s talk gave a great overview of the many activities at Yahoo Research. Due to the number of projects being presented, it was difficult for me to capture everything that was presented, and I feel that my notes in this post capture only a small part of what Ricardo talked about in his keynote. So check out Ricardo’s website / Yahoo research website / the slides of this talk to get a more complete picture of their exciting projects.

Update: I just stumbled upon Alvin Chin’s notes of Ricardo’s keynote, which nicely complement my notes here.





Liveblogging Tuesday @ Hypertext’09

30 06 2009

I’m sharing my live notes from Lada Adamic’s keynote on “The Social Hyperlink” at Hypertext ’09.

In case you have any additions, comments or links that would make my notes more complete / more useful, please leave a comment and fill in the blanks.

Lada starts by telling a story about the different social networks at MIT vs. Stanford. At MIT, fraternities are well established and play an important role in defining social communities, while at Stanford they are discouraged: each year you have to enter a room lottery that determines whom you are going to live with in the coming year. This difference can be observed in the social networks among students. But analyzing the relationships between people and the actions they perform is challenging, because it is difficult to tell correlation from causation. Do two friends buy the same item because they have a social relationship (causation) or do they happen to buy the same item independent of their relation (correlation)?

The Social Hyperlink (how intent spreads through Second Life):

That’s why Lada got interested in Second Life, as in SL it is possible to trace how information (e.g. dance moves, items) spreads along social ties. In many cases, SL maintains information about previous item owners, allowing us to study how items propagate through networks of SL users. The example Lada talked about was gesture transfer among users of Second Life. Lada presented results from a study analyzing 12.6 million transfers (of which 23% have accurate previous-owner info). What you can do with this data is investigate patterns of information spread through the social network.

Findings:

  • 48% of transfers happen between friends.
  • Cascades among friends are deeper / items are passed along social ties more often (higher percentage of non-leaf nodes)
  • But: adoption over time is weaker in social networks. Lada speculates that a reason for that is that information spread among friends is “niche” information (only relevant to a small group of homogeneous friends)
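For readers who wonder what “deeper cascades” and “percentage of non-leaf nodes” mean in practice, here is a small sketch of my own (not the method or data from the study). It assumes each transfer is a (giver, receiver) pair for a single item, that every user receives the item from at most one giver, and that there are no cycles, so the transfers form a tree. All names are hypothetical.

```python
from collections import defaultdict

# Hypothetical transfers of one item as (giver, receiver) pairs.
# Invented for illustration; assumes each receiver has exactly one giver.
transfers = [
    ("ann", "bob"),
    ("bob", "cat"),
    ("bob", "dan"),
    ("ann", "eve"),
]


def cascade_stats(transfers):
    """Return (maximum cascade depth, fraction of non-leaf nodes)."""
    children = defaultdict(list)
    for giver, receiver in transfers:
        children[giver].append(receiver)

    receivers = {receiver for _giver, receiver in transfers}
    nodes = receivers | set(children)
    # Roots are users who passed the item on but never received it.
    roots = [node for node in nodes if node not in receivers]

    # Iterative depth-first traversal from every root.
    max_depth = 0
    stack = [(root, 0) for root in roots]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for child in children.get(node, []):
            stack.append((child, depth + 1))

    # A non-leaf node is a user who passed the item on at least once.
    non_leaf = sum(1 for node in nodes if children.get(node))
    return max_depth, non_leaf / len(nodes)


depth, non_leaf_fraction = cascade_stats(transfers)
print(depth, non_leaf_fraction)
```

Comparing these two numbers for cascades along friendship ties against cascades between strangers is one simple way to express the “deeper among friends” finding.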

The next question Lada deals with is whether targeting hubs/early adopters would be a promising strategy to spread information in networks, by dividing the network into early adopters and laggards:

  • early adopters (or Mavens in Gladwell’s terms) were less social (fewer friends than the average)
  • they were also not active in distributing assets, which means that they are not influencers

Findings:

  • social networks influence adoption
  • niche items get a bigger boost (from social relations)
  • some individuals have more influence than others

User Intent and Social Networks: What I find interesting about this work, particularly the Second Life case, is that it allows us to study the propagation of intent in social networks. This kind of data enables us to examine how social relations influence what people want. I find this to be an important research question, because intent is generally assumed to be an attribute of individuals rather than a characteristic of social networks as a whole. I think that people tend to prefer believing that their goals are individual and intrinsic, rather than determined(?) by their social network. Studies such as the SL study have the potential to explore this question empirically.

But network analysis can be employed for other aspects of links as well. Lada gives two more examples:

The Knowledge-Exchange Hyperlink:

One of the questions Lada talked about in this context was: What motivates users to answer questions?

From interviews (Naver): altruism, learning, hobby, business, points

From crawls: filling in the blanks, correcting others

The Trust Hyperlink:

Lada got interested in Couchsurfing as a way to study trust in social networks. (The rationale being that trust is required to let somebody stay in your home.)

The study included 600,000 users, of whom 156,000 had surfed or hosted; 55,000 were in the largest strongly connected component.

Observations: Over time, people tend to engage in both surfing and hosting.

Results: direct reciprocity (surfing the couch of a person you have hosted) only accounts for 12-18%. Generalized reciprocity is at work. People are willing to vouch for people they only know via couch-surfing. They tend to vouch for fewer couch-surfing friends than best friends, but overall there are more couch-surfing friends.
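As a rough illustration of what measuring direct reciprocity could look like, here is a sketch of my own. I don’t know the exact definition used in the study, so this assumes the simplest one: the share of (host, guest) stays for which the reverse stay also exists. The data and the function name are invented.

```python
# Hypothetical hosting relations as (host, guest) pairs.
# Invented for illustration; not data from the Couchsurfing study.
hostings = {
    ("ann", "bob"),
    ("bob", "ann"),  # bob later hosted ann: a directly reciprocated pair
    ("ann", "cat"),
    ("cat", "dan"),
    ("dan", "eve"),
}


def direct_reciprocity(edges):
    """Fraction of (host, guest) edges whose reverse edge also exists."""
    edges = set(edges)
    if not edges:
        return 0.0
    reciprocated = sum(1 for host, guest in edges if (guest, host) in edges)
    return reciprocated / len(edges)


share = direct_reciprocity(hostings)
print(share)
```

A low share under this kind of measure, combined with the observation that people both surf and host over time, is what points towards generalized rather than direct reciprocity.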

My impressions:

I really enjoyed Lada’s keynote. I think it did a great job of motivating and illustrating the potential of network analysis to explore different aspects of linked information on the web. I have come across her work many times in my own research and I’m happy to have had the chance to hear her talk in person.

Next up are my students Christian and Mark who are pitching their posters on “Understanding the Motivation behind Tagging” (Christian Körner) and “Towards Automatically Annotating Textual Resources with Human Intent” (Mark Kröll). Good luck!

Update (Jul 4 2009): Lada’s slides of the talk are available online!