Mode 3 knowledge production: or the differences between a blog post and a scientific article

17 02 2014

With the proliferation of data, the increasing availability of fairly simple tools to analyze it, a growing number of people who can use these tools, and low-cost publication platforms (e.g. blogs), the potential to democratize certain aspects of scientific processes – such as empirical data analysis – seems tremendous. This might give rise to the idea that everyone who can use these tools (such as Python) and publish the results of their analysis (e.g. via blog posts) can now participate in knowledge production.

An opportunity for data analysis by the masses: If true, the potential of such a development would be enormous: By increasing the number of people that participate in scientific processes, we could increase the coverage of interesting phenomena to explore, research activity would not be constrained to areas that are funded by large institutional bodies, and in general more research could get done.

At the same time, this would represent a fundamental shift in the way science has operated up until now, as people formerly not part of traditional scientific processes (and not trained in scientific knowledge production) move into new territory and participate in new processes. In order to understand this shift, we need to understand the modi operandi of scientific knowledge production in the past.

Different modes of knowledge production: There are many ways to look at scientific knowledge production. A very influential distinction has been made by Gibbons et al., who argue that we have to differentiate between “Mode 1” and “Mode 2” of knowledge production.

M. Gibbons, C. Limoges, and H. Nowotny. The new production of knowledge: the dynamics of science and research in contemporary societies. Sage, 1997.

Mode 1 refers to traditional knowledge production, characterized by hierarchical mechanisms and processes executed by a set of homogeneous actors from a common disciplinary background. An example would be the ivory tower view of a university, where a scientist or group of scientists with homogeneous backgrounds work on disciplinary problems. This mode is increasingly being replaced by Mode 2 knowledge production, which is socially distributed, organizationally diverse, application-oriented, and trans-disciplinary [GLN97, NSG03]. An example would be a network of university partners with different disciplinary backgrounds collaborating on an application-oriented problem with other stakeholders from e.g. industry or other public institutions.

Mode 3 knowledge production: The proliferation of data, tools and people able to make use of them might give rise to what I might call Mode 3 knowledge production, which could be self-organized, context-focused, and driven by individuals not primarily trained in scientific processes. An example would be an interested user (or group of users) of a social network platform who looks at data that might explain some online social network phenomenon that they feel is worth exploring. Another might be a group of patients performing self-experiments or experiments with n=1 in order to explore the cause of personal symptoms or health concerns. These groups might embed the discussion of their findings into community conversations and social sensemaking processes.

While this idea looks appealing on the surface, there are a number of issues. For example: Mode 1 and Mode 2 knowledge production differ in terms of organization, but both follow the scientific method in terms of basic mechanisms and values. It is as yet unclear whether an emerging Mode 3 would adhere to the scientific method as well. Being able to use analysis tools to look at data does not necessarily mean that whatever kind of analysis follows from that contributes to scientific processes in meaningful ways.

The scientific method: So what is the scientific method, i.e. what are some of the standards, ethics and practices that Mode 1 and Mode 2 knowledge production follow, and which a potential Mode 3 knowledge production would have to adopt as well? Answers can be found in the philosophy of science, which has long been thinking about the nature of science and scientific processes. This is an entire field that cannot be adequately described here – the Hempel–Oppenheim model would be just one of many examples.

However, typical qualities of scientific processes would include, but are not limited to: the ability to reproduce results including a proper description of methods and means of data collection, sharing of data, the quality of hypotheses (w.r.t. falsifiability, explanatory power, understandability, etc), the relation to state-of-the-art research including proper citations of existing literature, critical reflections about the validity of findings, as well as the quality of interpretations and whether they follow from the data.

Do blog posts follow the scientific method? While there is nothing that prevents research published via blog posts from following the scientific method, more often than not blog posts – even data-oriented ones – fail to meet these most basic requirements. For example, from a data visualization published via a blog post it does not necessarily become clear where the data is from, how the data has been collected, which methods have been applied, whether the results are reproducible, whether the data used will be shared, how the analysis relates to the state-of-the-art of scientific knowledge or whether there is an agreement that the conclusions presented follow from the data.

This is not surprising. In scientific articles, peer-review is the most common (but certainly not infallible) instrument to check whether submitted research follows the scientific method. In blog posts and similar user-generated media, there are currently no established social or other mechanisms enforcing the scientific method, which often makes their results – while potentially interesting – less useful from a scientific perspective. In addition, it is typically impossible for a researcher to ignore a reviewer’s comment (as an editor will decide whether to publish an article based on the reviewers’ comments), whereas it is usually easy for a blogger to delete an unwanted comment.

Conclusion: Whether a third mode of knowledge production will ultimately emerge is unclear. While the democratization of data analysis will expand without a doubt, the outcome will depend on whether the masses of amateurs and bloggers adopt principles based on the scientific method, whether the masses of scientists participate in blog conversations and enforce the scientific method there, or both. It will probably not depend on the technicalities of the publishing medium – whether blog posts or not.

References:

M. Gibbons, C. Limoges, and H. Nowotny. The new production of knowledge: the dynamics of science and research in contemporary societies. Sage, 1997.

H. Nowotny, P. Scott, and M. Gibbons. Introduction – ‘Mode 2’ revisited: The new production of knowledge. Minerva, 41(3):179–194, 2003.





When is a student ready to finish his/her PhD?

29 05 2013

I’ve made it a hobby to ask this question of professors that I meet at conferences in my field. The answers that I have collected in these conversations are manifestations of an astonishing variety of underlying research philosophies and ideologies. Here’s a list of the answers I have received so far. The labels in brackets are mine; they might be misleading or misrepresent the original intent of the answer given.

  • When he is offered a position in industry or academia that assumes a PhD (the American view)
  • When he has convinced his corresponding research (sub-)community that the work he has been doing is worthy of a PhD (the sociologist’s / psychologist’s view)
  • When he has expanded the state of knowledge by a significant amount / When he added new knowledge to the existing body of knowledge about the world (the epistemological view)
  • When he has built something truly new, interesting, elegant and/or complex (the engineer’s view)
  • When he has reached his personal intellectual maximum i.e. the maximum intellectual capacity that he is capable of acquiring (the subjective view)
  • When he is able to explain the results of his work in one sentence (the communication view)
  • When he has published n papers (the bureaucrat’s view)

I am amazed that there is little repetition in the answers that I get. What is your answer? Add it to the comments.





Programming Poems with Mechanical Turk

29 12 2010

Mechanical Turk has received some bad press recently (this is one example). It has been pointed out that Mechanical Turk can be used to do evil, which got me interested in seeing whether and how it can be used for something good (or at least creative). This has led to this post, and resulted in the following poem – collaboratively produced by independent workers on Mechanical Turk.

In the daily life of a Mechanical Turk

In the daily life of a Mechanical Turk,
Never have I quite finished my work,

For I return and refresh and come back for more
In quest of a yet higher score

Now and then my eyes may tire
If I said they didn’t, I’d be a liar

Though I am spent, It’s hard to stop
Even when I’m ready to drop

My available HITs are waiting for me
Occasionally I’d rather go and watch TV

Nevertheless, I need the cash
Keen to throw a birthday bash!

Ever so slowly my earnings increase
Yet my passion for Mechanical Turk would never cease …

The structure of the poem is fully algorithmically determined. It has been written collaboratively by a crowd of Mechanical Turkers interacting with each other only through HITs. Before designing the poem algorithm, I did some research on the structure and different types of poems, which led me to acrostics.

“An acrostic (Greek: ákros “top”; stíchos “verse”) is a poem or other form of writing in which the first letter, syllable or word of each line, paragraph or other recurring feature in the text spells out a word or a message.” (wikipedia)

In my poem algorithm, I’ve constrained the first letter of each sentence in the poem, thereby forming an acrostic. As an additional constraint, I required the poem to consist of pairs of sentences that rhyme (similar to a Limerick).

While I determined (i.e. programmed) the structure of the poem, the content was produced entirely by Mechanical Turkers. The only input provided was the title, which also acts as the first sentence of the poem. Each rhyming pair of sentences was written by 2 different Turkers, i.e. the output of one Turker was used as input for another. The total price of the poem was 1.804 USD. The poem was built incrementally: each subsequent Turker had access to the output of all previous Turkers. All tasks were requested at least 3 times; selection among alternatives was done by me, although it could easily have been done by Turkers themselves. In total, the contributions of 7 different Turkers were used in the poem above (while many more worked on the HITs).

With that, I initialized the poem algorithm with the acrostic “Infinite Monkey” and the title “In the daily life of a Mechanical Turk” and ran it on Mechanical Turk. The result can be seen above.
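The two structural constraints described above (acrostic first letters, lines grouped into rhyming pairs) can be sketched in a few lines of Python. This is only an illustration and not the code that was actually run: the names `LineTask`, `plan_poem` and `satisfies_acrostic` are hypothetical, and the rhyme itself is not checked here, since judging rhymes was left to the Turkers and to manual selection.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineTask:
    """One line of the poem, as it might be handed to a worker (hypothetical)."""
    index: int                  # position of the line in the poem, starting at 0
    first_letter: str           # letter the line must start with (acrostic constraint)
    rhymes_with: Optional[int]  # index of the line to rhyme with, None if it opens a pair


def acrostic_letters(acrostic: str) -> List[str]:
    """Keep only the letters of the acrostic, upper-cased (spaces are ignored)."""
    return [c.upper() for c in acrostic if c.isalpha()]


def plan_poem(acrostic: str) -> List[LineTask]:
    """Turn an acrostic into one task per line, grouped into rhyming pairs."""
    tasks = []
    for i, letter in enumerate(acrostic_letters(acrostic)):
        # Odd-numbered lines close a pair and must rhyme with the line before.
        rhymes_with = i - 1 if i % 2 == 1 else None
        tasks.append(LineTask(i, letter, rhymes_with))
    return tasks


def satisfies_acrostic(lines: List[str], acrostic: str) -> bool:
    """Check that the first letters of the lines spell out the acrostic."""
    letters = acrostic_letters(acrostic)
    if len(lines) != len(letters):
        return False
    return all(line.strip()[:1].upper() == letter
               for line, letter in zip(lines, letters))
```

In this sketch, `plan_poem("Infinite Monkey")` yields 14 tasks. Task 0 corresponds to the title, which was given rather than written by a Turker, and every odd-numbered task points back at the line it has to rhyme with.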

The Infinite Monkey acrostic refers to the Infinite Monkey Theorem:

“The Infinite Monkey Theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.” (Wikipedia)

That’s what we are trying to test here, though in a less statistical and more informed manner. Instead of producing all possible poems, we are interested in producing constrained yet plausible poems, efficiently (i.e. in very few iterations).

Which leads to a variation of the Infinite Monkey Theorem that I’d like to propose here:

The Finite Turker Theorem states that a finite (yet potentially large) number of independent writers (here: Mechanical Turkers) will almost surely produce a poem that is creative, enjoyable and mostly indistinguishable from a single author poem.

With the Finite Turker Theorem, and marketplaces such as Mechanical Turk, it might be possible to outsource creative work – such as poem writing – to a large set of workers without much penalty in terms of beauty or enjoyability. Algorithms such as the one above can constrain and influence the resulting poems, giving greater control over the outcome of creative processes (which sounds like an oxymoron).

Because HITs were requested multiple times, there were several rejects that did not make it into the final poem, but which show some of the difficulties as well as the creative potential of programmed poems, including:


For I return and refresh and come back for more
Info, my pimp: [I’m] a Dolores Labs penny whore

Conclusion: It has been suggested that the primary use of Mechanical Turk is the execution of simple, easily replaceable and often spam-related work. This little experiment suggests that Mechanical Turk can serve richer purposes, by tapping into the creative energy of an underestimated, underutilized but also (currently) underpaid work force.





A game-with-a-purpose based on Twitter

11 10 2010

I am happy to announce that my research group at TU Graz today launched Bulltweetbingo!, a game-with-a-purpose based on Twitter. The game is already live and available at http://bingo.tugraz.at. For an introduction to the idea of Buzzword Bingo, please see the following IBM commercial (Youtube video).

 

IBM Innovation Buzzword Bingo (Youtube)

 

Rather than playing buzzword bingo while listening to a talk, the idea of Bulltweetbingo! is to play Buzzword Bingo with the people you follow on Twitter. All people you follow on Twitter automatically participate in the game by tweeting. A Bulltweetbingo game terminates (i.e. hits “Bingo!”) if the people you follow on Twitter use a particular combination of the defined buzzwords in their tweets. We intend to use the data provided by each game in our research on analyzing the semantics of short messages on systems such as Twitter or Facebook. Each game provides information about the relevance and topics of tweets for a particular person as well as some information on the topics of tweets that a person expects to receive in the future.
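To make the termination condition more concrete, here is a minimal Python sketch of how a “Bingo!” check could look. It is not the game's actual implementation. The square grid, the classic bingo rule (a complete row, column or diagonal), the restriction to single-word buzzwords, and the names `marked_words` and `is_bingo` are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, Set


def marked_words(grid: List[List[str]], tweets: Iterable[str]) -> Set[str]:
    """Return the buzzwords from the grid that occur in at least one tweet."""
    wanted = {word.lower() for row in grid for word in row}
    seen: Set[str] = set()
    for tweet in tweets:
        # Crude tokenization: lower-case runs of letters, digits and apostrophes.
        tokens = set(re.findall(r"[a-z0-9']+", tweet.lower()))
        seen |= wanted & tokens
    return seen


def is_bingo(grid: List[List[str]], tweets: Iterable[str]) -> bool:
    """Assumed rule: bingo if a full row, column or diagonal has been tweeted."""
    n = len(grid)
    if n == 0:
        return False
    seen = marked_words(grid, tweets)
    hit = [[word.lower() in seen for word in row] for row in grid]
    full_row = any(all(row) for row in hit)
    full_col = any(all(hit[r][c] for r in range(n)) for c in range(n))
    diagonal = all(hit[i][i] for i in range(n))
    anti_diagonal = all(hit[i][n - 1 - i] for i in range(n))
    return full_row or full_col or diagonal or anti_diagonal
```

With a 2×2 grid such as `[["cloud", "synergy"], ["agile", "leverage"]]`, a single tweet mentioning both “leverage” and “cloud” would complete the main diagonal in this sketch.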

I’m copying and pasting some more information about the game that we have made available on the game website (about the project).

Bulltweetbingo!
Playing a game of bingo with people you follow on Twitter.

A team of researchers from Graz University of Technology, Austria has developed one of the first games-with-a-purpose that is exclusively based on Twitter.

The goal of this project is to annotate and to better understand the short messages posted to so-called social awareness streams such as Twitter or Facebook. Using this data, the researchers aim to improve the ability of computers to effectively organize and make sense out of the sea of short messages available today.

Dr. Markus Strohmaier, Assistant Professor at the Knowledge Management Institute at Graz University of Technology, Austria explains: “While social awareness streams such as Twitter or Facebook have experienced significant popularity over the last few years, we know little about how to best understand, search and organize the information that is contained in them.”

To tackle this problem, the researchers have developed a game of Buzzword Bingo that users can play with people they follow on Twitter.

“With each game users play on our website, we will collect data that helps us develop more effective algorithms for better understanding this new kind of data” Dr. Markus Strohmaier says, “and in addition to that, we simply hope users would enjoy playing a game of Bingo on Twitter. Each game is unique and exciting in a sense that users generally don’t know what tweets people will publish during the course of a bingo game”.

The researchers have launched the site bulltweetbingo! and ask users to sign up and to play a game of Bingo with the people they follow on Twitter. Twitter users can sign up at http://bingo.tugraz.at.

The game was implemented by one of my talented students, Simon Walk. Make sure to hire him if you need a complex web project to be realized quickly and effectively!





On taxonomies, folksonomies, and tweetonomies

17 04 2010

Towards a Taxonomy of Meta-Desserts (by several_bees @flickr)

For centuries, taxonomies have been a tool for mankind to bring structure to the world. Taxonomies (wikipedia: “the practice and science of classification”) were developed in different fields of science, including – but not limited to – biology (e.g. taxonomies of animals) or library sciences (e.g. taxonomies of literature). Regardless of the particular domain of application, in most cases those taxonomies were developed by a selected few (e.g. librarians), and were used by many.

With the emergence of personal computers and file directories, the task of taxonomy development was brought to the masses. Suddenly everyone (i.e. every computer user) was in charge of developing, maintaining and transforming personal taxonomical structures in order to organize and (re-)find resources. While this development has led to a vast increase of personal taxonomies, it was only when del.icio.us popularized tagging as a new form of resource organization that users’ personal taxonomies were exposed publicly. This has made it possible to aggregate a large number of personal taxonomies into collective taxonomic structures. The result of such aggregation has since been referred to as a folksonomy, i.e. an emergent structure collectively produced by a large number of users in a bottom-up manner.

In social awareness streams (pdf) such as Twitter or Facebook, users typically do not aim to classify or organize resources; instead, they engage in casual chatter and dialogue, occasionally using syntax to coordinate communication (such as #hashtags or @replies). Taxonomic structures can be assumed to play a subordinate role for users of social awareness streams.

In a recent paper to be presented at the SemSearch Workshop at WWW2010 [1], however, we show that there exist latent conceptual structures – similar to taxonomies or folksonomies – in social awareness streams, and that we can acquire these structures through simple aggregation mechanisms.

Abstract: Although one might argue that little wisdom can be conveyed in messages of 140 characters or less, this paper sets out to explore whether the aggregation of messages in social awareness streams, such as Twitter, conveys meaningful information about a given domain. As a research community, we know little about the structural and semantic properties of such streams, and how they can be analyzed, characterized and used. This paper introduces a network-theoretic model of social awareness streams, a so-called “tweetonomy”, together with a set of stream-based measures that allow researchers to systematically define and compare different stream aggregations. We apply the model and measures to a dataset acquired from Twitter to study emerging semantics in selected streams. The network-theoretic model and the corresponding measures introduced in this paper are relevant for researchers interested in information retrieval and ontology learning from social awareness streams. Our empirical findings demonstrate that different social awareness stream aggregations exhibit interesting differences, making them amenable for different applications [1].

In the paper, we introduce the notion of tweetonomies, and a corresponding tri-partite model of social awareness streams that extends the existing model of folksonomies by accommodating user-generated syntax (such as slashtags and other emerging syntax) and thereby integrating the communicative nature of such streams.

In the figure below, we have applied the network-theoretic model of tweetonomies to acquire a semantic network of hashtags that could be used for a range of different purposes, such as for navigating social awareness streams or for recommendation problems.

A tweetonomy of hashtags, acquired from Twitter (with the help of Jan Poeschko, click for full image 2.6 MB)
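As a rough illustration of what a “simple aggregation mechanism” over a stream can look like, the following Python sketch builds a hashtag co-occurrence network from a list of messages. It is not the model or the measures from the paper. Linking two hashtags whenever they appear in the same message is an assumption made here, and the names `HASHTAG` and `hashtag_cooccurrence` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple

# Assumed hashtag syntax: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_cooccurrence(messages: Iterable[str]) -> Counter:
    """Count how often each pair of hashtags appears together in one message.

    Returns a Counter keyed by (tag_a, tag_b) with tag_a < tag_b, which can be
    read as the weighted edge list of an undirected hashtag network.
    """
    edges: Counter = Counter()
    for text in messages:
        # Lower-case and de-duplicate the hashtags within a single message.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for pair in combinations(tags, 2):
            edges[pair] += 1
    return edges
```

The resulting edge weights could then be fed into any graph library for visualization or navigation; messages with fewer than two hashtags simply contribute no edges.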

Our work shows that tweetonomies are a far more complex structure than – for example – taxonomies or folksonomies. One reason for that observation lies in the dynamic and user-generated nature of their syntax, but also in the fact that tweetonomies accommodate a much richer language than the language used in social tagging systems (tweets vs tags).

The results of our work suggest that tweetonomies are a novel and promising concept, different from taxonomies and folksonomies where people engage in conscious acts of classification. Whether tweetonomies have the potential to bring order and structure to social awareness streams similar to the way folksonomies brought order to social tagging systems remains a question to be answered.

Update (May 5 2010): An interesting question that was raised during the presentation of the paper at the WWW’2010 workshop was whether it would be justified to introduce tweetonomies as a new concept. In other words, are the structures that we observe on Twitter not just a different form of folksonomies? I’d argue for the necessity of a new concept for the following reasons: While taxonomies and folksonomies emerge when users structure resources, tweetonomies emerge when users structure conversation. Because conversations are inherently different from resources (e.g. they are dynamic, and involve multiple users), the structures that emerge from social awareness streams (tweetonomies) can be expected to be different from the structures that emerge from social bookmarking systems (folksonomies). Whether this is really the case, however, needs to be investigated in future work.

References:

[1] C. Wagner, M. Strohmaier, The Wisdom in Tweetonomies: Acquiring Latent Conceptual Structures from Social Awareness Streams, Semantic Search 2010 Workshop (SemSearch2010), in conjunction with the 19th International World Wide Web Conference (WWW2010), Raleigh, NC, USA, April 26-30, ACM, 2010. (pdf)