What is the size of the Library of Twitter?

22 02 2011

The Library of Babel is a theoretical library that holds the sum of all books that can be written with (i) a given set of symbols and (ii) a given page limit. According to Wikipedia, the Library of Babel is based on a short story by the author and librarian Jorge Luis Borges (1899–1986). Its idea is simple: the library holds all books that can be produced by every combinatorially possible sequence of symbols up to a certain book length. In Borges's case, the Library is immensely large since it contains all possible books up to 410 pages. The American Scientist calculates:

… each book has 410 pages, with 40 lines of 80 characters on each page. Thus a book consists of 410 [pages] × 40 [lines] × 80 [characters] = 1,312,000 symbols. There are 25 choices for each of these symbols, and so the library’s collection consists of 25^1,312,000 books.
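The quoted arithmetic can be checked with a few lines of Python. This is only a sketch under the figures given in the quote; the variable names are my own, and it works with logarithms because the number itself has well over a million digits.

```python
import math

# Figures taken from the American Scientist quote above.
PAGES, LINES, CHARS_PER_LINE, SYMBOLS = 410, 40, 80, 25

# Number of symbol positions in a single book.
symbols_per_book = PAGES * LINES * CHARS_PER_LINE

# Number of books = SYMBOLS ** symbols_per_book; take log10 instead of
# building the huge integer, then split into mantissa and exponent.
log10_books = symbols_per_book * math.log10(SYMBOLS)
exponent = math.floor(log10_books)
mantissa = 10 ** (log10_books - exponent)

print(f"symbols per book: {symbols_per_book:,}")
print(f"books in the library: about {mantissa:.3f} x 10^{exponent}")
```

Splitting the base-10 logarithm into its integer and fractional parts gives the scientific notation directly, without ever storing the full number.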

But what is the size of a Library of Twitter, i.e. the size of the set of all theoretically possible tweets? It should be (i) much smaller and (ii) much easier to calculate due to the particular structure of tweets. Here’s a brief back-of-the-envelope calculation:

Given the 140-character limit of tweets, and assuming an English vocabulary of 26 symbols expanded by basic syntactical elements such as periods (.), commas (,), spaces ( ), at signs (@), hashes (#) and a few others, we end up with 140 characters and all combinatorially possible sequences of a vocabulary of maybe 50 symbols. Based on these (conservative) assumptions, the Library of Twitter holds at least 50^140 tweets.
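For readers who want to reproduce the count without WolframAlpha, here is a minimal Python sketch. It assumes exactly the numbers above (140 characters, 50 symbols) and counts only tweets of full length; the constant names are illustrative.

```python
# Assumptions from the paragraph above: fixed length, 50-symbol vocabulary.
TWEET_LENGTH = 140
VOCABULARY_SIZE = 50

# Each of the 140 positions can hold any of the 50 symbols independently.
library_size = VOCABULARY_SIZE ** TWEET_LENGTH

# Python integers are arbitrary precision, so the exact value is available.
digits = len(str(library_size))

print(f"exact size has {digits} digits")
print(f"approximately {float(library_size):.2e} tweets")
```

Because Python integers have arbitrary precision, `library_size` holds the exact value, and the float conversion is used only for the short scientific-notation display.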

In other words, the size of the Library of Twitter is at least 7.17 × 10^237 [1].


While this number seems impressive, it pales in comparison to the size of the Library of Babel (which is 1.956 × 10^1834097). As with the Library of Babel, most of the Library of Twitter's contents would be nonsensical. But on the upside, the library would also contain all tweets ever written in the past and all theoretically possible tweets to be written in the future. Thereby, 50^140 is an upper bound on the information that can be conveyed in 140 characters given a vocabulary of 50 symbols [2]. This first approximate upper bound should be informative for future studies of Twitter to answer questions such as: How many of the theoretically possible tweets have already been written – or in other words – how much is there left to write before we run out of (sensical) combinatorial options?

I’ll leave it to somebody else to calculate the number of bits and hard drives necessary to store, mine and search the Library of Twitter.

[1] all numbers calculated with WolframAlpha
[2] It is obvious that larger assumed vocabularies would significantly increase the size of the library.



9 responses

23 02 2011
Jorge Aranda

An important point in Borges’ story is that it’s told from the point of view of one inhabitant of the library, who doesn’t necessarily know the whole story. He narrates that there is a theory going around that the library contains every possible book, only once, and that therefore the library is finite. Another theory claims that the library’s contents repeat themselves, the library being an infinite Universe—if I recall correctly, some librarians spend their lives looking for proof of this, unsuccessfully.

23 02 2011
Markus Strohmaier

Hi Jorge! looking for proof of what? whether the library in the story is infinite?

23 02 2011
Jorge Aranda

Yeah, kind of. If they find two identical books, that refutes the theory that there’s only one copy of each, and nothing stops us from thinking then that the library is infinite. Some people instead choose to just walk in one direction, hoping to find the edge of the library at some point. But they haven’t been successful. Ultimately the story is an allegory of inquiry in a Universe that paradigmatically represents inquiry—a library.

Mind you, this is all from memory, and I’m not sure it’s entirely reliable :-)

24 02 2011
Markus Strohmaier

thanks for clarifying – makes me want to go & read the story!

27 02 2011
Jorge Aranda

Found it! I can’t vouch for the translation though:

28 02 2011
Markus Strohmaier


17 04 2012

https://twitter.com/#!/Total_Tweets this is a twitter account that is posting all possible tweets. It says the estimated completion date is 4.51491694 X 10^274 AD!

29 11 2012

shouldn’t it be 50^140 + 50^139 + 50^138 + 50^137….

29 11 2012

BTW, forgot to mention – great post! :-)
