This is a joint project with Yla Tausczik, inspired by Wordle. We wanted to do something similar, but with the positions and colours of the words carrying meaning.
(February 2010) Here we represent a dialogue in a way that uses the information about who said what.
These four images are the three presidential debates and one vice-presidential debate in 2008. The size of each word corresponds to the frequency with which it was used, and the colour and approximate y-position correspond to the relative amount that the two candidates used the word. Words used more often by the Democratic candidate are bluer and higher on the image, and those used more by the Republican candidate are redder and lower. The x-position corresponds to the time during the debate at which the word was used (averaged over all times at which the word was used).
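The mapping described above can be sketched in a few lines of code. This is a minimal sketch under stated assumptions, not the code used to make these images: it assumes the transcript is a list of `(speaker, text)` turns in speaking order, that "time" is a word's position in the combined word stream of the two speakers (normalized to 0..1), and that "relative amount" is the raw fraction of a word's uses by speaker A. One could instead normalize by each speaker's total word count. The names `place_words` and `PlacedWord` are hypothetical, and the overlap-avoiding layout that makes the y-position only "approximate" is omitted.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PlacedWord:
    word: str
    count: int                    # total uses by both speakers -> font size
    share_a: float                # fraction of uses by speaker A, in [0, 1]
    x: float                      # mean normalized time of use: 0 = start, 1 = end
    y: float                      # 1 = top (speaker A only), 0 = bottom (speaker B only)
    colour: tuple[int, int, int]  # RGB: pure blue for A-only words, pure red for B-only words


def place_words(turns, speaker_a, speaker_b, stopwords=frozenset()):
    """turns: list of (speaker, text) pairs in the order they were spoken."""
    # Keep only the two main speakers and split their lines into lowercase words.
    tokens = [
        (speaker, word)
        for speaker, text in turns
        if speaker in (speaker_a, speaker_b)
        for word in re.findall(r"[a-z']+", text.lower())
    ]
    last = max(len(tokens) - 1, 1)
    uses = defaultdict(lambda: [0, 0])  # word -> [uses by A, uses by B]
    times = defaultdict(list)           # word -> normalized position of each use
    for i, (speaker, word) in enumerate(tokens):
        if word in stopwords:
            continue  # e.g. "senator" in the debates
        uses[word][0 if speaker == speaker_a else 1] += 1
        times[word].append(i / last)

    placed = []
    for word, (a, b) in uses.items():
        total = a + b
        share_a = a / total
        placed.append(PlacedWord(
            word=word,
            count=total,
            share_a=share_a,
            x=sum(times[word]) / total,   # averaged over all times the word was used
            y=share_a,
            colour=(round(255 * (1 - share_a)), 0, round(255 * share_a)),
        ))
    # Largest (most frequent) words first, so a renderer can place them before smaller ones.
    placed.sort(key=lambda w: w.count, reverse=True)
    return placed


# Toy transcript (made-up lines); speaker "C" is ignored like a moderator would be.
turns = [
    ("A", "The economy is the issue."),
    ("B", "The economy needs reform."),
    ("A", "Reform, reform, reform."),
    ("C", "Ignored moderator line."),
]
for w in place_words(turns, "A", "B", stopwords={"the", "is"}):
    print(f"{w.word:8} size={w.count} x={w.x:.2f} y={w.y:.2f}")
# reform   size=4 x=0.86 y=0.75
# economy  size=2 x=0.32 y=0.50
# issue    size=1 x=0.36 y=1.00
# needs    size=1 x=0.64 y=0.00
```

Stopwords are skipped only when counting, so they still occupy time positions and the x-axis reflects the real flow of the conversation.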
Looking at the first image, of the first presidential debate, for instance, we can see that the candidates spent the early part of the debate talking about the economy (note the cluster of "main", "street" and "wall" at the far left), and later talked about national security and foreign policy. Note that Obama refers to McCain frequently, and McCain to Obama, but they very rarely use their own names. We removed the word "senator" from the presidential debates, since both candidates address each other as "Senator", which would make it the most common word but an uninteresting one.
The second debate was similar to the first, with a little added "health" and "friends". Only in the third do we see "education", "abortion" and "plumber". The vice-presidential debate has many references to the presidential candidates, though more to McCain than to Obama, and, interestingly, "maverick" is used more by Biden than by Palin.
And now for something completely different: the same technique applied to Monty Python's Dead Parrot sketch. It's easy to see that the customer talks more and with a greater variety of words than the shopkeeper.
This time with a longer document: the play Rosencrantz and Guildenstern Are Dead by Tom Stoppard. As with the debates and the sketch, we ignore all speakers other than the two main ones (Rosencrantz and Guildenstern, of course).
The two characters often speak for each other, so in contrast to the debates, they both say "Rosencrantz" and "Guildenstern" about equally often. The play begins with an improbable run of coin flips all landing "heads", hence its size: tied for second place (with "dead", "good" and "king"), behind only "death".