Twitter is a rich source for procedural poetry. The text found on Twitter is constantly updating, ever-changing, and reflects the thoughts and points of view of millions of people around the world. As such, it's a very different source of text than the static texts we've been working with so far in class. It has different affordances and different opportunities.
Twitter makes available an "API", or "application programming interface," to developers who want to access the text on Twitter in their own computer programs. We're going to learn how to do two things with the Twitter API: first, we'll learn how to search for text on Twitter; then we'll learn how to read a particular user's timeline.
So what do we mean by "application programming interface"? Well, consider Twitter's normal search interface (which you should play around with a bit, if you haven't already). It looks a bit like this:
The search interface allows us to find tweets whose contents match a particular string---sort of like a big grep for Twitter. Nice! But what if we wanted to make procedural poetry from those tweets? We'd need to find some way of getting them into our Python program.
You might think of a few solutions to this problem, like cutting-and-pasting the text of the tweets one-by-one into a text file. This is a fine solution, but it's very tedious, and limits the scale of what we can do with Twitter! The benefit of using Twitter, after all, is that we have access to billions of tweets, and we don't necessarily want our procedure to be limited by the amount of human labor we can expend in cutting-and-pasting tweets.
Fortunately, there's an easier way: Twitter provides a special version of the search interface that is just for computers. Instead of returning a web page with the search results, this version of the search interface returns a dictionary data structure with information about all of the tweets (including their text). The data structure is designed to be easily computer-readable, and it looks like this:
{ "statuses": [ { "coordinates": null, "favorited": false, "truncated": false, "created_at": "Mon Sep 24 03:35:21 +0000 2012", "id_str": "250075927172759552", "entities": { "urls": [ ], "hashtags": [ { "text": "freebandnames", "indices": [ 20, 34 ] } ], "user_mentions": [ ] }, "in_reply_to_user_id_str": null, "contributors": null, "text": "Aggressive Ponytail #freebandnames", "metadata": { "iso_language_code": "en", "result_type": "recent" }, "retweet_count": 0, "in_reply_to_status_id_str": null, "id": 250075927172759552, "geo": null, "retweeted": false, "in_reply_to_user_id": null, "place": null, "user": { "profile_sidebar_fill_color": "DDEEF6", "profile_sidebar_border_color": "C0DEED", "profile_background_tile": false, "name": "Sean Cummings", "profile_image_url": "http://a0.twimg.com/profile_images/2359746665/1v6zfgqo8g0d3mk7ii5s_normal.jpeg", "created_at": "Mon Apr 26 06:01:55 +0000 2010", "location": "LA, CA", "follow_request_sent": null, "profile_link_color": "0084B4", "is_translator": false, "id_str": "137238150", "entities": { "url": { "urls": [ { "expanded_url": null, "url": "", "indices": [ 0, 0 ] } ] }, "description": { "urls": [ ] } }, "default_profile": true, "contributors_enabled": false, "favourites_count": 0, "url": null, "profile_image_url_https": "https://si0.twimg.com/profile_images/2359746665/1v6zfgqo8g0d3mk7ii5s_normal.jpeg", "utc_offset": -28800, "id": 137238150, "profile_use_background_image": true, "listed_count": 2, "profile_text_color": "333333", "lang": "en", "followers_count": 70, "protected": false, "notifications": null, "profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme1/bg.png", "profile_background_color": "C0DEED", "verified": false, "geo_enabled": true, "time_zone": "Pacific Time (US & Canada)", "description": "Born 330 Live 310", "default_profile_image": false, "profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png", "statuses_count": 579, "friends_count": 110, "following": 
null, "show_all_inline_media": false, "screen_name": "sean_cummings" }, "in_reply_to_screen_name": null, "source": "Twitter for Mac", "in_reply_to_status_id": null }, { "coordinates": null, "favorited": false, "truncated": false, "created_at": "Fri Sep 21 23:40:54 +0000 2012", "id_str": "249292149810667520", "entities": { "urls": [ ], "hashtags": [ { "text": "FreeBandNames", "indices": [ 20, 34 ] } ], "user_mentions": [ ] }, "in_reply_to_user_id_str": null, "contributors": null, "text": "Thee Namaste Nerdz. #FreeBandNames", "metadata": { "iso_language_code": "pl", "result_type": "recent" }, "retweet_count": 0, "in_reply_to_status_id_str": null, "id": 249292149810667520, "geo": null, "retweeted": false, "in_reply_to_user_id": null, "place": null, "user": { "profile_sidebar_fill_color": "DDFFCC", "profile_sidebar_border_color": "BDDCAD", "profile_background_tile": true, "name": "Chaz Martenstein", "profile_image_url": "http://a0.twimg.com/profile_images/447958234/Lichtenstein_normal.jpg", "created_at": "Tue Apr 07 19:05:07 +0000 2009", "location": "Durham, NC", "follow_request_sent": null, "profile_link_color": "0084B4", "is_translator": false, "id_str": "29516238", "entities": { "url": { "urls": [ { "expanded_url": null, "url": "http://bullcityrecords.com/wnng/", "indices": [ 0, 32 ] } ] }, "description": { "urls": [ ] } }, "default_profile": false, "contributors_enabled": false, "favourites_count": 8, "url": "http://bullcityrecords.com/wnng/", "profile_image_url_https": "https://si0.twimg.com/profile_images/447958234/Lichtenstein_normal.jpg", "utc_offset": -18000, "id": 29516238, "profile_use_background_image": true, "listed_count": 118, "profile_text_color": "333333", "lang": "en", "followers_count": 2052, "protected": false, "notifications": null, "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/9423277/background_tile.bmp", "profile_background_color": "9AE4E8", "verified": false, "geo_enabled": false, "time_zone": "Eastern 
Time (US & Canada)", "description": "You will come to Durham, North Carolina. I will sell you some records then, here in Durham, North Carolina. Fun will happen.", "default_profile_image": false, "profile_background_image_url": "http://a0.twimg.com/profile_background_images/9423277/background_tile.bmp", "statuses_count": 7579, "friends_count": 348, "following": null, "show_all_inline_media": true, "screen_name": "bullcityrecords" }, "in_reply_to_screen_name": null, "source": "web", "in_reply_to_status_id": null }, ], "search_metadata": { "max_id": 250126199840518145, "since_id": 24012619984051000, "refresh_url": "?since_id=250126199840518145&q=%23freebandnames&result_type=mixed&include_entities=1", "next_results": "?max_id=249279667666817023&q=%23freebandnames&count=4&include_entities=1&result_type=mixed", "count": 4, "completed_in": 0.035, "since_id_str": "24012619984051000", "query": "%23freebandnames", "max_id_str": "250126199840518145" } }
...okay, so "readable" doesn't seem like a good word to describe that, "computer-" or no. But hopefully you can recognize the contours of what you see above: it looks a bit like a Python dictionary, with some lists inside it, and some of those lists contain other dictionaries. Inside these dictionaries and lists are all of the pieces of information we're looking for: the task is just to figure out how to write Python expressions that access those bits of information.
We're not going to talk about ALL of the information in this data structure; we're just going to look at a few patterns for getting the most interesting parts.
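To get a feel for what those expressions look like, here's a minimal sketch that works on a heavily trimmed copy of the response above. The trimmed JSON string and the variable names are just for illustration (a real response has many more keys), and the sketch uses Python's built-in json module to turn the text into a dictionary. The print calls are written with parentheses so the sketch also runs on newer versions of Python.

```python
import json

# A heavily trimmed copy of the search response shown above.
# (Illustrative only: the real response has many more keys.)
raw = """
{
  "statuses": [
    {
      "text": "Aggressive Ponytail #freebandnames",
      "user": {"screen_name": "sean_cummings", "location": "LA, CA"}
    },
    {
      "text": "Thee Namaste Nerdz. #FreeBandNames",
      "user": {"screen_name": "bullcityrecords", "location": "Durham, NC"}
    }
  ],
  "search_metadata": {"count": 4, "query": "%23freebandnames"}
}
"""

# json.loads turns the JSON text into Python dictionaries and lists.
response = json.loads(raw)

# Dictionary key -> list index -> dictionary key.
first_tweet = response["statuses"][0]
first_text = first_tweet["text"]

# The user who wrote the tweet is a dictionary nested inside the tweet.
first_author = first_tweet["user"]["screen_name"]

# Information about the search itself lives under a different key.
query = response["search_metadata"]["query"]

print(first_text)
print(first_author)
print(query)
```

The pattern is always the same: use a key in square brackets to step into a dictionary, and a number in square brackets to step into a list.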
But first...
In order to use the Twitter API, you can't just use your normal username and password. Instead, you need four magical strings. We're not even going to discuss what these strings are, or what their names mean; for now, just know that these strings, together, act as a sort of "password" for the Twitter API.
The four magical strings are called the API key, the API secret, the access token, and the access token secret.
In order to obtain these four magical strings, we need to create a Twitter application.
This site has a good overview of the steps you need to perform in order to create a Twitter application. I'll demonstrate the process in class. You'll need to have already signed up for a Twitter account!
To access the Twitter API, we're going to use a Python library called Twython. I've already installed this library on the sandbox machine. (If you want to use this library on your own computer, come see me and I'll help you out.)
Here's a simple program that makes use of the Twitter API using Twython. There's a lot of strange stuff here, so don't be worried if some of it is confusing at first. I'll talk about the parts of the program that you can change below.
import sys
import twython

api_key, api_secret, access_token, token_secret = sys.argv[1:]
twitter = twython.Twython(api_key, api_secret, access_token, token_secret)

query = "sea rose"
response = twitter.search(q=query, result_type="recent", count=20)
for tweet in response['statuses']:
    print tweet['text']
This program performs a Twitter search for whatever string is stored in the query variable. It then prints out the text of all matching tweets. Run it like so, replacing $API_KEY with your API key, $API_SECRET with your API secret, $ACCESS_TOKEN with your access token, and $TOKEN_SECRET with your access token secret:
$ python twitter_search.py $API_KEY $API_SECRET $ACCESS_TOKEN $TOKEN_SECRET
RT @evepaludan: £0.77 THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/FiuRJlJZgg◄ RT http:…
RT @lisa_blake4: Lava rolling into the sea creating steam that rose so fast, it spawned multiple vortices! Photo by Bruce Omori. http://t.c…
RT @evepaludan: #99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http:…
@naoscifra ESPERO QUE SEA MUY NOTORIO, FANSERVICE POR FAVOR
Sea dog skeleton from the Mary Rose, 1545. Portsmouth Historic Dockyard http://t.co/joLKQOMpMt
It's times like these when I really need some Barnegat Sea Scallops with a glass of Louis Larent Rose D'Anjou.
Russian photographer Alexander Semenov spends a lot of his time under the sea, capturing the alien-like beauty of... http://t.co/ALblkKRRi1
RT @evepaludan: #99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http:…
RT @evepaludan: #99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http:…
RT @evepaludan: #99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http:…
RT @evepaludan: #99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http:…
RT @evepaludan: #99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http:…
RT @evepaludan: #99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http:…
RT @evepaludan: #99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http:…
RT @evepaludan: £0.77 THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/FiuRJlJZgg◄ RT http:…
RT @evepaludan: £0.77 THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/FiuRJlJZgg◄ RT http:…
£0.77 THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/FiuRJlJZgg◄ RT http://t.co/tcJOHlGfrb
#99cents THE MAN WHO ROSE FROM THE SEA (#Angel #Detectives #2) #fantasy #timetravel #romance ►http://t.co/BchpcDqhlE◄ http://t.co/KwH9C17V7o
#beauty #1: Ailiseu 100g Bath Dead Sea Salt - Champagne & Rose: Ailiseu 100g Bath Dead Sea Salt - Champagne & ... http://t.co/u1aSgah2nU
You should get a list of tweets that look like they contain either the word sea or the word rose.
Let's break down this example a little bit, to show what each line is doing.
import twython
This line "imports" the Twython library and makes it available in the program.
api_key, api_secret, access_token, token_secret = sys.argv[1:]
This line reads the Twitter credentials from the command line, using the sys.argv list.
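Here's a small sketch of what that unpacking does, using a made-up list in place of the real sys.argv so you can run it without any credentials. The function name read_credentials and the fake values are invented for illustration; the one thing it adds to the line above is a friendlier message when the wrong number of strings is supplied.

```python
def read_credentials(argv):
    """Return the four credential strings that follow the script name."""
    args = argv[1:]  # argv[0] is the name of the script itself, so skip it
    if len(args) != 4:
        # Unpacking four names from a list of any other length would fail
        # with a ValueError; exit with a readable message instead.
        raise SystemExit(
            "usage: python twitter_search.py "
            "API_KEY API_SECRET ACCESS_TOKEN TOKEN_SECRET")
    api_key, api_secret, access_token, token_secret = args
    return api_key, api_secret, access_token, token_secret

# A stand-in for sys.argv; in a real program you would pass sys.argv itself.
fake_argv = ["twitter_search.py", "key123", "secret456", "token789", "tsecret000"]

api_key, api_secret, access_token, token_secret = read_credentials(fake_argv)
print(api_key)
print(token_secret)
```

The order matters: the first string after the script name ends up in api_key, the second in api_secret, and so on.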
twitter = twython.Twython(api_key, api_secret, access_token, token_secret)
This line "initializes" the library, and creates an object that gets assigned to a variable (called twitter here, but you could call it whatever). We'll primarily be interacting with the Twitter API by calling methods on this object.
response = twitter.search(q=query, result_type="recent", count=20)
This is where the work of actually contacting the Twitter API happens. The .search() method opens up an Internet connection to Twitter's search server. The parameters in the method call have particular meanings:
q=query: search for whatever string is in the variable query
result_type="recent": tells Twitter to return the most recent tweets matching the query (instead of the default, "mixed", which blends recent tweets with "popular" ones)
count=20: tells Twitter to return 20 results. (This can be, at most, 100.)
There are other parameters you can pass to this function, which you can read about here. These three, though, should be more than enough to get you started.
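If you find yourself changing these parameters a lot, one option is to keep them in a dictionary and check them before making the call. The sketch below is only a suggestion: build_search_params is a hypothetical helper, not part of Twython, and it runs without contacting Twitter at all. The limit of 100 and the three result types it checks are the ones described in Twitter's search documentation.

```python
def build_search_params(query, result_type="recent", count=20):
    """Build a dictionary of search parameters (hypothetical helper)."""
    if result_type not in ("mixed", "recent", "popular"):
        raise ValueError("result_type must be mixed, recent, or popular")
    # Twitter returns at most 100 results per search, so clamp the count.
    count = max(1, min(count, 100))
    return {"q": query, "result_type": result_type, "count": count}

params = build_search_params("sea rose", count=250)
print(params["count"])

# With a real Twython object, the dictionary could then be passed along
# as keyword arguments:
# response = twitter.search(**params)
```

The two asterisks in the commented-out call unpack the dictionary into keyword arguments, so it means the same thing as writing q=..., result_type=..., count=... by hand.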
The .search() method evaluates to a Python dictionary, stored in the example above in a variable called response. You can print this variable if you'd like to see exactly what's in it, but the main item of interest is the key statuses, which is a list of dictionaries. From a high-level perspective, the structure looks something like this:
{
'statuses': [
{
'text': 'tweet text!',
'retweeted': False,
'id_str': '123456789',
[...other key/value pairs omitted...]
'user': {
'name': 'Fordham English',
'screen_name': 'FordhamEnglish',
[...other key/value pairs omitted...]
}
},
{
'text': 'another tweet!',
'retweeted': False,
'id_str': '123456788',
[...other key/value pairs omitted...]
'user': {
'name': 'Fordham English',
'screen_name': 'FordhamEnglish',
[...other key/value pairs omitted...]
}
}
]
[...other stuff omitted...]
}
That is: it's a dictionary, one of whose keys (statuses) has a list as its value. That list itself contains other dictionaries, whose key/value pairs describe information about individual tweets. Those dictionaries have yet another dictionary embedded inside them---specifically, the value for the user key, which is a dictionary containing information about the user who made the tweet.
So this line...
for tweet in response['statuses']:
... causes the program to loop over each "status" in the list. We're calling the temporary loop variable tweet, to emphasize that the information we're looking at is about a tweet. The variable tweet itself will contain each item of the list in succession. And each item in the list is a dictionary!
So, the line...
print tweet['text']
...will display the value for the key text, which contains the text of the tweet in question.
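The same loop can reach into the nested user dictionary, too. Here's a sketch that runs on a small hand-made response (so no credentials are needed) and prints each tweet along with the screen name of whoever wrote it. Skipping tweets whose text starts with "RT @" is just one rough way to leave out retweets; it's an assumption about how retweets look in the search results, not an official rule.

```python
# A hand-made stand-in for the dictionary that twitter.search() gives back.
response = {
    'statuses': [
        {'text': 'tweet text!',
         'user': {'name': 'Fordham English', 'screen_name': 'FordhamEnglish'}},
        {'text': 'RT @someone: another tweet!',
         'user': {'name': 'Fordham English', 'screen_name': 'FordhamEnglish'}},
    ]
}

lines = []
for tweet in response['statuses']:
    # Rough retweet filter: retweets in search results usually begin "RT @".
    if tweet['text'].startswith('RT @'):
        continue
    # tweet['user'] is itself a dictionary, so we index into it again.
    lines.append('@' + tweet['user']['screen_name'] + ': ' + tweet['text'])

for line in lines:
    print(line)
```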
(to be continued!)
import sys
import twython

api_key, api_secret, access_token, token_secret = sys.argv[1:]
twitter = twython.Twython(api_key, api_secret, access_token, token_secret)

source_words = ["rose", "harsh", "rose", "marred", "and", "with", "stint", "of", "petal"]
for word in source_words:
    response = twitter.search(q=word, result_type="recent", count=1)
    if len(response['statuses']) > 0:
        first_tweet = response['statuses'][0]
        tweet_text = first_tweet['text'].lower()
        if word in tweet_text:
            pos = tweet_text.find(word)
            print tweet_text[pos:]
$ python twitter_elaborate.py $API_KEY $API_SECRET $ACCESS_TOKEN $TOKEN_SECRET <frost.txt
harsh. hahahaha
rose has ‘got to get out there and play’ http://t.co/kbqpan02rm
marred mind tho, no vex rt @jackdre02: wu asked for ur opinion?“druhgzz_: you myt be the enemy (cont) http://t.co/ptwvpr5kg6
and i'm feel like i am the one who have convocation today. haha
with anything that may work for you.
stint is over. got stuck under a truck after an hour of not saving -
of you non-violence preachers when people were being tear gassed and hit with rubber bullets for peacefully assembling? #foh
petal paisley http://t.co/2lcrseqrqr
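The heart of that program is the bit that lowercases the tweet, finds the word, and keeps everything from the word onward. Here's that step pulled out into a function you can try without contacting Twitter. The function name tail_from_word and the sample tweet texts are made up for illustration.

```python
def tail_from_word(text, word):
    """Return the lowercased text from the first occurrence of word onward.

    Returns None if the word doesn't appear in the text at all.
    """
    lowered = text.lower()
    pos = lowered.find(word)  # find() gives -1 when there is no match
    if pos == -1:
        return None
    return lowered[pos:]

# Made-up tweet texts standing in for real search results.
samples = [
    ("harsh", "So HARSH. Hahahaha"),
    ("petal", "nothing to see here"),
]

for word, text in samples:
    tail = tail_from_word(text, word)
    if tail is not None:
        print(tail)
```

A search can return a tweet whose text doesn't actually contain the word (the match might be in a link or a username, for instance), which is why the program checks before slicing.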