TextBlob is a library for natural language processing, or NLP. Natural language processing techniques can give us access to units and aspects of the language that underlie the text (like sentences, parts of speech, sentiment, etc.).
Of course, a computer can never really fully "understand" human language, so NLP techniques are always a little bit inaccurate. But often even inaccurate results can be "good enough."
I've pre-installed the TextBlob library on the sandbox server. If you want to use it locally on your own computer, come see me for help!
TextBlob can take a text and split it into sentences. Here's an example from an interactive interpreter session:
```
>>> from textblob import TextBlob
>>> blob = TextBlob(open("poe.txt").read())
>>> for item in blob.sentences:
...     print item.replace('\n', ' ')
...
The "Red Death" had long devastated the country.
No pestilence had ever been so fatal, or so hideous.
Blood was its Avatar and its seal--the redness and the horror of blood.
There were sharp pains, and sudden dizziness, and then profuse bleeding at the pores, with dissolution.
The scarlet stains upon the body and especially upon the face of the victim, were the pest ban which shut him out from the aid and from the sympathy of his fellow-men.
And the whole seizure, progress and termination of the disease, were the incidents of half an hour.
But the Prince Prospero was happy and dauntless and sagacious.
When his dominions were half depopulated, he summoned to his presence a thousand hale and light-hearted friends from among the knights and dames of his court, and with these retired to the deep seclusion of one of his castellated abbeys.
This was an extensive and magnificent structure, the creation of the prince's own eccentric yet august taste.
A strong and lofty wall girdled it in.
This wall had gates of iron.
The courtiers, having entered, brought furnaces and massy hammers and welded the bolts.
They resolved to leave means neither of ingress nor egress to the sudden impulses of despair or of frenzy from within.
The abbey was amply provisioned.
With such precautions the courtiers might bid defiance to contagion.
The external world could take care of itself.
In the meantime it was folly to grieve, or to think.
The prince had provided all the appliances of pleasure.
There were buffoons, there were improvisatori, there were ballet-dancers, there were musicians, there was Beauty, there was wine.
All these and security were within.
Without was the "Red Death".
It was towards the close of the fifth or sixth month of his seclusion, and while the pestilence raged most furiously abroad, that the Prince Prospero entertained his thousand friends at a masked ball of the most unusual magnificence.
```
Here's how the above example works. First, we import the `TextBlob` class from the `textblob` library with this line:

```python
from textblob import TextBlob
```
(The `from module import thing` syntax used above makes a single item from the named module available. If you use this syntax, you don't have to type the name of the module every time you want to refer to the thing you've imported. Another example: if you wanted to import just the `choice()` function from the `random` module, you could write `from random import choice`. Then, when you wanted to use the function, you could just type `choice()` instead of `random.choice()`.)
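Here's a minimal sketch of both import styles side by side, using only the standard library (the list of words is made up for the example):

```python
import random
from random import choice

words = ["rose", "leaf", "sand", "wind"]

# Plain import: the function is reached through the module name.
picked_a = random.choice(words)

# "from" import: the name choice is available on its own.
picked_b = choice(words)

# Both names refer to the very same function object.
print(choice is random.choice)  # prints True
```

Either way you get the same function; the `from` form just saves typing.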
On the second line above, we create a `TextBlob` object and pass in a string. I used the `open()` function to read in the contents of a file, but you can pass in any expression that evaluates to a string. We assign the object to a variable called `blob`.
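Writing `open("poe.txt").read()` inline works, but it never explicitly closes the file. A slightly more careful sketch reads the file inside a `with` block, which closes the file as soon as the block ends; the resulting string is what you would hand to `TextBlob(...)`. The file name `example.txt` and its contents are placeholders, written to a temporary directory so the sketch runs on its own:

```python
import os
import tempfile

# Hypothetical stand-in for poe.txt, written to a temp directory
# so this sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt")
with open(path, "w") as f:
    f.write("A strong and lofty wall girdled it in.\nThis wall had gates of iron.\n")

# Read the whole file into one string; the file is closed
# automatically when the with block ends.
with open(path) as f:
    text = f.read()

# text is now an ordinary string, suitable for TextBlob(text).
print(text)
```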
The `blob` variable has a number of interesting methods and attributes. The `.sentences` attribute is a list of sentences in the text. In the third line of the example above, we loop over the list of sentences and print them out.
We need to replace `\n` with a space character because, even though TextBlob splits the text into sentences, it doesn't remove the linebreaks inside each sentence.
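A quick illustration, with a plain string standing in for one of TextBlob's sentence objects (the linebreak inside it is made up for the example). `str.replace` swaps each linebreak for a space; if a text also contains runs of extra spaces, `" ".join(s.split())` is a common alternative that collapses all whitespace at once:

```python
# A sentence that spans a linebreak, as it might in a plain-text file.
sentence = "The abbey was amply\nprovisioned."
print(sentence.replace("\n", " "))
# The abbey was amply provisioned.

# Alternative: split on any run of whitespace, then rejoin with single spaces.
messy = "The abbey was\n   amply\nprovisioned."
print(" ".join(messy.split()))
# The abbey was amply provisioned.
```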
```
>>> from textblob import TextBlob
>>> blob = TextBlob(open("sea_rose.txt").read())
>>> for word in blob.words:
...     print word
...
Rose
harsh
rose
marred
and
with
stint
of
petals
meagre
flower
thin
spare
of
leaf
more
precious
than
a
wet
rose
single
on
a
stem
you
are
caught
in
the
drift
Stunted
with
small
leaf
you
are
flung
on
the
sand
you
are
lifted
in
the
crisp
sand
that
drives
in
the
wind
Can
the
spice-rose
drip
such
acrid
fragrance
hardened
in
a
leaf
```
This example demonstrates the `.words` attribute of TextBlob objects: it parses individual words from the text, taking punctuation into account (and leaving the punctuation out of the words).
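To get a feel for what `.words` is doing, here is a rough regular-expression approximation in plain Python. This is not TextBlob's own tokenizer, which handles many more cases; the function name `rough_words` is made up for this sketch. It keeps runs of word characters, allows a single hyphen or apostrophe between them (so "fellow-men" stays in one piece), and drops all other punctuation:

```python
import re

def rough_words(text):
    """Return runs of word characters, allowing single internal hyphens or
    apostrophes (as in "fellow-men"); all other punctuation is dropped.
    A rough approximation only, not TextBlob's tokenizer."""
    return re.findall(r"\w+(?:[-']\w+)*", text)

print(rough_words("Blood was its Avatar and its seal--the redness and the horror of blood."))
# ['Blood', 'was', 'its', 'Avatar', 'and', 'its', 'seal', 'the', 'redness', 'and', 'the', 'horror', 'of', 'blood']

print(rough_words("the sympathy of his fellow-men."))
# ['the', 'sympathy', 'of', 'his', 'fellow-men']
```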
TextBlob can calculate the "sentiment" of a sentence. "Sentiment" is a measurement of the emotional content of the sentence: the number is positive (between 0 and 1) if the sentence says something "good" and negative (between 0 and -1) if the sentence says something "bad."
You can access the sentiment of a sentence in TextBlob by looping over the `.sentences` attribute of a TextBlob object, then checking the `.sentiment.polarity` attribute of each item in the loop. The following example prints only those sentences from `poe.txt` that have a positive sentiment (according to TextBlob):
```
>>> from textblob import TextBlob
>>> blob = TextBlob(open("poe.txt").read())
>>> for item in blob.sentences:
...     if item.sentiment.polarity > 0:
...         print item.replace('\n', ' ')
...
And the whole seizure, progress and termination of the disease, were the incidents of half an hour.
But the Prince Prospero was happy and dauntless and sagacious.
When his dominions were half depopulated, he summoned to his presence a thousand hale and light-hearted friends from among the knights and dames of his court, and with these retired to the deep seclusion of one of his castellated abbeys.
This was an extensive and magnificent structure, the creation of the prince's own eccentric yet august taste.
A strong and lofty wall girdled it in.
It was towards the close of the fifth or sixth month of his seclusion, and while the pestilence raged most furiously abroad, that the Prince Prospero entertained his thousand friends at a masked ball of the most unusual magnificence.
```
And the following example prints only those sentences from `poe.txt` that have a negative sentiment:
```
>>> from textblob import TextBlob
>>> blob = TextBlob(open("poe.txt").read())
>>> for item in blob.sentences:
...     if item.sentiment.polarity < 0:
...         print item.replace('\n', ' ')
...
The "Red Death" had long devastated the country.
There were sharp pains, and sudden dizziness, and then profuse bleeding at the pores, with dissolution.
The scarlet stains upon the body and especially upon the face of the victim, were the pest ban which shut him out from the aid and from the sympathy of his fellow-men.
```
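The two loops above use strict comparisons, so a sentence whose polarity is exactly 0 (what TextBlob reports for sentences it considers neutral, such as ones containing none of the words it has sentiment scores for) is printed by neither. Here's a sketch that sorts sentences into all three groups in a single pass. It expects the sentence objects from `blob.sentences`, or any object with a `.sentiment.polarity` attribute; the function name `split_by_polarity` is made up for this example:

```python
def split_by_polarity(sentences):
    """Sort sentence objects into positive, negative, and neutral lists,
    based on each one's .sentiment.polarity value (from -1.0 to 1.0)."""
    positive, negative, neutral = [], [], []
    for sentence in sentences:
        polarity = sentence.sentiment.polarity
        if polarity > 0:
            positive.append(sentence)
        elif polarity < 0:
            negative.append(sentence)
        else:
            # exactly 0: skipped by both of the loops above
            neutral.append(sentence)
    return positive, negative, neutral
```

With TextBlob installed, you could call it as `positive, negative, neutral = split_by_polarity(blob.sentences)` and then print whichever group you're interested in.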
A "noun phrase" is a kind of phrase you find in a sentence. It consists of a noun and all of that noun's "surrounding matter," such as any adjectives that modify the noun. TextBlob makes it very easy to extract noun phrases from a given text, using its `.noun_phrases` attribute:
```
>>> from textblob import TextBlob
>>> blob = TextBlob(open("poe.txt").read())
>>> for item in blob.noun_phrases:
...     print item
...
death
blood
avatar
sharp pains
sudden dizziness
scarlet stains
pest ban
whole seizure
prospero
deep seclusion
magnificent structure
prince 's
own eccentric august taste
lofty wall
massy hammers
sudden impulses
such precautions
bid defiance
external world
death
prospero
unusual magnificence
```
Here we're looping over the noun phrases and printing them out.
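Because `.noun_phrases` behaves like an ordinary Python list of strings, standard tools work on it directly. For instance, some phrases appear more than once in the output above ("death" and "prospero"), and `collections.Counter` can tally them. The short list below is typed in by hand so the sketch runs without TextBlob; with TextBlob you would pass `blob.noun_phrases` instead:

```python
from collections import Counter

# Hand-typed stand-in for blob.noun_phrases.
noun_phrases = ["death", "blood", "pest ban", "prospero", "death", "prospero", "lofty wall"]

counts = Counter(noun_phrases)

# most_common(n) returns (phrase, count) pairs, most frequent first.
for phrase, count in counts.most_common(2):
    print(phrase + ": " + str(count))
```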
TextBlob can also tell us what part of speech each word in a text corresponds to. It can tell us if a word in a sentence is functioning as a noun, an adjective, a verb, etc. In NLP, associating a word with a part of speech is called "tagging." Correspondingly, the attribute of the `TextBlob` object we'll use to access this information is `.tags`.
```
>>> from textblob import TextBlob
>>> blob = TextBlob("I have a lovely bunch of coconuts.")
>>> for word, pos in blob.tags:
...     print word, pos
...
I PRP
have VBP
a DT
lovely JJ
bunch NN
of IN
coconuts NNS
```
This `for` loop is a little weird, because it has two temporary loop variables instead of one. (The underlying reason is that `.tags` evaluates to a list of two-item tuples, which we can automatically unpack by specifying two variables in the `for` loop. Don't worry about this if it doesn't make sense. Just know that when you're using the `.tags` attribute, you need two loop variables instead of one.) The first variable, which we've called `word` here, contains the word; the second, called `pos` here, contains the part of speech.
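Here is the same unpacking on a plain list of tuples, copied from the output above, so you can see what the loop variables receive without running TextBlob. The last part shows one way to collect verbs: in the Penn Treebank tag set used in this output, every verb tag begins with "VB" (VB, VBD, VBG, VBN, VBP, VBZ), so checking `tag.startswith("VB")` catches every verb form, while `tag == "VB"` matches only the base form:

```python
# The (word, tag) pairs printed in the example above.
tags = [("I", "PRP"), ("have", "VBP"), ("a", "DT"), ("lovely", "JJ"),
        ("bunch", "NN"), ("of", "IN"), ("coconuts", "NNS")]

# Unpacking each two-item tuple into two loop variables...
for word, pos in tags:
    print(word + " " + pos)

# ...does the same thing as indexing into each tuple yourself.
for pair in tags:
    print(pair[0] + " " + pair[1])

# Collect every verb, whatever its form, by checking the tag's prefix.
verbs = [word for word, tag in tags if tag.startswith("VB")]
print(verbs)  # ['have']
```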
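_ gets replaced with various letters depending on the form of the verb)

TextBlob's `Word` class represents a single word, and its `.pluralize()` method returns the word's plural form:

```
>>> from textblob import Word
>>> w = Word("university")
>>> print w.pluralize()
universities
```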
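The `.lemmatize()` method returns the word's lemma: its dictionary form, with inflectional endings such as "-ing" or "-s" removed. By default it treats the word as a noun, which is why "running" comes back unchanged below; passing `"v"` tells it to treat the word as a verb instead.

```
>>> from textblob import Word
>>> w = Word("running")
>>> print w.lemmatize()
running
>>> print w.lemmatize("v")
run
```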
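```python
from textblob import TextBlob
import random
import sys

# stdin's read() method reads all of standard input in as one string.
# decode() turns those raw bytes into a unicode string, reading them as
# ASCII; errors="replace" substitutes a replacement character for any
# byte that isn't valid ASCII instead of crashing the program.
text = sys.stdin.read().decode('ascii', errors="replace")
blob = TextBlob(text)

# keep only the sentences that are five words long or shorter
short_sentences = list()
for sentence in blob.sentences:
    if len(sentence.words) <= 5:
        short_sentences.append(sentence.replace("\n", " "))

# print ten of them at random (or all of them, if there are fewer than ten)
for item in random.sample(short_sentences, min(10, len(short_sentences))):
    print item
```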
```
$ python hemingwayize.py < austen.txt
How will a conundrum reckon?"
Could there be finer symptoms?
what do you mean?"
replied Elinor.
"Oh!
Adopt her, educate her."
but what shall you do?
cried Harriet, colouring, and astonished.
"I had none.
who can require it?"
```
```python
from textblob import TextBlob
import sys
import random

text = sys.stdin.read().decode('ascii', errors="replace")
blob = TextBlob(text)

noun_phrases = blob.noun_phrases

# collect the base-form verbs (tag "VB") in the text
verbs = list()
for word, tag in blob.tags:
    if tag == 'VB':
        verbs.append(word.lemmatize())

# print ten "instructions," each a random verb plus a random noun phrase
for i in range(1, 11):
    print "Step " + str(i) + ". " + random.choice(verbs).title() + " " + \
        random.choice(noun_phrases)
```
```
$ python instructify.py < poe.txt
Step 1. Take prince 's
Step 2. Leave lofty wall
Step 3. Leave prospero
Step 4. Take pest ban
Step 5. Close deep seclusion
Step 6. Take massy hammers
Step 7. Take external world
Step 8. Leave sudden dizziness
Step 9. Leave sudden impulses
Step 10. Close pest ban
```
```python
from textblob import TextBlob, Word
import sys
import random

text = sys.stdin.read().decode('ascii', errors="replace")
blob = TextBlob(text)

# collect the words tagged "NN" (singular or mass common nouns)
nouns = list()
for word, tag in blob.tags:
    if tag == 'NN':
        nouns.append(word.lemmatize())

# print the plural form of five randomly chosen nouns
print "This text is about..."
for item in random.sample(nouns, 5):
    word = Word(item)
    print word.pluralize()
```
```
$ python summarize_poorly.py < poe.txt
This text is about...
ingress
bodies
halves
victims
walls
```
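These programs pick random items in two different ways. `random.sample()`, used by `hemingwayize.py` and `summarize_poorly.py`, draws items from distinct positions in the list, so nothing is printed twice; it raises a `ValueError` if you ask for more items than the list contains. `random.choice()`, which `instructify.py` calls once per step, picks independently each time, so repeats are possible: that's why "pest ban" shows up in both Step 4 and Step 10 above. A small standard-library sketch of the difference, using a few noun phrases from that output:

```python
import random

# A few noun phrases from the output above, used as a small population.
population = ["pest ban", "lofty wall", "prospero"]

# sample(): picks items from distinct positions, so nothing repeats.
picked = random.sample(population, 3)
print(sorted(picked) == sorted(population))  # True: every item appears exactly once

# choice(): each call is independent, so the same item can come up again.
steps = [random.choice(population) for _ in range(10)]
print(steps)

# Asking sample() for more items than the list holds raises ValueError.
try:
    random.sample(population, 10)
except ValueError:
    print("can't sample 10 items from a list of 3")

# One way to guard against that: cap the count at the length of the list.
safe = random.sample(population, min(10, len(population)))
print(len(safe))  # 3
```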