So far, all of our programs have operated on one source of input: standard input from the UNIX command-line. In this lesson, we'll learn how to use Python to read text from multiple files in the same program, allowing us to mash up text from more than one source.
The key to working with multiple files is Python's open() function. The open() function, in its simplest form, looks like this:

    open(file_name_str)

Here, file_name_str is some expression that evaluates to a string that names the file that you want to open. (I.e., if you wanted to work with text in a file called foo.txt, you would write open("foo.txt").) The open() function evaluates to a value of type file. You can read more about what you can do with file values if you'd like, but in this lesson I'm going to show you a few patterns to use with open() that you can just drop into your own code.
The first thing you can do with open() is iterate over it in a for loop, the same way you would iterate over a list. Here's an example, which does the same work as a normal standard-input program, but using an explicitly named file instead:
    for line in open("sea_rose.txt"):
        line = line.strip()
        print line
Run this program (making sure you have a file named sea_rose.txt in the same directory) and you'll get the following output:
    $ python open_file.py
    Rose, harsh rose,
    marred and with stint of petals,
    meagre flower, thin,
    spare of leaf,

    more precious
    than a wet rose
    single on a stem --
    you are caught in the drift.

    Stunted, with small leaf,
    you are flung on the sand,
    you are lifted
    in the crisp sand
    that drives in the wind.

    Can the spice-rose
    drip such acrid fragrance
    hardened in a leaf?
Notice one thing about the command line: we didn't include input from redirection! (I.e., there's no <file.txt.) That's because we didn't include our standard sys.stdin loop in the script; all of the text is read in by the open() call instead.
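The loop above leaves it up to Python to close the file when it's done. A common variant, which this lesson doesn't otherwise use, is the with statement, which closes the file as soon as the indented block ends. Here's a minimal sketch of that variant. The file name sample.txt and the variable names are made up for illustration, and the sketch writes its own small file so it can run anywhere. I've written print(...) with parentheses and a single value so that it behaves the same in Python 2 and Python 3.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as handle:
    handle.write("Rose, harsh rose,\nmarred and with stint of petals,\n")

# Same loop as in the lesson, but the file is closed automatically
# when the with block ends.
lines = []
with open(path) as handle:
    for line in handle:
        lines.append(line.strip())

for line in lines:
    print(line)
```

For short scripts like the ones in this lesson, either style works; the with form mostly matters in longer programs that open many files.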
We can include as many calls to open() in our program as we'd like. And, of course, we can use the for loop to do things other than just printing lines. Here's a program that reads in two files, frost.txt and sea_rose.txt, and performs an unusual juxtaposition:
    import random

    rose_lines = list()
    for line in open('sea_rose.txt'):
        line = line.strip()
        if len(line) > 0:
            rose_lines.append(line)

    frost_lines = list()
    for line in open('frost.txt'):
        line = line.strip()
        if len(line) > 0:
            frost_lines.append(line)

    for i in range(10):
        random_rose = random.choice(rose_lines)
        random_frost = random.choice(frost_lines)
        print random_rose[:len(random_rose)/2] + random_frost[len(random_frost)/2:]
Here's the output:
    $ python halfsies.py
    in the cly about the same,
    you are flungsy and wanted wear;
    in the cr, as just as fair,
    hardened the better claim,
    hardened e as far as I could
    you areall the difference.
    hardened n the undergrowth;
    Can the sall the difference.
    hardened the passing there
    marred and with n the undergrowth;
This program reads in all of the lines from two files (sea_rose.txt and frost.txt), and puts the lines into separate lists (rose_lines and frost_lines, respectively). It then executes some code at the end of the program ten times: choosing a random line from Sea Rose, a random line from Frost, and then printing out the first half of the Sea Rose line next to the second half of the Frost line.
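The slicing in that last step is easier to see on short strings. Here's a small sketch of just the "halves" arithmetic, pulled out into a function I've called half_and_half (a made-up name, not part of halfsies.py). It uses // for the division so that the index is a whole number in both Python 2 and Python 3.

```python
def half_and_half(left, right):
    # First half of left, followed by second half of right.
    # With an odd length, the "first half" is the shorter piece.
    return left[:len(left) // 2] + right[len(right) // 2:]

print(half_and_half("abcd", "wxyz"))    # abyz
print(half_and_half("abcde", "vwxyz"))  # abxyz
```

Note that the halves are counted in characters, not words, which is why the output above breaks lines in the middle of a word.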
EXERCISE: Write a version of halfsies.py that prints out half of the words from the randomly selected line from Sea Rose, followed by half of the words from the randomly selected line from Frost.
The open() function also allows us to slurp up all of a file at once into a big string. Here's how to do it:

    open(file_name_str).read()

As before, file_name_str is some expression that evaluates to a string that names a file. The entire expression above evaluates to a string. Let's test it out in the interactive interpreter:

    >>> open('sea_rose.txt').read()
    'Rose, harsh rose, \nmarred and with stint of petals, \nmeagre flower, thin, \nspare of leaf,\n\nmore precious \nthan a wet rose \nsingle on a stem -- \nyou are caught in the drift.\n\nStunted, with small leaf, \nyou are flung on the sand, \nyou are lifted \nin the crisp sand \nthat drives in the wind.\n\nCan the spice-rose \ndrip such acrid fragrance \nhardened in a leaf?\n'
What can we do with the entire file in one big string? Well, we can grab big chunks of it for one thing, and make a kind of glitchy mashup of two different files:
    import random

    # read file contents into strings
    sea_rose = open('sea_rose.txt').read()
    frost = open('frost.txt').read()

    for i in range(10):
        rose_start = random.randrange(len(sea_rose))
        rose_length = random.randrange(8, 20)
        rose_fragment = sea_rose[rose_start:rose_start+rose_length]
        frost_start = random.randrange(len(frost))
        frost_length = random.randrange(8, 20)
        frost_fragment = frost[frost_start:frost_start+frost_length]
        print rose_fragment + frost_fragment
Here's the output:
$ python glitch.py stem -- you are cby, And that has m you areelled by, An -- you I--- I took th ls, meagre flot travel both spice-rose drip se other, as marred agh Somewher such acrid fragrawing how way le ose drip such acd down one you are er come bac are of leafdifference.
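You might wonder what happens in glitch.py when the random starting point lands near the end of the file, so that the start plus the length runs past the last character. Python's slices don't complain about that: they just stop at the end of the string, which is why some fragments come out shorter than eight characters. A quick sketch, using a made-up string in place of the file contents:

```python
text = "hardened in a leaf?"

# Start five characters from the end, then ask for twelve characters.
start = len(text) - 5
fragment = text[start:start + 12]

print(fragment)       # leaf?
print(len(fragment))  # 5
```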
Reading in the contents of a file as a string also allows us to easily extract all of the words from the text. We can use that property to write a program that produces output that contains words from two different files:
    import random

    words = list()
    sea_rose = open('sea_rose.txt').read()
    frost = open('frost.txt').read()

    for item in sea_rose.split():
        words.append(item)
    for item in frost.split():
        words.append(item)

    for i in range(10):
        num_words_this_line = random.randrange(1, 8)
        words_this_line = random.sample(words, num_words_this_line)
        print ' '.join(words_this_line)
And the output:
$ python word_mashup.py it had drip Then the and the there bent come I--- first a Rose, and And really travel bent day! acrid in And Yet about sand, I -- as in
The program above used the .split() method in a new way: we didn't pass a string inside the parentheses. It turns out that .split(), when used without any parameters, does something interesting: it splits the string up on any whitespace (space characters, tabs, newlines). This is a little bit more versatile, especially when we're working with big strings that have newline characters in them.
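One more detail that follows from "any whitespace": a run of several whitespace characters in a row counts as a single separator when you call .split() with no parameters, whereas .split(" ") makes a cut at every individual space and leaves empty strings behind. A tiny sketch, with a made-up string:

```python
spaced = "two  spaces here"  # note the double space

by_space = spaced.split(" ")
by_whitespace = spaced.split()

print(by_space)       # ['two', '', 'spaces', 'here']
print(by_whitespace)  # ['two', 'spaces', 'here']
```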
Here's the difference between .split(" ") and .split(), illustrated in the interactive interpreter. First, we'll make a string with a bunch of weird whitespace in it:

    >>> original = "This is\na test\n\ta very lovely test"
    >>> print original
    This is
    a test
            a very lovely test
    >>> print original.split(" ")
    ['This', 'is\na', 'test\n\ta', 'very', 'lovely', 'test']
    >>> print original.split()
    ['This', 'is', 'a', 'test', 'a', 'very', 'lovely', 'test']
What does it look like when we split on a single space (" ")?

    >>> original = "This is\na test\n\ta very lovely test"
    >>> print original.split(" ")
    ['This', 'is\na', 'test\n\ta', 'very', 'lovely', 'test']
It treats chunks like is\na as one unit, which is not ideal! If we use .split() with no parameters instead:

    >>> original = "This is\na test\n\ta very lovely test"
    >>> print original.split()
    ['This', 'is', 'a', 'test', 'a', 'very', 'lovely', 'test']
The program below reads in one file (poe.txt) and creates a list from its words. It then reads in a second file (frost.txt) and iterates over it line by line, replacing a randomly chosen word in the line with another word from poe.txt:
    import random

    poe_string = open("poe.txt").read()
    poe_words = poe_string.split()

    for line in open("frost.txt"):
        line = line.strip()
        if len(line) == 0:
            print line
        else:
            line_words = line.split()
            random_poe_word = random.choice(poe_words)
            random_frost_word = random.choice(line_words)
            line = line.replace(random_frost_word, random_poe_word)
            print line
Here's what it looks like when you run it:
$ python replacer.py appliances roads diverged in a yellow wood, And dissolution. I could not travel both ingress be one traveler, long I stood And looked the one as far as I could To where it bent of the undergrowth; Then took the and as just as fair, And having perhaps the better out Because it was grassy and such wear; Though as the that the passing there Had worn shutm really about shut same, And both that courtiers, equally lay In leaves no step had girdled black. his I kept the first for another day! Yet knowing how way leads on to But I doubted if I should blood. come back. I shall sympathy telling this with a sigh Somewhere ages A ages hence: Two roads diverged abroad, a wood, and I--- I took the close less travelled by, And The has made all the difference.
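Look closely at the output and you'll spot oddities like "shutm". That happens because .replace() works on matching strings, not on whole words: every occurrence of the chosen string is swapped out, even when it sits inside a longer word. Here's a small sketch that reproduces that one line by hand, assuming the random choices happened to be "the" (from the Frost line) and "shut" (from Poe):

```python
line = "Had worn them really about the same,"

# Every occurrence of "the" is replaced, including the one inside "them".
replaced = line.replace("the", "shut")

print(replaced)  # Had worn shutm really about shut same,
```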
EXERCISE: Rewrite the program so that it uses something other than the .replace() method, and replaces random words rather than matching strings. (Hint: you'll need to .split() the line from frost.txt.)
open() is pretty rad! Why not use it ALL the time, instead of bothering with sys.stdin? Well, there are a couple of reasons:
Using open() means that your program can't interoperate with other UNIX programs, at least in terms of where it gets its input. If you're reading directly from a file, instead of reading from sys.stdin, you won't be able to use a pipe (|) to send input to your program from another program.

With open(), you need to put the filename in your program somehow, which means that if you ever want to make the program work with a different file, you have to modify the program itself. This is inconvenient (though there are ways of working around it; see below).
It's really a trade-off: open() allows you the flexibility of being able to work with multiple sources of input, but doesn't interoperate well with other programs. On the other hand, sys.stdin limits you to one source of input, but that source of input can be anything: a file (using redirection), or another UNIX program (using pipes).
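One thing that softens the trade-off: a for loop doesn't care where its lines come from. A value from open(), sys.stdin, and a plain list of strings can all be looped over the same way, so the line-handling part of a program can be written once. Here's a sketch of that idea. The function name clean_lines is made up, and I'm passing it a list so the example runs without any files.

```python
def clean_lines(source):
    # source can be anything that yields lines one at a time:
    # open("some_file.txt"), sys.stdin, or a list of strings.
    cleaned = []
    for line in source:
        line = line.strip()
        if len(line) > 0:
            cleaned.append(line)
    return cleaned

sample = ["Rose, harsh rose,\n", "\n", "marred and with stint of petals,\n"]
print(clean_lines(sample))
```

In a real script you might call clean_lines(open('frost.txt')) and clean_lines(sys.stdin) side by side.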
Occasionally, it can make sense to use both open() and sys.stdin in the same file. Take, for example, this program, which prints out any lines in standard input that have words from frost.txt with a length of six or greater in them:
    import sys

    # read in a string with everything from frost.txt
    frost_str = open('frost.txt').read()

    # create an empty list
    frost_words = []

    # iterate over each word in frost_str; check to see if the word is of length
    # equal to or greater than 6; add to the list if so
    for word in frost_str.split():
        if len(word) >= 6:
            frost_words.append(word)

    # loop over every line in stdin
    for line in sys.stdin:
        line = line.strip()
        # set found to false on each iteration
        found = False
        # check for each word in frost_words: is it found in the line? if so, set
        # found to True
        for word in frost_words:
            if word in line:
                found = True
        # after all that, if found is True, print the line.
        if found:
            print line
Run this program, using sonnets.txt (for example) as input:
$ python frostify.py <sonnets.txt But as the riper should by time decease, Now is the time that face should form another; For having traffic with thy self alone, Sap checked with frost, and lusty leaves quite gone, That's for thy self to breed another thee, Then what could death do if thou shouldst depart, And having climb'd the steep-up heavenly hill, From his low tract, and look another way: In singleness the parts that thou shouldst bear. Mark how one string, sweet husband to another, Resembling sire and child and happy mother, Which to repair should be thy chief desire. Make thee another self for love of me, If all were minded so, the times should cease Which bounteous gift thou shouldst in bounty cherish: Thou shouldst print more, not let that copy die. When lofty trees I see barren of leaves, Against this coming end you should prepare, So should that beauty which you hold in lease When your sweet issue your sweet form should bear. So should the lines of life that life repair, Though yet heaven knows it is but as a tomb So should my papers, yellow'd with their age, You should live twice,--in it, and in my rhyme. Then look I death my days should expiate. Great princes' favourites their fair leaves spread The dear respose for limbs with travel tir'd; To march in ranks of better equipage: But since he died and poets better prove, Full many a glorious morning have I seen And make me travel forth without my cloak, Though thou repent, yet I have still the loss: Though in our lives a separable spite, Lest my bewailed guilt should do thee shame, When thou art all the better part of me? Both find each other, and I lose both twain, Injurious distance should not stop my way; Or heart in love with sighs himself doth smother, When what I seek, my weary travel's end, From where thou art why should I haste me thence? 
Then should I spur, though mounted on the wind, Thy edge should blunter be than appetite, Being your slave what should I do but tend, Though you do anything, he thinks no ill. I should in thought control your times of pleasure, Wh'r we are mended, or wh'r better they, Is it thy will, thy image should keep open Dost thou desire my slumbers should be broken, Hath travell'd on to age's steepy night; Ah! wherefore with infection should he live, That sin by him advantage should achieve, Why should false painting imitate his cheek, Why should poor beauty indirectly seek Why should he live, now Nature bankrupt is, Ere beauty's dead fleece made another gay: Making no summer of another's green, Then thou alone kingdoms of hearts shouldst owe. If thinking on me then should make you woe. When I perhaps compounded am with clay, Lest the wise world should look into your moan, O! lest the world should task you to recite What merit lived in me, that you should love And so should you, to love things nothing worth. When yellow leaves, or none, or few, do hang My spirit is thine, the better part of me: Then better'd that the world may see my pleasure: So is my love still telling what is told. These vacant leaves thy mind's imprint will bear, Knowing a better spirit doth use your name, Though I, once gone, to all the world must die: Some fresher stamp of the time-bettering days. In true plain words, by thy true-telling friend; And their gross painting might be better us'd Which should example where your equal grew. Though words come hindmost, holds his rank before. I was not sick of any fear from thence: Thy self thou gav'st, thy own worth then not knowing, Comes home again, on better judgement making. As I'll myself disgrace; knowing thy will, Lest I, too much profane, should do it wrong, All these I better in one general best. 
Thy love is better than high birth to me, And having thee, of all men's pride I boast: I see a better state to me belongs That in thy face sweet love should ever dwell; Thy looks should nothing thence, but sweetness tell. Though to itself, it only live and die, That leaves look pale, dreading the winter's near. One blushing shame, another white despair; Because he needs no praise, wilt thou be dumb? Because I would not dull you with my song. That having such a scope to show her pride, Three beauteous springs to yellow autumn turn'd, One thing expressing, leaves out difference. And for they looked but with divining eyes, Though absence seem'd my flame to qualify, Like him that travels, I return again; These blenches gave my heart another youth, That did not better for my life provide My most full flame should afterwards burn clearer. Wherein I should your great deserts repay, Which should transport me farthest from your sight. That better is, by evil still made better; 'Tis better to be vile than vile esteem'd, For why should others' false adulterate eyes That every tongue says beauty should look so. Whilst my poor lips which should that harvest reap, Had, having, and in quest, to have extreme; One on another's neck, do witness bear And truly not the morning sun of heaven Though in thy store's account I one must be; Why should my heart think that a several plot, If I might teach thee wit, better it were, Though not to love, yet, love to tell me so;-- For, if I should despair, I should grow mad, Who leaves unsway'd the likeness of a man, The better angel is a man right fair, Tempteth my better angel from my side, I guess one angel in another's hell: Why so large cost, having so short a lease, Lest eyes well-seeing thy foul faults should find. With others thou shouldst not abhor my state:
The program above is tricky! Read it carefully. Here's how to get a handle on the tricky parts.

Look at the frost_words variable. What type of Python value is it (string, list, integer)? What does it have in it after the very first for loop?

Try removing the if statement that checks for word length. What happens to the output of the program?

Why does the program use a boolean (found) to keep track of whether or not a word has been found in a line? Try re-writing this program so it doesn't use the boolean, and instead just prints the line inside the inner for loop. What happens?
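If you'd like to poke at the matching step on its own, here's a sketch that pulls it out into a function. This is not how frostify.py is written: I've swapped the found flag for Python's built-in any(), which gives the same yes-or-no answer. The function name line_matches and the short word list are made up for illustration.

```python
def line_matches(line, words):
    # True if at least one of the words appears somewhere in the line.
    # "word in line" is a substring test, so "travel" also matches "travels".
    return any(word in line for word in words)

sample_words = ["travel", "should", "leaves"]  # hypothetical stand-in for frost_words

print(line_matches("And make me travel forth without my cloak,", sample_words))  # True
print(line_matches("Like him that travels, I return again;", sample_words))      # True
print(line_matches("Shall I compare thee to a summer's day?", sample_words))     # False
```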
EXERCISE: (advanced!) Modify the program above so that, instead of printing every matching line, it prints every instance of a matching string, along with ten characters of surrounding context (i.e., the ten characters before the match, and the ten characters afterward).
Many UNIX utilities take arguments on the command line: grep takes a pattern to search for, for example. We can read command-line parameters from Python as well, using the sys.argv list. This list contains all of the parameters passed on the command line, including the name of the script itself.
For example, take the following script, called argv_reader.py:
    import sys

    for arg in sys.argv:
        print arg
    $ python argv_reader.py anteater bonobo cockatoo
    argv_reader.py
    anteater
    bonobo
    cockatoo
The element at index 0 is always the name of the Python program. The elements afterward are whatever strings are typed on the command line. Handy! Here's a version of glitch.py above that reads from two files whose names you specify on the command line, instead of having them hard-coded in the file itself:
    import random
    import sys

    # read file contents into strings; sys.argv[0] is the script name,
    # so the two filenames are at indexes 1 and 2
    left_file = open(sys.argv[1]).read()
    right_file = open(sys.argv[2]).read()

    for i in range(10):
        left_start = random.randrange(len(left_file))
        left_length = random.randrange(8, 20)
        left_fragment = left_file[left_start:left_start+left_length]
        right_start = random.randrange(len(right_file))
        right_length = random.randrange(8, 20)
        right_fragment = right_file[right_start:right_start+right_length]
        print left_fragment + right_fragment
I chose the words "left" and "right" arbitrarily---they don't have a special meaning here. Running the program:
$ python glitch-argv.py frost.txt sonnets.txt to way, I doubty feeding; And y In leaves orm happy show To t the same, Ande unear'd womb D ges and age debarre'd the bene ar as I coul to his s at morninat all the as for that the paars not polic equally lay In may, yet e, And both ow, They liv s made all the diffh Askan
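If you run a program like this one without giving it two filenames, Python stops with an IndexError, because the list has no element at index 1 or 2. One way to be friendlier is to check the length of sys.argv first. Here's a minimal sketch of that check. The function name pick_filenames and the wording of the usage message are my own inventions, not part of the program above.

```python
import sys

def pick_filenames(argv):
    # argv[0] is the script name, so two filenames means
    # the list needs at least three items in all.
    if len(argv) < 3:
        return None
    return argv[1], argv[2]

names = pick_filenames(sys.argv)
if names is None:
    print("usage: python glitch-argv.py LEFT_FILE RIGHT_FILE")
else:
    print("left file: " + names[0])
    print("right file: " + names[1])
```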
EXERCISE: Rewrite any of the other examples in this lesson that use open() so that they use sys.argv instead of a hard-coded filename.