<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, user-scalable=yes"
/>
<title>Automate-Boring-Stuff-with-Python</title>
<style type="text/css">
code {
white-space: pre-wrap;
}
span.smallcaps {
font-variant: small-caps;
}
span.underline {
text-decoration: underline;
}
div.column {
display: inline-block;
vertical-align: top;
width: 50%;
}
</style>
</head>
<body>
<h1 id="automate-the-boring-stuff-with-python">
Automate the Boring Stuff with Python
</h1>
<p>
Variables are a fine way to store data while your program is running, but
if you want your data to persist even after your program has finished, you
need to save it to a file. You can think of a file’s contents as a single
string value, potentially gigabytes in size. In this chapter, you will
learn how to use Python to create, read, and save files on the hard drive.
</p>
<p>
A file has two key properties: a <em>filename</em> (usually written as one
word) and a <em>path</em>. The path specifies the location of a file on
the computer. For example, there is a file on my Windows 7 laptop with the
filename <em>project.docx</em> in the path
<em>C:\Users\asweigart\Documents</em>. The part of the filename after the
last period is called the file’s <em>extension</em> and tells you a file’s
type. <em>project.docx</em> is a Word document, and <em>Users</em>,
<em>asweigart</em>, and <em>Documents</em> all refer to
<em>folders</em> (also called <em>directories</em>). Folders can contain
files and other folders. For example, <em>project.docx</em> is in the
<em>Documents</em> folder, which is inside the <em>asweigart</em> folder,
which is inside the <em>Users</em> folder.
<a
href="#calibre_link-82"
title="Figure 8-1. A file in a hierarchy of folders"
>Figure 8-1</a
>
shows this folder organization.
</p>
<figure>
<img
src="chrome-extension://cjedbglnccaioiolemnfhjncicchinao/images/000027.jpg"
alt="A file in a hierarchy of folders"
/>
<figcaption>Figure 8-1. A file in a hierarchy of folders</figcaption>
</figure>
<p>
The <em>C:\</em> part of the path is the <em>root folder</em>, which
contains all other folders. On Windows, the root folder is named
<em>C:\</em> and is also called the <em>C: drive</em>. On OS X and Linux,
the root folder is <em>/</em>. In this book, I’ll be using the
Windows-style root folder, <em>C:\</em>. If you are entering the
interactive shell examples on OS X or Linux, enter <code>/</code> instead.
</p>
<p>
Additional <em>volumes</em>, such as a DVD drive or USB thumb drive, will
appear differently on different operating systems. On Windows, they appear
as new, lettered root drives, such as <em>D:\</em> or <em>E:\</em>. On OS
X, they appear as new folders under the <em>/Volumes</em> folder. On
Linux, they appear as new folders under the <em>/mnt</em> (“mount”)
folder. Also note that while folder names and filenames are not case
sensitive on Windows and OS X, they are case sensitive on Linux.
</p>
<h2 id="backslash-on-windows-and-forward-slash-on-os-x-and-linux">
Backslash on Windows and Forward Slash on OS X and Linux
</h2>
<p>
On Windows, paths are written using backslashes (<em>\</em>) as the
separator between folder names. OS X and Linux, however, use the forward
slash (<em>/</em>) as their path separator. If you want your programs to
work on all operating systems, you will have to write your Python scripts
to handle both cases.
</p>
<p>
Fortunately, this is simple to do with the
<code>os.path.join()</code> function. If you pass it the string values of
individual file and folder names in your path,
<code>os.path.join()</code> will return a string with a file path using
the correct path separators. Enter the following into the interactive
shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>import os</strong>
&gt;&gt;&gt; <strong>os.path.join('usr', 'bin', 'spam')</strong>
'usr\\bin\\spam'</code></pre>
<p>
I’m running these interactive shell examples on Windows, so
<code>os.path.join('usr', 'bin', 'spam')</code> returned
<code>'usr\\bin\\spam'</code>. (Notice that the backslashes are doubled
because each backslash needs to be escaped by another backslash
character.) If I had called this function on OS X or Linux, the string
would have been <code>'usr/bin/spam'</code>.
</p>
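<p>
As a rough illustration of the point above (this is not part of the
original shell session), the standard library also exposes the Windows and
OS X/Linux flavors of <code>os.path</code> as the <code>ntpath</code> and
<code>posixpath</code> modules, so you can preview both results on any
machine. The variable names below are only for illustration.
</p>

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the OS X / Linux implementation of os.path

parts = ['usr', 'bin', 'spam']

# Each module joins the same names with its own separator.
windows_style = ntpath.join(*parts)
posix_style = posixpath.join(*parts)

print(windows_style)  # usr\bin\spam
print(posix_style)    # usr/bin/spam
```

<p>
In everyday scripts you would normally just call
<code>os.path.join()</code> and let Python pick the right flavor for the
computer running the program.
</p>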
<p>
The <code>os.path.join()</code> function is helpful if you need to create
strings for filenames. These strings will be passed to several of the
file-related functions introduced in this chapter. For example, the
following code joins names from a list of filenames to the end of a
folder’s name:
</p>
<pre><code>&gt;&gt;&gt; <strong>myFiles = ['accounts.txt', 'details.csv', 'invite.docx']</strong>
&gt;&gt;&gt; <strong>for filename in myFiles:</strong>
        <strong>print(os.path.join('C:\\Users\\asweigart', filename))</strong>

C:\Users\asweigart\accounts.txt
C:\Users\asweigart\details.csv
C:\Users\asweigart\invite.docx</code></pre>
<h2 id="the-current-working-directory">The Current Working Directory</h2>
<p>
Every program that runs on your computer has a
<em>current working directory</em>, or <em>cwd</em>. Any filenames or
paths that do not begin with the root folder are assumed to be under the
current working directory. You can get the current working directory as a
string value with the <code>os.getcwd()</code> function and change it with
<code>os.chdir()</code>. Enter the following into the interactive shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>import os</strong>
&gt;&gt;&gt; <strong>os.getcwd()</strong>
'C:\\Python34'
&gt;&gt;&gt; <strong>os.chdir('C:\\Windows\\System32')</strong>
&gt;&gt;&gt; <strong>os.getcwd()</strong>
'C:\\Windows\\System32'</code></pre>
<p>
Here, the current working directory is set to <em>C:\Python34</em>, so the
filename <em>project.docx</em> refers to
<em>C:\Python34\project.docx</em>. When we change the current working
directory to <em>C:\Windows\System32</em>, <em>project.docx</em> is
interpreted as <em>C:\Windows\System32\project.docx</em>.
</p>
<p>
Python will display an error if you try to change to a directory that does
not exist.
</p>
<pre><code>&gt;&gt;&gt; <strong>os.chdir('C:\\ThisFolderDoesNotExist')</strong>
Traceback (most recent call last):
  File "&lt;pyshell#18&gt;", line 1, in &lt;module&gt;
    os.chdir('C:\\ThisFolderDoesNotExist')
FileNotFoundError: [WinError 2] The system cannot find the file specified:
'C:\\ThisFolderDoesNotExist'</code></pre>
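<p>
If you would rather have your program carry on than stop with a traceback,
one possible approach is to catch the error. The sketch below is only an
illustration: <code>try_chdir()</code> is a made-up helper name, and the
folder name is assumed not to exist on your computer.
</p>

```python
import os

def try_chdir(folder):
    """Change the working directory; return False if the folder is missing."""
    try:
        os.chdir(folder)
    except FileNotFoundError:
        # os.chdir() raises this when the folder does not exist.
        return False
    return True

original = os.getcwd()
# Hypothetical folder name, assumed not to exist.
missing = os.path.join(original, 'ThisFolderDoesNotExist_example')

changed = try_chdir(missing)
print(changed)                    # False
print(os.getcwd() == original)    # True: the working directory is unchanged
```

<p>
This only handles the missing-folder case; other problems, such as passing
the path of a file instead of a folder, would still raise their own errors.
</p>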
<h3 id="note">Note</h3>
<p>
<em>While folder is the more modern name for directory, note that</em>
current working directory <em>(or just</em> working directory<em
>) is the standard term, not current working folder.</em
>
</p>
<h2 id="absolute-vs.-relative-paths">Absolute vs. Relative Paths</h2>
<p>There are two ways to specify a file path.</p>
<ul>
<li>
An <em>absolute path</em>, which always begins with the root folder
</li>
<li>
A <em>relative path</em>, which is relative to the program’s current
working directory
</li>
</ul>
<p>
There are also the <em>dot</em> (<code>.</code>) and
<em>dot-dot</em> (<code>..</code>) folders. These are not real folders but
special names that can be used in a path. A single period (“dot”) for a
folder name is shorthand for “this directory.” Two periods (“dot-dot”)
means “the parent folder.”
</p>
<p>
<a
href="#calibre_link-83"
title="Figure 8-2. The relative paths for folders and files in the working directory C:\bacon"
>Figure 8-2</a
>
is an example of some folders and files. When the current working
directory is set to <em>C:\bacon</em>, the relative paths for the other
folders and files are set as they are in the figure.
</p>
<figure>
<img
src="chrome-extension://cjedbglnccaioiolemnfhjncicchinao/images/000032.jpg"
alt="The relative paths for folders and files in the working directory C:\bacon"
/>
<figcaption>
Figure 8-2. The relative paths for folders and files in the working
directory <em>C:\bacon</em>
</figcaption>
</figure>
<p>
The <em>.\</em> at the start of a relative path is optional. For example,
<em>.\spam.txt</em> and <em>spam.txt</em> refer to the same file.
</p>
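<p>
To see how the dot and dot-dot names resolve, here is a small sketch using
<code>normpath()</code>, a function this chapter does not otherwise cover.
It uses the forward-slash flavor of <code>os.path</code>
(<code>posixpath</code>) so that it prints the same thing on every
operating system; the folder names are borrowed from Figure 8-2 and are
only examples.
</p>

```python
import posixpath  # forward-slash flavor of os.path, usable on any system

cwd = '/bacon'

# "." means "this directory" and ".." means "the parent folder".
messy = posixpath.join(cwd, '.', 'fizz', '..', 'spam.txt')
clean = posixpath.normpath(messy)

print(messy)  # /bacon/./fizz/../spam.txt
print(clean)  # /bacon/spam.txt
```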
<h2 id="creating-new-folders-with-os.makedirs">
Creating New Folders with os.makedirs()
</h2>
<p>
Your programs can create new folders (directories) with the
<code>os.makedirs()</code> function. Enter the following into the
interactive shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>import os</strong>
&gt;&gt;&gt; <strong>os.makedirs('C:\\delicious\\walnut\\waffles')</strong></code></pre>
<p>
This will create not just the <em>C:\delicious</em> folder but also a
<em>walnut</em> folder inside <em>C:\delicious</em> and a
<em>waffles</em> folder inside <em>C:\delicious\walnut</em>. That is,
<code>os.makedirs()</code> will create any necessary intermediate folders
in order to ensure that the full path exists.
<a
href="#calibre_link-84"
title="Figure 8-3. The result of os.makedirs('C:\delicious \walnut\waffles')"
>Figure 8-3</a
>
shows this hierarchy of folders.
</p>
<figure>
<img
src="chrome-extension://cjedbglnccaioiolemnfhjncicchinao/images/000036.jpg"
alt="The result of os.makedirs('C:\\delicious\\walnut\\waffles')"
/>
<figcaption>
Figure 8-3. The result of
<code>os.makedirs('C:\\delicious\\walnut\\waffles')</code>
</figcaption>
</figure>
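<p>
One thing the example above does not show is what happens when the folders
already exist. As a hedged sketch, the code below builds the same folder
names inside a temporary folder (an assumption, so nothing is written to
<em>C:\</em>) and uses the optional <code>exist_ok</code> argument of
<code>os.makedirs()</code>.
</p>

```python
import os
import tempfile

# Work inside a throwaway folder rather than the real C:\ drive.
base = tempfile.mkdtemp()
target = os.path.join(base, 'delicious', 'walnut', 'waffles')

os.makedirs(target)                 # creates all three folders
os.makedirs(target, exist_ok=True)  # fine: existing folders are accepted

try:
    os.makedirs(target)             # without exist_ok, this is an error
    raised = False
except FileExistsError:
    raised = True

print(os.path.isdir(target))  # True
print(raised)                 # True
```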
<p>
The <code>os.path</code> module contains many helpful functions related to
filenames and file paths. For instance, you’ve already used
<code>os.path.join()</code> to build paths in a way that will work on any
operating system. Since <code>os.path</code> is a module inside the
<code>os</code> module, you can import it by simply running
<code>import os</code>. Whenever your programs need to work with files,
folders, or file paths, you can refer to the short examples in this
section. The full documentation for the <code>os.path</code> module is on
the Python website at
<em
><a href="http://docs.python.org/3/library/os.path.html" class="uri"
>http://docs.python.org/3/library/os.path.html</a
></em
>.
</p>
<h3 id="note-1">Note</h3>
<p>
<em>Most of the examples that follow in this section will require the</em>
<code>os</code>
<em
>module, so remember to import it at the beginning of any script you
write and any time you restart IDLE. Otherwise, you’ll get a</em
>
<code>NameError: name 'os' is not defined</code> <em>error message.</em>
</p>
<h2 id="handling-absolute-and-relative-paths">
Handling Absolute and Relative Paths
</h2>
<p>
The <code>os.path</code> module provides functions for returning the
absolute path of a relative path and for checking whether a given path is
an absolute path.
</p>
<ul>
<li>
Calling <code>os.path.abspath(</code><em><code>path</code></em
><code>)</code> will return a string of the absolute path of the
argument. This is an easy way to convert a relative path into an
absolute one.
</li>
<li>
Calling <code>os.path.isabs(</code><em><code>path</code></em
><code>)</code> will return <code>True</code> if the argument is an
absolute path and <code>False</code> if it is a relative path.
</li>
<li>
Calling <code>os.path.relpath(</code><em><code>path, start</code></em
><code>)</code> will return a string of a relative path from the
<em><code>start</code></em> path to <em><code>path</code></em
>. If <em><code>start</code></em> is not provided, the current working
directory is used as the start path.
</li>
</ul>
<p>Try these functions in the interactive shell:</p>
<pre><code>&gt;&gt;&gt; <strong>os.path.abspath('.')</strong>
'C:\\Python34'
&gt;&gt;&gt; <strong>os.path.abspath('.\\Scripts')</strong>
'C:\\Python34\\Scripts'
&gt;&gt;&gt; <strong>os.path.isabs('.')</strong>
False
&gt;&gt;&gt; <strong>os.path.isabs(os.path.abspath('.'))</strong>
True</code></pre>
<p>
Since <em>C:\Python34</em> was the working directory when
<code>os.path.abspath()</code> was called, the “single-dot” folder
represents the absolute path <code>'C:\\Python34'</code>.
</p>
<h3 id="note-2">Note</h3>
<p>
<em
>Since your system probably has different files and folders on it than
mine, you won’t be able to follow every example in this chapter exactly.
Still, try to follow along using folders that exist on your
computer.</em
>
</p>
<p>
Enter the following calls to <code>os.path.relpath()</code> into the
interactive shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>os.path.relpath('C:\\Windows', 'C:\\')</strong>
'Windows'
&gt;&gt;&gt; <strong>os.path.relpath('C:\\Windows', 'C:\\spam\\eggs')</strong>
'..\\..\\Windows'
&gt;&gt;&gt; <strong>os.getcwd()</strong>
'C:\\Python34'</code></pre>
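<p>
If you are not on Windows, you can still reproduce these results with the
<code>ntpath</code> module, which is the Windows flavor of
<code>os.path</code>. The sketch below also joins the relative path back
onto its start folder to show that it leads to the original target; the
variable names are just for illustration.
</p>

```python
import ntpath  # Windows flavor of os.path, importable on any system

to_root = ntpath.relpath('C:\\Windows', 'C:\\')
to_eggs = ntpath.relpath('C:\\Windows', 'C:\\spam\\eggs')

print(to_root)  # Windows
print(to_eggs)  # ..\..\Windows

# Joining the start folder with the relative path leads back to the target.
round_trip = ntpath.normpath(ntpath.join('C:\\spam\\eggs', to_eggs))
print(round_trip)  # C:\Windows
```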
<p>
Calling <code>os.path.dirname(</code><em><code>path</code></em
><code>)</code> will return a string of everything that comes before the
last slash in the <code>path</code> argument. Calling
<code>os.path.basename(</code><em><code>path</code></em
><code>)</code> will return a string of everything that comes after the
last slash in the <code>path</code> argument. The dir name and base name
of a path are outlined in
<a
href="#calibre_link-85"
title="Figure 8-4. The base name follows the last slash in a path and is the same as the filename. The dir name is everything before the last slash."
>Figure 8-4</a
>.
</p>
<figure>
<img
src="chrome-extension://cjedbglnccaioiolemnfhjncicchinao/images/000041.png"
alt="The base name follows the last slash in a path and is the same as the filename. The dir name is everything before the last slash."
/>
<figcaption>
Figure 8-4. The base name follows the last slash in a path and is the same
as the filename. The dir name is everything before the last slash.
</figcaption>
</figure>
<p>For example, enter the following into the interactive shell:</p>
<pre><code>&gt;&gt;&gt; <strong>path = 'C:\\Windows\\System32\\calc.exe'</strong>
&gt;&gt;&gt; <strong>os.path.basename(path)</strong>
'calc.exe'
&gt;&gt;&gt; <strong>os.path.dirname(path)</strong>
'C:\\Windows\\System32'</code></pre>
<p>
If you need a path’s dir name and base name together, you can just call
<code>os.path.split()</code> to get a tuple value with these two strings,
like so:
</p>
<pre><code>&gt;&gt;&gt; <strong>calcFilePath = 'C:\\Windows\\System32\\calc.exe'</strong>
&gt;&gt;&gt; <strong>os.path.split(calcFilePath)</strong>
('C:\\Windows\\System32', 'calc.exe')</code></pre>
<p>
Notice that you could create the same tuple by calling
<code>os.path.dirname()</code> and <code>os.path.basename()</code> and
placing their return values in a tuple.
</p>
<pre><code>&gt;&gt;&gt; <strong>(os.path.dirname(calcFilePath), os.path.basename(calcFilePath))</strong>
('C:\\Windows\\System32', 'calc.exe')</code></pre>
<p>
But <code>os.path.split()</code> is a nice shortcut if you need both
values.
</p>
<p>
Also, note that <code>os.path.split()</code> does <em>not</em> take a file
path and return a list of strings of each folder. For that, use the
<code>split()</code> string method and split on the string in
<code>os.sep</code> (also available as <code>os.path.sep</code>). The
<code>os.sep</code> variable is set to the correct folder-separating slash
for the computer running the program.
</p>
<p>For example, enter the following into the interactive shell:</p>
<pre><code>&gt;&gt;&gt; <strong>calcFilePath.split(os.path.sep)</strong>
['C:', 'Windows', 'System32', 'calc.exe']</code></pre>
<p>
On OS X and Linux systems, there will be a blank string at the start of
the returned list:
</p>
<pre><code>&gt;&gt;&gt; <strong>'/usr/bin'.split(os.path.sep)</strong>
['', 'usr', 'bin']</code></pre>
<p>
The <code>split()</code> string method will work to return a list of each
part of the path. It will work on any operating system if you pass it
<code>os.path.sep</code>.
</p>
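<p>
The blank string at the start of the OS X and Linux result is easy to trip
over. Here is a minimal sketch of one way to deal with it, using the
separators from <code>ntpath</code> and <code>posixpath</code> so both
cases can be tried on any computer. The <code>path_parts()</code> helper
is a hypothetical name, not something from the <code>os</code> module.
</p>

```python
import ntpath
import posixpath

def path_parts(path, sep):
    """Split a path string into its parts on the given separator."""
    return path.split(sep)

windows_parts = path_parts('C:\\Windows\\System32\\calc.exe', ntpath.sep)
posix_parts = path_parts('/usr/bin', posixpath.sep)

print(windows_parts)  # ['C:', 'Windows', 'System32', 'calc.exe']
print(posix_parts)    # ['', 'usr', 'bin']

# Drop empty strings if you only want the folder and file names.
non_empty = [part for part in posix_parts if part]
print(non_empty)      # ['usr', 'bin']
```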
<h2 id="finding-file-sizes-and-folder-contents">
Finding File Sizes and Folder Contents
</h2>
<p>
Once you have ways of handling file paths, you can then start gathering
information about specific files and folders. The
<code>os.path</code> module provides functions for finding the size of a
file in bytes and the files and folders inside a given folder.
</p>
<ul>
<li>
Calling <code>os.path.getsize(</code><em><code>path</code></em
><code>)</code> will return the size in bytes of the file in the
<em><code>path</code></em> argument.
</li>
<li>
Calling <code>os.listdir(</code><em><code>path</code></em
><code>)</code> will return a list of filename strings for each file in
the <em><code>path</code></em> argument. (Note that this function is in
the <code>os</code> module, not <code>os.path</code>.)
</li>
</ul>
<p>
Here’s what I get when I try these functions in the interactive shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>os.path.getsize('C:\\Windows\\System32\\calc.exe')</strong>
776192
&gt;&gt;&gt; <strong>os.listdir('C:\\Windows\\System32')</strong>
['0409', '12520437.cpx', '12520850.cpx', '5U877.ax', 'aaclient.dll',
--snip--
'xwtpdui.dll', 'xwtpw32.dll', 'zh-CN', 'zh-HK', 'zh-TW', 'zipfldr.dll']</code></pre>
<p>
As you can see, the <em>calc.exe</em> program on my computer is 776,192
bytes in size, and I have a lot of files in <em>C:\Windows\System32</em>.
If I want to find the total size of all the files in this directory, I can
use <code>os.path.getsize()</code> and <code>os.listdir()</code> together.
</p>
<pre><code>&gt;&gt;&gt; <strong>totalSize = 0</strong>
&gt;&gt;&gt; <strong>for filename in os.listdir('C:\\Windows\\System32'):</strong>
        <strong>totalSize = totalSize + os.path.getsize(os.path.join('C:\\Windows\\System32', filename))</strong>

&gt;&gt;&gt; <strong>print(totalSize)</strong>
1117846456</code></pre>
<p>
As I loop over each filename in the <em>C:\Windows\System32</em> folder,
the <code>totalSize</code> variable is incremented by the size of each
file. Notice how when I call <code>os.path.getsize()</code>, I use
<code>os.path.join()</code> to join the folder name with the current
filename. The integer that <code>os.path.getsize()</code> returns is added
to the value of <code>totalSize</code>. After looping through all the
files, I print <code>totalSize</code> to see the total size of the
<em>C:\Windows\System32</em> folder.
</p>
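<p>
The same loop can be wrapped in a function so it works for any folder. The
version below is a sketch with one deliberate difference from the shell
session: it skips subfolders and counts only files. The
<code>folder_size()</code> name and the small demo files are made up for
this example.
</p>

```python
import os
import tempfile

def folder_size(folder):
    """Return the total size in bytes of the files directly inside folder."""
    total = 0
    for filename in os.listdir(folder):
        full_path = os.path.join(folder, filename)
        if os.path.isfile(full_path):   # ignore subfolders
            total += os.path.getsize(full_path)
    return total

# Build a small throwaway folder to try it on.
demo = tempfile.mkdtemp()
for name, text in [('a.txt', 'hello'), ('b.txt', 'abc')]:
    demo_file = open(os.path.join(demo, name), 'w')
    demo_file.write(text)
    demo_file.close()
os.mkdir(os.path.join(demo, 'subfolder'))

demo_total = folder_size(demo)
print(demo_total)  # 8 (5 bytes + 3 bytes)
```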
<p>
Many Python functions will crash with an error if you supply them with a
path that does not exist. The <code>os.path</code> module provides
functions to check whether a given path exists and whether it is a file or
folder.
</p>
<ul>
<li>
Calling <code>os.path.exists(</code><em><code>path</code></em
><code>)</code> will return <code>True</code> if the file or folder
referred to in the argument exists and will return <code>False</code> if
it does not exist.
</li>
<li>
Calling <code>os.path.isfile(</code><em><code>path</code></em
><code>)</code> will return <code>True</code> if the path argument
exists and is a file and will return <code>False</code> otherwise.
</li>
<li>
Calling <code>os.path.isdir(</code><em><code>path</code></em
><code>)</code> will return <code>True</code> if the path argument
exists and is a folder and will return <code>False</code> otherwise.
</li>
</ul>
<p>
Here’s what I get when I try these functions in the interactive shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>os.path.exists('C:\\Windows')</strong>
True
&gt;&gt;&gt; <strong>os.path.exists('C:\\some_made_up_folder')</strong>
False
&gt;&gt;&gt; <strong>os.path.isdir('C:\\Windows\\System32')</strong>
True
&gt;&gt;&gt; <strong>os.path.isfile('C:\\Windows\\System32')</strong>
False
&gt;&gt;&gt; <strong>os.path.isdir('C:\\Windows\\System32\\calc.exe')</strong>
False
&gt;&gt;&gt; <strong>os.path.isfile('C:\\Windows\\System32\\calc.exe')</strong>
True</code></pre>
<p>
You can determine whether there is a DVD or flash drive currently attached
to the computer by checking for it with the
<code>os.path.exists()</code> function. For instance, if I wanted to check
for a flash drive with the volume named <em>D:\</em> on my Windows
computer, I could do that with the following:
</p>
<pre><code>&gt;&gt;&gt; <strong>os.path.exists('D:\\')</strong>
False</code></pre>
<p>Oops! It looks like I forgot to plug in my flash drive.</p>
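<p>
These three checks combine naturally into a single helper. The following
is only a sketch: <code>describe_path()</code> is a hypothetical function,
and it is tried on a temporary folder so that it runs the same way on any
computer.
</p>

```python
import os
import tempfile

def describe_path(path):
    """Return 'missing', 'folder', 'file', or 'other' for the given path."""
    if not os.path.exists(path):
        return 'missing'
    if os.path.isdir(path):
        return 'folder'
    if os.path.isfile(path):
        return 'file'
    return 'other'

demo_folder = tempfile.mkdtemp()
demo_file = os.path.join(demo_folder, 'example.txt')
open(demo_file, 'w').close()  # create an empty file

results = [
    describe_path(demo_folder),
    describe_path(demo_file),
    describe_path(os.path.join(demo_folder, 'some_made_up_folder')),
]
print(results)  # ['folder', 'file', 'missing']
```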
<p>
Once you are comfortable working with folders and relative paths, you’ll
be able to specify the location of files to read and write. The functions
covered in the next few sections will apply to plaintext files.
<em>Plaintext files</em> contain only basic text characters and do not
include font, size, or color information. Text files with the
<em>.txt</em> extension or Python script files with the
<em>.py</em> extension are examples of plaintext files. These can be
opened with Windows’s Notepad or OS X’s TextEdit application. Your
programs can easily read the contents of plaintext files and treat them as
an ordinary string value.
</p>
<p>
<em>Binary files</em> are all other file types, such as word processing
documents, PDFs, images, spreadsheets, and executable programs. If you
open a binary file in Notepad or TextEdit, it will look like scrambled
nonsense, like in
<a
href="#calibre_link-86"
title="Figure 8-5. The Windows calc.exe program opened in Notepad"
>Figure 8-5</a
>.
</p>
<figure>
<img
src="chrome-extension://cjedbglnccaioiolemnfhjncicchinao/images/000046.jpg"
alt="The Windows calc.exe program opened in Notepad"
/>
<figcaption>
Figure 8-5. The Windows <code>calc.exe</code> program opened in Notepad
</figcaption>
</figure>
<p>
Since every different type of binary file must be handled in its own way,
this book will not go into reading and writing raw binary files directly.
Fortunately, many modules make working with binary files easier—you will
explore one of them, the <code>shelve</code> module, later in this
chapter.
</p>
<p>There are three steps to reading or writing files in Python.</p>
<ol type="1">
<li>
Call the <code>open()</code> function to return a
<code>File</code> object.
</li>
<li>
Call the <code>read()</code> or <code>write()</code> method on the
<code>File</code> object.
</li>
<li>
Close the file by calling the <code>close()</code> method on the
<code>File</code> object.
</li>
</ol>
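<p>
The three steps above can be sketched as follows. The file is created in a
temporary folder first (an assumption, so the example does not depend on
any file already being on your computer), and the
<code>try</code>/<code>finally</code> wrapper is one optional way to make
sure step 3 always happens.
</p>

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

# Create a file to read, using the same three steps for writing.
out_file = open(path, 'w')          # step 1: open
out_file.write('Hello world!')      # step 2: write
out_file.close()                    # step 3: close

in_file = open(path)                # step 1: open (read mode by default)
try:
    text = in_file.read()           # step 2: read
finally:
    in_file.close()                 # step 3: close, even if read() failed

print(text)  # Hello world!
```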
<h2 id="opening-files-with-the-open-function">
Opening Files with the open() Function
</h2>
<p>
To open a file with the <code>open()</code> function, you pass it a string
path indicating the file you want to open; it can be either an absolute or
relative path. The <code>open()</code> function returns a
<code>File</code> object.
</p>
<p>
Try it by creating a text file named <em>hello.txt</em> using Notepad or
TextEdit. Type <strong><code>Hello world!</code></strong> as the content
of this text file and save it in your user home folder. Then, if you’re
using Windows, enter the following into the interactive shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>helloFile = open('C:\\Users\\</strong><em>your_home_folder</em><strong>\\hello.txt')</strong></code></pre>
<p>
If you’re using OS X, enter the following into the interactive shell
instead:
</p>
<pre><code>&gt;&gt;&gt; <strong>helloFile = open('/Users/</strong><em>your_home_folder</em><strong>/hello.txt')</strong></code></pre>
<p>
Make sure to replace <em><code>your_home_folder</code></em> with your
computer username. For example, my username is <em>asweigart</em>, so I’d
enter <code>'C:\\Users\\asweigart\\hello.txt'</code> on Windows.
</p>
<p>
Both these commands will open the file in “reading plaintext” mode, or
<em>read mode</em> for short. When a file is opened in read mode, Python
lets you only read data from the file; you can’t write or modify it in any
way. Read mode is the default mode for files you open in Python. But if
you don’t want to rely on Python’s defaults, you can explicitly specify
the mode by passing the string value <code>'r'</code> as a second argument
to <code>open()</code>. So
<code>open('/Users/asweigart/hello.txt', 'r')</code> and
<code>open('/Users/asweigart/hello.txt')</code> do the same thing.
</p>
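<p>
To see what “you can’t write” means in practice, here is a small sketch.
It assumes a <em>hello.txt</em> created in a temporary folder rather than
your home folder, and it catches the <code>io.UnsupportedOperation</code>
error that Python raises when you call <code>write()</code> on a file
opened in read mode.
</p>

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

# Create the file so there is something to open in read mode.
setup_file = open(path, 'w')
setup_file.write('Hello world!')
setup_file.close()

hello_file = open(path, 'r')   # explicit read mode; same as open(path)
try:
    hello_file.write('more text')
    write_failed = False
except io.UnsupportedOperation:
    # Files opened in read mode cannot be written to.
    write_failed = True
finally:
    hello_file.close()

print(write_failed)  # True
```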
<p>
The call to <code>open()</code> returns a <code>File</code> object. A
<code>File</code> object represents a file on your computer; it is simply
another type of value in Python, much like the lists and dictionaries
you’re already familiar with. In the previous example, you stored the
<code>File</code> object in the variable <code>helloFile</code>. Now,
whenever you want to read from or write to the file, you can do so by
calling methods on the <code>File</code> object in <code>helloFile</code>.
</p>
<h2 id="reading-the-contents-of-files">Reading the Contents of Files</h2>
<p>
Now that you have a <code>File</code> object, you can start reading from
it. If you want to read the entire contents of a file as a string value,
use the <code>File</code> object’s <code>read()</code> method. Let’s
continue with the <em>hello.txt</em> <code>File</code> object you stored
in <code>helloFile</code>. Enter the following into the interactive shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>helloContent = helloFile.read()</strong>
&gt;&gt;&gt; <strong>helloContent</strong>
'Hello world!'</code></pre>
<p>
If you think of the contents of a file as a single large string value, the
<code>read()</code> method returns the string that is stored in the file.
</p>
<p>
Alternatively, you can use the <code>readlines()</code> method to get a
<em>list</em> of string values from the file, one string for each line of
text. For example, create a file named <em>sonnet29.txt</em> in the same
directory as <em>hello.txt</em> and write the following text in it:
</p>
<pre><code>When, in disgrace with fortune and men's eyes,
I all alone beweep my outcast state,
And trouble deaf heaven with my bootless cries,
And look upon myself and curse my fate,</code></pre>
<p>
Make sure to separate the four lines with line breaks. Then enter the
following into the interactive shell:
</p>
<pre><code>&gt;&gt;&gt; <strong>sonnetFile = open('sonnet29.txt')</strong>
&gt;&gt;&gt; <strong>sonnetFile.readlines()</strong>
["When, in disgrace with fortune and men's eyes,\n", 'I all alone beweep my
outcast state,\n', 'And trouble deaf heaven with my bootless cries,\n', 'And
look upon myself and curse my fate,']</code></pre>
<p>
Note that each of the string values ends with a newline character,
<code>\n</code>, except for the last line of the file. A list of strings
is often easier to work with than a single large string value.
</p>
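<p>
Because each string from <code>readlines()</code> keeps its trailing
newline, a common follow-up step is to strip it off. This is a minimal
sketch under the assumption that a two-line excerpt of the sonnet is
written to a temporary folder first; the variable names are illustrative.
</p>

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sonnet29.txt')

# Write a short two-line excerpt to read back.
sonnet_file = open(path, 'w')
sonnet_file.write('I all alone beweep my outcast state,\n'
                  'And look upon myself and curse my fate,')
sonnet_file.close()

sonnet_file = open(path)
lines = sonnet_file.readlines()
sonnet_file.close()

# Remove the newline character from the end of each line, if present.
stripped = [line.rstrip('\n') for line in lines]

print(len(lines))   # 2
print(stripped[0])  # I all alone beweep my outcast state,
```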
<p>
Python allows you to write content to a file in a way similar to how the
<code>print()</code> function “writes” strings to the screen. You can’t
write to a file you’ve opened in read mode, though. Instead, you need to
open it in “write plaintext” mode or “append plaintext” mode, or
<em>write mode</em> and <em>append mode</em> for short.
</p>
<p>
Write mode will overwrite the existing file and start from scratch, just
like when you overwrite a variable’s value with a new value. Pass
<code>'w'</code> as the second argument to <code>open()</code> to open the
file in write mode. Append mode, on the other hand, will append text to
the end of the existing file. You can think of this as appending to a list
in a variable, rather than overwriting the variable altogether. Pass
<code>'a'</code> as the second argument to <code>open()</code> to open the
file in append mode.
</p>
<p>
If the filename passed to <code>open()</code> does not exist, both write
and append mode will create a new, blank file. After reading or writing a
file, call the <code>close()</code> method before opening the file again.
</p>
<p>
Let’s put these concepts together. Enter the following into the
interactive shell:
</p>
<pre>&gt;&gt;&gt; <strong>baconFile = open('bacon.txt', 'w')</strong>
&gt;&gt;&gt; <strong>baconFile.write('Hello world!\n')</strong>
13
&gt;&gt;&gt; <strong>baconFile.close()</strong>
&gt;&gt;&gt; <strong>baconFile = open('bacon.txt', 'a')</strong>
&gt;&gt;&gt; <strong>baconFile.write('Bacon is not a vegetable.')</strong>
25
&gt;&gt;&gt; <strong>baconFile.close()</strong>
&gt;&gt;&gt; <strong>baconFile = open('bacon.txt')</strong>
&gt;&gt;&gt; <strong>content = baconFile.read()</strong>
&gt;&gt;&gt; <strong>baconFile.close()</strong>
&gt;&gt;&gt; <strong>print(content)</strong>
Hello world!
Bacon is not a vegetable.</pre>
<p>
First, we open <em>bacon.txt</em> in write mode. Since there isn’t a
<em>bacon.txt</em> yet, Python creates one. Calling
<code>write()</code> on the opened file and passing
<code>write()</code> the string argument
<code>'Hello world!\n'</code> writes the string to the file and returns
the number of characters written, including the newline. Then we close the
file.
</p>
<p>
To add text to the existing contents of the file instead of replacing the
string we just wrote, we open the file in append mode. We write
<code>'Bacon is not a vegetable.'</code> to the file and close it.
Finally, to print the file contents to the screen, we open the file in its
default read mode, call <code>read()</code>, store the resulting
string in <code>content</code>, close the file, and
print <code>content</code>.
</p>
<p>
Note that the <code>write()</code> method does not automatically add a
newline character to the end of the string like the
<code>print()</code> function does. You will have to add this character
yourself.
</p>
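<p>
The difference between write mode and append mode, and the fact that
<code>write()</code> adds no newline of its own, can be sketched as follows.
The filename <em>log.txt</em>, the strings, and the temporary folder are
hypothetical and chosen only for this illustration.
</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'log.txt')  # hypothetical filename

    # Write mode creates the file because it does not exist yet.
    logFile = open(path, 'w')
    logFile.write('first entry')     # write() adds no newline for us
    logFile.write('second entry\n')  # so these two strings run together
    logFile.close()

    # Append mode keeps the existing text and adds to the end of it.
    logFile = open(path, 'a')
    logFile.write('third entry\n')
    logFile.close()

    logFile = open(path)
    content = logFile.read()
    logFile.close()

    # Opening in write mode again discards everything written so far.
    logFile = open(path, 'w')
    logFile.write('replaced\n')
    logFile.close()

    logFile = open(path)
    afterOverwrite = logFile.read()
    logFile.close()

print(repr(content))
print(repr(afterOverwrite))
```

<p>
Printing with <code>repr()</code> makes the newline characters visible, which
helps when checking where one written string ends and the next begins.
</p>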
<p>
You can save variables in your Python programs to binary shelf files using
the <code>shelve</code> module. This way, your program can restore data to
variables from the hard drive. The <code>shelve</code> module will let you
add Save and Open features to your program. For example, if you ran a
program and entered some configuration settings, you could save those
settings to a shelf file and then have the program load them the next time
it is run.
</p>
<p>Enter the following into the interactive shell:</p>
<pre>&gt;&gt;&gt; <strong>import shelve</strong>
&gt;&gt;&gt; <strong>shelfFile = shelve.open('mydata')</strong>
&gt;&gt;&gt; <strong>cats = ['Zophie', 'Pooka', 'Simon']</strong>
&gt;&gt;&gt; <strong>shelfFile['cats'] = cats</strong>
&gt;&gt;&gt; <strong>shelfFile.close()</strong></pre>
<p>
To read and write data using the <code>shelve</code> module, you first
import <code>shelve</code>. Call <code>shelve.open()</code> and pass it a
filename, and then store the returned shelf value in a variable. You can
make changes to the shelf value as if it were a dictionary. When you’re
done, call <code>close()</code> on the shelf value. Here, our shelf value
is stored in <code>shelfFile</code>. We create a list
<code>cats</code> and write <code>shelfFile['cats'] = cats</code> to store
the list in <code>shelfFile</code> as a value associated with the key
<code>'cats'</code> (like in a dictionary). Then we call
<code>close()</code> on <code>shelfFile</code>.
</p>
<p>
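After running the previous code on Windows, you will see three new files
in the current working directory: <em>mydata.bak</em>,
<em>mydata.dat</em>, and <em>mydata.dir</em>. On OS X, only a single
<em>mydata.db</em> file will be created. The exact filenames depend on the
database backend that your Python version and operating system provide, so
you may see a different set of files.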
</p>
<p>
These binary files contain the data you stored in your shelf. The format
of these binary files is not important; you only need to know what the
<code>shelve</code> module does, not how it does it. The module frees you
from worrying about how to store your program’s data to a file.
</p>
<p>
Your programs can use the <code>shelve</code> module to later reopen and
retrieve the data from these shelf files. Shelf values don’t have to be
opened in read or write mode—they can do both once opened. Enter the
following into the interactive shell:
</p>
<pre>&gt;&gt;&gt; <strong>shelfFile = shelve.open('mydata')</strong>
&gt;&gt;&gt; <strong>type(shelfFile)</strong>
&lt;class 'shelve.DbfilenameShelf'&gt;
&gt;&gt;&gt; <strong>shelfFile['cats']</strong>
['Zophie', 'Pooka', 'Simon']
&gt;&gt;&gt; <strong>shelfFile.close()</strong></pre>
<p>
Here, we open the shelf files to check that our data was stored correctly.
Entering <code>shelfFile['cats']</code> returns the same list that we
stored earlier, so we know that the list is correctly stored, and we call
<code>close()</code>.
</p>
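<p>
The Save and Open idea mentioned earlier can be sketched with the same calls.
This is only an illustration: the <code>settings</code> dictionary, its keys,
and the shelf name <em>settings</em> are made up, and the shelf is created in
a temporary folder so that nothing is left behind.
</p>

```python
import os
import shelve
import tempfile

# Hypothetical configuration; the keys and values are invented for this sketch.
settings = {'theme': 'dark', 'fontSize': 12}

with tempfile.TemporaryDirectory() as folder:
    shelfPath = os.path.join(folder, 'settings')

    # "Save": store the dictionary under a key of our choosing.
    shelfFile = shelve.open(shelfPath)
    shelfFile['settings'] = settings
    shelfFile.close()

    # "Open": a later run of the program reads the value back.
    shelfFile = shelve.open(shelfPath)
    restored = shelfFile['settings']
    shelfFile.close()

print(restored)
```

<p>
The value read back is an ordinary dictionary equal to the one that was
stored, so the rest of the program can use it as if it had never left memory.
</p>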
<p>
Just like dictionaries, shelf values have <code>keys()</code> and
<code>values()</code> methods that will return list-like values of the
keys and values in the shelf. Since these methods return list-like values
instead of true lists, you should pass them to the
<code>list()</code> function to get them in list form. Enter the following
into the interactive shell:
</p>
<pre>&gt;&gt;&gt; <strong>shelfFile = shelve.open('mydata')</strong>
&gt;&gt;&gt; <strong>list(shelfFile.keys())</strong>
['cats']
&gt;&gt;&gt; <strong>list(shelfFile.values())</strong>
[['Zophie', 'Pooka', 'Simon']]
&gt;&gt;&gt; <strong>shelfFile.close()</strong></pre>
<p>
Plaintext is useful for creating files that you’ll read in a text editor
such as Notepad or TextEdit, but if you want to save data from your Python
programs, use the <code>shelve</code> module.
</p>
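<p>
Because a shelf value behaves much like a dictionary, a program can check
which keys exist before reading them. The sketch below assumes a hypothetical
shelf named <em>pets</em> with made-up contents, and sorts the keys because
their order in a shelf is not guaranteed.
</p>

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as folder:
    shelfFile = shelve.open(os.path.join(folder, 'pets'))  # hypothetical name
    shelfFile['cats'] = ['Zophie', 'Pooka', 'Simon']
    shelfFile['dogs'] = ['Rex']  # made-up value for illustration

    # keys() is list-like; sorted() turns it into a real list in a fixed order.
    keys = sorted(shelfFile.keys())

    # Membership tests and len() work as they do for dictionaries.
    hasCats = 'cats' in shelfFile
    hasBirds = 'birds' in shelfFile
    count = len(shelfFile)

    # Copy everything into a plain dictionary before closing the shelf.
    pets = {key: shelfFile[key] for key in keys}
    shelfFile.close()

print(keys)
print(pets)
```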
<p>
Recall from
<a href="#calibre_link-87" title="Pretty Printing">Pretty Printing</a>
that the <code>pprint.pprint()</code> function will “pretty print” the
contents of a list or dictionary to the screen, while the
<code>pprint.pformat()</code> function will return this same text as a
string instead of printing it. Not only is this string formatted to be
easy to read, but it is also syntactically correct Python code. Say you
have a dictionary stored in a variable and you want to save this variable
and its contents for future use. Using <code>pprint.pformat()</code> will
give you a string that you can write to a <em>.py</em> file. This file will
be your very own module that you can import whenever you want to use the
variable stored in it.
</p>
<p>For example, enter the following into the interactive shell:</p>
<pre>&gt;&gt;&gt; <strong>import pprint</strong>
&gt;&gt;&gt; <strong>cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]</strong>
&gt;&gt;&gt; <strong>pprint.pformat(cats)</strong>
"[{'desc': 'chubby', 'name': 'Zophie'}, {'desc': 'fluffy', 'name': 'Pooka'}]"
&gt;&gt;&gt; <strong>fileObj = open('myCats.py', 'w')</strong>
&gt;&gt;&gt; <strong>fileObj.write('cats = ' + pprint.pformat(cats) + '\n')</strong>
83
&gt;&gt;&gt; <strong>fileObj.close()</strong></pre>
<p>
Here, we import <code>pprint</code> to let us use
<code>pprint.pformat()</code>. We have a list of dictionaries, stored in a
variable <code>cats</code>. To keep the list in
<code>cats</code> available even after we close the shell, we use
<code>pprint.pformat()</code> to return it as a string. Once we have the
data in <code>cats</code> as a string, it’s easy to write the string to a
file, which we’ll call <em>myCats.py</em>.
</p>
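<p>
One way to check that the string from <code>pprint.pformat()</code> really is
valid Python is to turn it back into a value. The sketch below uses
<code>ast.literal_eval()</code> from the standard library for that check; this
function is not part of the example above and is used here only as a
verification step.
</p>

```python
import ast
import pprint

cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]

# pformat() returns a string rather than printing anything.
text = pprint.pformat(cats)

# literal_eval() parses a string containing a Python literal back into a value.
rebuilt = ast.literal_eval(text)

print(type(text).__name__)
print(rebuilt == cats)
```

<p>
If the rebuilt value equals the original list, the text that gets written to
the <em>.py</em> file describes the same data.
</p>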
<p>
The modules that an <code>import</code> statement imports are themselves
just Python scripts. When the string from <code>pprint.pformat()</code> is
saved to a <em>.py</em> file, the file is a module that can be imported
just like any other.
</p>
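<p>
The whole round trip can be sketched in a single script. The module name
<em>generatedCats</em> and the temporary folder are assumptions chosen so the
sketch does not overwrite any existing file. The folder is added to
<code>sys.path</code> only so that Python can find the new module, which is
unnecessary when the file is saved in the current working directory.
</p>

```python
import importlib
import os
import pprint
import sys
import tempfile

cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]

with tempfile.TemporaryDirectory() as folder:
    # Write a one-line Python module that assigns the list to a variable.
    modulePath = os.path.join(folder, 'generatedCats.py')  # hypothetical name
    fileObj = open(modulePath, 'w')
    fileObj.write('cats = ' + pprint.pformat(cats) + '\n')
    fileObj.close()

    # Make the folder importable, import the module, then undo the path change.
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        generatedCats = importlib.import_module('generatedCats')
    finally:
        sys.path.remove(folder)

# The imported module exposes the saved variable as an attribute.
loaded = generatedCats.cats
print(loaded[0]['name'])
```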
<p>
And since Python scripts are themselves just text files with the
<em>.py</em> file extension, your Python programs can even generate other
Python programs. You can then import these files into scripts.
</p>
<p>>>> <strong>import myCats</strong></p>
<blockquote>
<blockquote>