Introducing httplib2
Before you can use httplib2, you’ll need to install it. Visit code.google.com/p/httplib2/ and download the latest version. httplib2 is available for Python 2.x and Python 3.x; make sure you get the Python 3 version, named something like httplib2-python3-0.5.0.zip.
Unzip the archive, open a terminal window, and go to the newly created httplib2 directory. On Windows, open the Start menu, select Run…, type cmd.exe and press ENTER.
c:\Users\pilgrim\Downloads> dir
Volume in drive C has no label.
Volume Serial Number is DED5-B4F8
Directory of c:\Users\pilgrim\Downloads
07/28/2009 12:36 PM <DIR> .
07/28/2009 12:36 PM <DIR> ..
07/28/2009 12:36 PM <DIR> httplib2-python3-0.5.0
07/28/2009 12:33 PM 18,997 httplib2-python3-0.5.0.zip
1 File(s) 18,997 bytes
3 Dir(s) 61,496,684,544 bytes free
c:\Users\pilgrim\Downloads> cd httplib2-python3-0.5.0
c:\Users\pilgrim\Downloads\httplib2-python3-0.5.0> c:\python31\python.exe setup.py install
running install
running build
running build_py
running install_lib
creating c:\python31\Lib\site-packages\httplib2
copying build\lib\httplib2\iri2uri.py -> c:\python31\Lib\site-packages\httplib2
copying build\lib\httplib2\__init__.py -> c:\python31\Lib\site-packages\httplib2
byte-compiling c:\python31\Lib\site-packages\httplib2\iri2uri.py to iri2uri.pyc
byte-compiling c:\python31\Lib\site-packages\httplib2\__init__.py to __init__.pyc
running install_egg_info
Writing c:\python31\Lib\site-packages\httplib2-python3_0.5.0-py3.1.egg-info
On Mac OS X, run the Terminal.app application in your /Applications/Utilities/ folder. On Linux, run the Terminal application, which is usually in your Applications menu under Accessories or System.
you@localhost:~/Desktop$ unzip httplib2-python3-0.5.0.zip
Archive: httplib2-python3-0.5.0.zip
inflating: httplib2-python3-0.5.0/README
inflating: httplib2-python3-0.5.0/setup.py
inflating: httplib2-python3-0.5.0/PKG-INFO
inflating: httplib2-python3-0.5.0/httplib2/__init__.py
inflating: httplib2-python3-0.5.0/httplib2/iri2uri.py
you@localhost:~/Desktop$ cd httplib2-python3-0.5.0/
you@localhost:~/Desktop/httplib2-python3-0.5.0$ sudo python3 setup.py install
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.1
creating build/lib.linux-x86_64-3.1/httplib2
copying httplib2/iri2uri.py -> build/lib.linux-x86_64-3.1/httplib2
copying httplib2/__init__.py -> build/lib.linux-x86_64-3.1/httplib2
running install_lib
creating /usr/local/lib/python3.1/dist-packages/httplib2
copying build/lib.linux-x86_64-3.1/httplib2/iri2uri.py -> /usr/local/lib/python3.1/dist-packages/httplib2
copying build/lib.linux-x86_64-3.1/httplib2/__init__.py -> /usr/local/lib/python3.1/dist-packages/httplib2
byte-compiling /usr/local/lib/python3.1/dist-packages/httplib2/iri2uri.py to iri2uri.pyc
byte-compiling /usr/local/lib/python3.1/dist-packages/httplib2/__init__.py to __init__.pyc
running install_egg_info
Writing /usr/local/lib/python3.1/dist-packages/httplib2-python3_0.5.0.egg-info
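Whichever platform you installed on, it can be worth confirming that Python can actually find the new module before going further. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for this illustration and is not part of httplib2.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    # find_spec() returns None when no such top-level module is importable.
    return importlib.util.find_spec(module_name) is not None


# Prints True if the installation above succeeded, False otherwise.
print(is_installed("httplib2"))
```

If this prints False, the most likely cause is that setup.py was run with a different Python interpreter than the one you are using now.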
To use httplib2, create an instance of the httplib2.Http class.
import httplib2
h = httplib2.Http('.cache')  #①
response, content = h.request('http://diveintopython3.org/examples/feed.xml')  #②
print(response.status)  #③
#200
print(content[:52])  #④
#b"<?xml version='1.0' encoding='utf-8'?>\r\n<feed xmlns="
print(len(content))
#303
① The primary interface to httplib2 is the Http object. For reasons you’ll see in the next section, you should always pass a directory name when you create an Http object. The directory does not need to exist; httplib2 will create it if necessary.
② Once you have an Http object, retrieving data is as simple as calling the request() method with the address of the data you want. This will issue an HTTP GET request for that URL. (Later in this chapter, you’ll see how to issue other HTTP requests, like POST.)
③ The request() method returns two values. The first is an httplib2.Response object, which contains all the HTTP headers the server returned. For example, a status code of 200 indicates that the request was successful.
④ The content variable contains the actual data that was returned by the HTTP server. The data is returned as a bytes object, not a string. If you want it as a string, you’ll need to determine the character encoding and convert it yourself.
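To make the bytes-versus-string distinction concrete, here is a small sketch that converts bytes to a string once the encoding is known. It uses a literal copy of the first bytes of the feed rather than a live request, and it assumes the encoding is UTF-8 because the XML declaration in those bytes says so.

```python
# The first bytes of the feed, copied from the listing above.
content = b"<?xml version='1.0' encoding='utf-8'?>\r\n<feed xmlns="

# Assumption: the encoding is UTF-8, as the XML declaration states.
text = content.decode("utf-8")

print(type(content).__name__)  # bytes
print(type(text).__name__)     # str
print(text[:5])                # <?xml
```

The hard part is not the decode() call itself but knowing which encoding name to pass to it.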
You probably only need one httplib2.Http object. There are valid reasons for creating more than one, but you should only do so if you know why you need them. “I need to request data from two different URLs” is not a valid reason. Re-use the Http object and just call the request() method twice.
A short digression to explain why httplib2 returns bytes instead of strings
Bytes. Strings. What a pain. Why can’t httplib2 “just” do the conversion for you? Well, it’s complicated, because the rules for determining the character encoding are specific to what kind of resource you’re requesting. How could httplib2 know what kind of resource you’re requesting? It’s usually listed in the Content-Type HTTP header, but that’s an optional feature of HTTP and not all HTTP servers include it. If that header is not included in the HTTP response, it’s left up to the client to guess. (This is commonly called “content sniffing,” and it’s never perfect.)
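As a rough illustration of the first step, reading the charset out of a Content-Type header when one is present, here is a sketch built on the standard library’s email.message module. The function name charset_from_headers is hypothetical, and the headers argument is assumed to be a plain dictionary with lowercase header names standing in for the response object.

```python
from email.message import Message


def charset_from_headers(headers, default=None):
    """Return the charset parameter of the Content-Type header, or default."""
    content_type = headers.get("content-type")
    if not content_type:
        # Header missing entirely: the caller is left to guess.
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset() or default


print(charset_from_headers({"content-type": "application/xml; charset=ISO-8859-1"}))
print(charset_from_headers({"content-type": "text/html"}))
print(charset_from_headers({}))
```

The first call prints iso-8859-1; the other two print None, which are exactly the cases where a client would have to fall back on sniffing or on rules specific to the resource type.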
If you know what sort of resource you’re expecting (an XML document in this case), perhaps you could “just” pass the returned bytes object to the xml.etree.ElementTree.fromstring() function (parse() expects a filename or file object rather than raw bytes). That’ll work as long as the XML document includes information on its own character encoding (as this one does), but that’s an optional feature and not all XML documents do that. If an XML document doesn’t include encoding information, the client is supposed to look at the enclosing transport, i.e. the Content-Type HTTP header, which can include a charset parameter.
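The happy path described here can be sketched as follows. The document is a made-up one-element sample, not the real feed, and the sketch uses xml.etree.ElementTree.fromstring() because it accepts a bytes object directly. The parser reads the encoding from the XML declaration, so no manual decoding is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document that declares its own encoding.
xml_text = "<?xml version='1.0' encoding='iso-8859-1'?><title>caf\u00e9</title>"
xml_bytes = xml_text.encode("iso-8859-1")

# The parser honors the declared encoding when given raw bytes.
root = ET.fromstring(xml_bytes)

print(root.tag)            # title
print(root.text)           # café
print(type(root.text).__name__)  # str
```

Without the encoding declaration, the same bytes would not be valid UTF-8 and the parse would fail, which is the situation where the charset from the transport layer becomes necessary.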
But it’s worse than that. Now character ...