Introducing httplib2

We'll cover the following...

A short digression to explain why httplib2 returns bytes instead of strings
How httplib2 handles caching
How httplib2 handles Last-Modified and ETag headers
How httplib2 handles compression
How httplib2 handles redirects

c:\Users\pilgrim\Downloads> dir
 Volume in drive C has no label.
 Volume Serial Number is DED5-B4F8
 Directory of c:\Users\pilgrim\Downloads
07/28/2009  12:36 PM    <DIR>          .
07/28/2009  12:36 PM    <DIR>          ..
07/28/2009  12:36 PM    <DIR>          httplib2-python3-0.5.0
07/28/2009  12:33 PM            18,997 httplib2-python3-0.5.0.zip
               1 File(s)         18,997 bytes
               3 Dir(s)  61,496,684,544 bytes free
c:\Users\pilgrim\Downloads> cd httplib2-python3-0.5.0
c:\Users\pilgrim\Downloads\httplib2-python3-0.5.0> c:\python31\python.exe setup.py install
running install
running build
running build_py
running install_lib
creating c:\python31\Lib\site-packages\httplib2
copying build\lib\httplib2\iri2uri.py -> c:\python31\Lib\site-packages\httplib2
copying build\lib\httplib2\__init__.py -> c:\python31\Lib\site-packages\httplib2
byte-compiling c:\python31\Lib\site-packages\httplib2\iri2uri.py to iri2uri.pyc
byte-compiling c:\python31\Lib\site-packages\httplib2\__init__.py to __init__.pyc
running install_egg_info
Writing c:\python31\Lib\site-packages\httplib2-python3_0.5.0-py3.1.egg-info

you@localhost:~/Desktop$ unzip httplib2-python3-0.5.0.zip
Archive:  httplib2-python3-0.5.0.zip
  inflating: httplib2-python3-0.5.0/README
  inflating: httplib2-python3-0.5.0/setup.py
  inflating: httplib2-python3-0.5.0/PKG-INFO
  inflating: httplib2-python3-0.5.0/httplib2/__init__.py
  inflating: httplib2-python3-0.5.0/httplib2/iri2uri.py
you@localhost:~/Desktop$ cd httplib2-python3-0.5.0/
you@localhost:~/Desktop/httplib2-python3-0.5.0$ sudo python3 setup.py install
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.1
creating build/lib.linux-x86_64-3.1/httplib2
copying httplib2/iri2uri.py -> build/lib.linux-x86_64-3.1/httplib2
copying httplib2/__init__.py -> build/lib.linux-x86_64-3.1/httplib2
running install_lib
creating /usr/local/lib/python3.1/dist-packages/httplib2
copying build/lib.linux-x86_64-3.1/httplib2/iri2uri.py -> /usr/local/lib/python3.1/dist-packages/httplib2
copying build/lib.linux-x86_64-3.1/httplib2/__init__.py -> /usr/local/lib/python3.1/dist-packages/httplib2
byte-compiling /usr/local/lib/python3.1/dist-packages/httplib2/iri2uri.py to iri2uri.pyc
byte-compiling /usr/local/lib/python3.1/dist-packages/httplib2/__init__.py to __init__.pyc
running install_egg_info
Writing /usr/local/lib/python3.1/dist-packages/httplib2-python3_0.5.0.egg-info

① The primary interface to httplib2 is the Http object. For reasons you’ll see in the next section, you should always pass a directory name when you create an Http object. The directory does not need to exist; httplib2 will create it if necessary.

② Once you have an Http object, retrieving data is as simple as calling the request() method with the address of the data you want. This will issue an HTTP GET request for that URL. (Later in this chapter, you’ll see how to issue other HTTP requests, like POST.)

③ The request() method returns two values. The first is an httplib2.Response object, which contains all the HTTP headers the server returned. For example, a status code of 200 indicates that the request was successful.

④ The content variable contains the actual data that was returned by the HTTP server. The data is returned as a bytes object, not a string. If you want it as a string, you’ll need to determine the character encoding and convert it yourself.

You probably only need one httplib2.Http object. There are valid reasons for creating more than one, but you should only do so if you know why you need them. “I need to request data from two different URLs” is not a valid reason. Re-use the Http object and just call the request() method twice.
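The four steps above, plus the advice to re-use a single Http object, can be sketched roughly as follows. This is a minimal illustration, not the chapter's original listing: the helper names fetch() and demo() and both example.org URLs are hypothetical, and demo() is deliberately not called because it needs httplib2 installed and a network connection.

```python
def fetch(http, url):
    """Issue an HTTP GET for url and return (status_code, body_bytes)."""
    # (2) request() defaults to GET and (3) returns a (response, content) pair.
    response, content = http.request(url)
    # (4) response carries the status and headers; content is a bytes object.
    return response.status, content


def demo():
    """Not called automatically: needs httplib2 installed and network access."""
    import httplib2  # third-party package, installed as shown earlier

    # (1) One Http object with a cache directory (created if it is missing).
    h = httplib2.Http(".cache")

    # Re-use the same object for every URL instead of creating a new one.
    for url in ("http://example.org/feed.xml",    # hypothetical URL
                "http://example.org/other.xml"):  # hypothetical URL
        status, body = fetch(h, url)
        print(url, status, type(body).__name__, len(body))
```

Because fetch() takes the Http object as an argument, every request in a program naturally goes through the same object and therefore the same cache directory.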

A short digression to explain why httplib2 returns bytes instead of strings

Bytes. Strings. What a pain. Why can’t httplib2 “just” do the conversion for you? Well, it’s complicated, because the rules for determining the character encoding are specific to what kind of resource you’re requesting. How could httplib2 know what kind of resource you’re requesting? It’s usually listed in the Content-Type HTTP header, but that’s an optional feature of HTTP and not all HTTP servers include it. If that header is not included in the HTTP response, it’s left up to the client to guess. (This is commonly called “content sniffing,” and it’s never perfect.)
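To make the Content-Type lookup concrete, here is a small sketch that pulls the charset parameter out of a header value using only the standard library, then decodes the bytes with it. The function names and the UTF-8 fallback are assumptions made for illustration; the fallback is a guess the caller chooses, not a rule from HTTP, and real content sniffing is considerably more involved.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    if header_value is None:          # the server sent no Content-Type at all
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # lowercased, or None if absent


def decode_body(content, header_value, fallback="utf-8"):
    """Decode bytes with the declared charset, else a caller-chosen fallback."""
    charset = charset_from_content_type(header_value) or fallback
    return content.decode(charset)
```

With an httplib2 response, the header value would typically come from something like response.get('content-type'), since the Response object behaves like a dictionary of headers.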

If you know what sort of resource you’re expecting (an XML document in this case), perhaps you could “just” pass the returned bytes object to the xml.etree.ElementTree.fromstring() function. That’ll work as long as the XML document includes information on its own character encoding (as this one does), but that’s an optional feature and not all XML documents do that. If an XML document doesn’t include encoding information, the client is supposed to look at the enclosing transport, i.e. the Content-Type HTTP header, which can include a charset parameter.
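A short sketch of the difference an encoding declaration makes when handing raw bytes to ElementTree. The two byte strings are made-up sample documents, not data from a real feed. Note that ElementTree's parse() expects a filename or file-like object, so the bytes are wrapped in io.BytesIO there, while fromstring() accepts the bytes directly.

```python
import io
import xml.etree.ElementTree as ET

# Sample document that declares its own encoding (hypothetical data).
declared = b"<?xml version='1.0' encoding='iso-8859-1'?><title>caf\xe9</title>"

root = ET.fromstring(declared)                        # bytes passed directly
tree_root = ET.parse(io.BytesIO(declared)).getroot()  # bytes wrapped as a file

# The same bytes without a declaration: the parser assumes UTF-8,
# and the lone 0xE9 byte is not valid UTF-8.
undeclared = b"<title>caf\xe9</title>"
try:
    ET.fromstring(undeclared)
    parsed_without_declaration = True
except ET.ParseError:
    parsed_without_declaration = False
```

In the undeclared case the parser has no way to know the bytes were ISO-8859-1, which is exactly the situation where the client has to fall back on the transport's Content-Type header.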

But it’s worse than that. Now character encoding information can be in two places: within the XML document itself, and within the Content-Type HTTP header. If the information is in both places, which one wins? According to RFC 3023 (I swear I am not making this up), if the media type given in the ...
