Bytes and String in Py3k

Martin's Explanation:

It's really very similar to 2.x: the "bytes" type is to be used in all
interfaces that operate on byte sequences that may or may not
represent characters; in particular, for interfaces where the
operating system deliberately uses bytes, i.e. low-level file IO and
socket IO; also for cases where the encoding is embedded in the
stream that still needs to be processed (e.g. XML parsing).

(Unicode) strings should be used where the data is truly text by
nature, i.e. where no encoding information is necessary to find out
what characters are intended. It's used on interfaces where the
encoding is known (e.g. text IO, where the encoding is specified
on opening, XML parser results, with the declared encoding, and
GUI libraries, which naturally expect text).
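
As a rough illustration of the split described above, the sketch below
writes a file in text mode with an explicit encoding and then reads it
back both ways. The file name and sample text are made up for the
example; only the built-in open() and the standard tempfile module are
used.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Sample text with two non-ASCII characters (i-diaeresis, e-acute).
sample = "na\u00efve caf\u00e9"

# Text IO: the encoding is specified on opening, so we pass and get str.
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary IO: no encoding is involved, so we get the raw bytes.
with open(path, "rb") as f:
    raw = f.read()

print(type(text).__name__, len(text))  # str, counted in characters
print(type(raw).__name__, len(raw))    # bytes, counted in bytes

# The two views agree once the encoding is applied explicitly.
assert raw.decode("utf-8") == text
```

The lengths differ because each of the two accented characters takes
two bytes in UTF-8, which is exactly the information a bytes object
does not carry on its own.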

- base64.encodestring expects bytes (naturally, since it is supposed to
encode arbitrary binary data), and produces bytes (debatably)
- binascii.b2a_hex likewise (expects and produces bytes)
- pickle.dumps produces bytes (uniformly, both for binary and text
pickles)
- marshal.dumps likewise
- email.message.Message().as_string produces a (unicode) string
(see Barry's recent thread on whether that's a good thing; the
email package hasn't been fully ported to 3k, either)
- the XML libraries (continue to) parse bytes, and produce
Unicode strings
- for the IO libraries, see above
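
The list above can be checked interactively. The following is a small
sketch, not part of Martin's message, that calls each API on made-up
sample data and records the type that comes back. One assumption to
note: base64.encodestring was later deprecated and removed in Python
3.9, so the sketch uses its replacement, base64.encodebytes.

```python
import base64
import binascii
import marshal
import pickle
import xml.etree.ElementTree as ET
from email.message import Message

data = b"\x00\xffhello"  # arbitrary binary sample data

# bytes in, bytes out
b64 = base64.encodebytes(data)   # successor of base64.encodestring
hexed = binascii.b2a_hex(data)

# bytes out for both the text protocol (0) and a binary protocol (2)
pickled_text = pickle.dumps({"a": 1}, protocol=0)
pickled_bin = pickle.dumps({"a": 1}, protocol=2)
marshalled = marshal.dumps([1, 2, 3])

# as_string produces a (unicode) string
msg = Message()
msg["Subject"] = "demo"
msg.set_payload("body")
as_text = msg.as_string()

# The XML parser consumes bytes, honours the declared encoding,
# and hands back str.
doc = b'<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>'
root = ET.fromstring(doc)

results = {
    "base64.encodebytes": b64,
    "binascii.b2a_hex": hexed,
    "pickle.dumps (protocol 0)": pickled_text,
    "pickle.dumps (protocol 2)": pickled_bin,
    "marshal.dumps": marshalled,
    "Message.as_string": as_text,
    "XML element text": root.text,
}
for name, value in results.items():
    print(f"{name}: {type(value).__name__}")
```

Only the last two entries report str; everything else stays in bytes,
matching the list.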
