Re: [issue2275] urllib2 header capitalization

I am submitting a revised patch for this issue.
I did some analysis on the history of this issue and found that the
.capitalize() vs .title() question had come up earlier too
(issue1542948), and the decision taken then was:
- To retain the header format in .capitalize() to maintain backward
compatibility.
- However, when the headers are passed to httplib, they are converted to
.title() format (see the AbstractHTTPHandler method).
- Users are encouraged to use the .add_header(), .has_header() and
.get_header() methods to check for headers instead of using the .headers
dict directly (which will still remain an undocumented interface).

A note to Hans-Peter: changing the headers to .title() would make
.header_items() retrieval backward incompatible, so the headers
will still be stored in .capitalize() format.
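
To make the difference concrete, here is a small illustration of how the two string methods treat header names. It only exercises str.capitalize() and str.title(); the header names are arbitrary examples, not taken from the patch.

```python
# Compare the two normalizations on a few example header names.
names = ["user-agent", "CONTENT-TYPE", "x-forwarded-for"]

# capitalize(): first character upper, everything else lower.
capitalized = [n.capitalize() for n in names]
# title(): first character of every hyphen-separated word upper.
titled = [n.title() for n in names]

for raw, cap, tit in zip(names, capitalized, titled):
    print("%-16s capitalize() -> %-16s title() -> %s" % (raw, cap, tit))
```

So "User-agent" is the stored form, while "User-Agent" is the form handed to httplib.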

And I have made the following changes to the patch:
1) Support for case-insensitive dict lookup, which works for
.has_header() and .get_header(). So .has_header("User-Agent") will
return True even when .headers gives {"User-agent": "blah"}.
2) Added tests for the new behavior.
3) Updated the docs to reflect this issue.

Btw, the undocumented .headers interface will also support
case-insensitive lookup, so I have added tests for that too.
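
A rough sketch of the behavior the accessor methods are meant to have. MiniRequest is a hypothetical stand-in, not urllib2.Request, and it gets case-insensitivity by normalizing the queried name rather than by the case-insensitive dict used in the patch.

```python
class MiniRequest(object):
    """Hypothetical stand-in for a request object; not urllib2.Request."""

    def __init__(self):
        self.headers = {}

    def add_header(self, name, value):
        # Store in .capitalize() form, the backward-compatible format.
        self.headers[name.capitalize()] = value

    def has_header(self, name):
        # Normalize the query the same way, so any casing matches.
        return name.capitalize() in self.headers

    def get_header(self, name, default=None):
        return self.headers.get(name.capitalize(), default)


req = MiniRequest()
req.add_header("User-Agent", "blah")
print(req.headers)                    # key is stored as "User-agent"
print(req.has_header("User-Agent"))
print(req.get_header("USER-AGENT"))
```

This only helps callers who go through the methods; code that indexes .headers directly with another casing still misses, which is what the case-insensitive dict is for.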

Let me know if you have any comments. Let's plan to close this issue.

Thanks,

_______________________________________
<http://bugs.python.org/issue2275>
_______________________________________

Update and [issue2275]

It has been some time since I posted my progress. I traveled out of town for
a weekend and then could not get back into the groove immediately; I just
realized that it has been more than a week.
Things will be much faster now, and I hope not to get into unplanned travel
schedules.

Okay, coming back. I started working on issue2275, which is causing much
debate.

With the discussion, I realized that there is a "difference in opinion" on
how to fix the bug. I had assumed that the headers dictionary should hold
"User-Agent"="Mozilla Form", whereas currently it holds "User-agent"="Mozilla
form".

For backward compatibility purposes, it looks like we will have to maintain
the capitalize() form, provide the title() form through the other methods, and
also implement the .headers methods.

After much thought about this discussion and reading some of the articles, I
have come to think that, apart from the current functionality of the headers,
the following would be desirable:

1) .headers public interface.
2) get_header method returning the .title() form.
3) header_items method returning the .title() form.

I checked the Python 2.3.8 Library Docs and found that those methods were not
there; they have been implemented only since Python 2.4.

So I am looking into those two older releases to see where this change
surfaced, and will fix things specific to that change so that older code does
not break.

> John J Lee <jjlee@users.sourceforge.net> added the comment:
>
> > With respect to point 1), I assume that we all agree upon that headers
> > should stored in Titled-Format instead of Capitalized-format.
>
> I would probably choose to store the headers in Capitalized-form,
> because that makes implementing .headers trivial.
>
> [...]
> > Now, if we go for a Case Normalization at the much later stage, will the
> > headers be stored still in capitalize() format? ( In that case, this bug
> > requests it be stored in .titled() format confirming to many practices)
> > Would you like to explain a bit more on that?
>
> Implement .get_header() and friends using .headers, along the lines of:
>
> def get_header(self, header_name, default=None):
>     return self.headers.get(
>         header_name,
>         self.unredirected_hdrs.get(header_name, default)).title()
>
> And then ensure that the headers actually passed to httplib also get
> .title()-cased. This also has the benefit, compared with your patch, of
> leaving the behaviour of non-HTTP URL schemes unchanged.
>
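
The second half of that suggestion, leaving the stored names alone and title-casing them only at the hand-off to httplib, could look roughly like this. headers_for_wire is a hypothetical helper name, and letting the regular headers win over the unredirected ones on a clash is an assumption borrowed from the order in the quoted get_header().

```python
def headers_for_wire(headers, unredirected_hdrs):
    """Merge both header stores and title-case the names for sending.

    Hypothetical helper: the stored dicts keep their .capitalize() keys.
    """
    merged = dict(unredirected_hdrs)
    merged.update(headers)  # assumption: regular headers win on a name clash
    return dict((name.title(), value) for name, value in merged.items())


stored = {"User-agent": "Mozilla", "Content-type": "text/plain"}
unredirected = {"Host": "example.com"}
wire = headers_for_wire(stored, unredirected)

print(sorted(wire.items()))
print(sorted(stored))  # the stored keys are still in .capitalize() form
```

Because the conversion builds a new dict, .headers and .header_items() keep returning what older code expects.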

[issue2275] urllib2 header capitalization

Added file: http://bugs.python.org/file10849/issue2275-py26.diff

- Included a CaseInsensitiveDict Lookup for Headers interface.
- Headers will now be .title()-ed instead of .capitalize()-ed.
- Included Tests for the changes made.

http://bugs.python.org/issue2275

To Study:

- Difference between a redirected header and an unredirected header in the HTTP
implementation.
- Difference between gethostname and gethostbyname, and whether gethostname on
(Unix/Windows/Mac) supports IPv6 addresses. This is used directly by _socket.c
in Python.
- RFC 2396! At least 3 bugs reference it, and the URL parsing libraries need to
be upgraded to conform to RFC 2396.

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org

[issue3094] By default, HTTPSConnection should send header "Host: somehost" instead of "Host: somehost:443"

http://bugs.python.org/issue3094

I had commented on the bug and patch. Good that it is fixed now.
_______________________________________

Gregory P. Smith added the comment:

fixed in trunk r64771.

(and indeed the previous behavior was buggy: in the extremely rare event that
someone ran an https server on port 80, the :80 should have been supplied).
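
The rule being described is simple enough to sketch. host_header is a hypothetical helper for illustration only; it is not the httplib code from r64771.

```python
HTTP_PORT, HTTPS_PORT = 80, 443


def host_header(host, port, default_port):
    """Build a Host header value, omitting the port when it is the default.

    Hypothetical helper; the default port depends on the scheme in use.
    """
    if port == default_port:
        return host
    return "%s:%d" % (host, port)


print(host_header("somehost", 443, HTTPS_PORT))  # HTTPS on its default port
print(host_header("somehost", 80, HTTPS_PORT))   # HTTPS on port 80: port is kept
print(host_header("somehost", 80, HTTP_PORT))    # HTTP on its default port
```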

Caseinsensitive Dict lookup


class CaseInsensitiveDict(dict):
    def __init__(self, *args, **kwargs):
        # keystore maps the lowercased key to the key as it was stored.
        self.keystore = {}
        d = dict(*args, **kwargs)
        for k in d.keys():
            self.keystore[self._get_lower(k)] = k
        return super(CaseInsensitiveDict, self).__init__(*args, **kwargs)

    def __setitem__(self, k, v):
        if hasattr(self, 'keystore'):
            self.keystore[self._get_lower(k)] = k
        return super(CaseInsensitiveDict, self).__setitem__(k, v)

    def __getitem__(self, k):
        # Translate the requested key to the stored key before the lookup.
        if hasattr(self, 'keystore') and self._get_lower(k) in self.keystore:
            k = self.keystore[self._get_lower(k)]
        return super(CaseInsensitiveDict, self).__getitem__(k)

    @staticmethod
    def _get_lower(k):
        # Only strings are lowered; other key types pass through unchanged.
        if isinstance(k, str):
            return k.lower()
        else:
            return k



This one seems to do the trick at last.

This is based on the following logic: for every key in the dictionary (the one
you create because you want a case-insensitive dict lookup), store the
lowercased key in the keystore, mapped to the original key.

When you retrieve an item from the dict, the request can be in any case: lower
it, look up the actual key as it was stored in the keystore, and then retrieve
the value using that key.

As the __init__, __setitem__ and __getitem__ methods use the super() call to
dict, passing the *same* key and value on to the normal dictionary, this
class with an internal keystore behaves as unobtrusively as possible.
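
The two-step lookup can be shown without any subclassing, using two plain dicts. This is a minimal sketch of the idea only; the header values and the lookup helper are hypothetical.

```python
headers = {"User-agent": "blah", "Content-type": "text/plain"}

# keystore maps the lowercased key to the key exactly as it was stored.
keystore = dict((key.lower(), key) for key in headers)


def lookup(name):
    """Lower the requested name, recover the stored key, then fetch the value."""
    stored_key = keystore[name.lower()]
    return headers[stored_key]


print(lookup("USER-AGENT"))
print(lookup("content-Type"))
print(sorted(headers))  # the stored keys are untouched
```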

Devised this way after a good number of trials.
Lesson to myself:

Study the concepts and try the code before you look for examples on the web;
otherwise you tend to get influenced by the examples. Sometimes they might be
helpful, but that would only serve as learning. You might want to get back to
implementing it in the way you best understand.

During this patch workout, I got to know mixins, decorators, staticmethod,
classmethod and dict better.

There are still a couple of tests failing with
Issue2275 (http://bugs.python.org/issue2275); hopefully I will have ironed
them out by today.

sudo write in vim

After editing a file, you discover that its mode won't allow you to save.
Then you do:

:w !sudo tee % > /dev/null

release schedules from pep-0361

Jul 15 2008: Python 2.6b2 and 3.0b2 planned
Aug 23 2008: Python 2.6b3 and 3.0b3 planned
Sep 03 2008: Python 2.6rc1 and 3.0rc1 planned
Sep 17 2008: Python 2.6rc2 and 3.0rc2 planned
Oct 01 2008: Python 2.6 and 3.0 final planned

svn merge

All the changes with respect to urllib for py3k were made in py3k-urllib branch
before merging/checking in.
I wanted to fix the py3k related bugs in the same branch so that merge will be
easier later. In order to bring the branch up-to-date with the trunk code, I
had to merge the changes from the trunk into the py3k-urllib branch.

Looked into the svn merge and found out the way to do it.

- The svn merge command compares two trees and applies the difference to a working
copy.
- Syntax to remember:

    svn merge <destination_url_to_merge_to> <source_url_to_merge_from> <working_copy>

In my case it was:

    svn merge svn+ssh://pythondev@svn.python.org/python/branches/py3k-urllib \
        svn+ssh://pythondev@svn.python.org/python/branches/py3k .

- Before this, I also had to svn update my working copy.
- svn merge changes your working copy, so to update your branch you have to do
svn ci.

So in effect,
1) Go to your working copy.
2) svn update.
3) svn merge.
4) svn commit.


--
O.R.Senthil Kumaran
http://uthcode.sarovar.org

working on Issue2275

Currently working on issue2275
 
import UserDict

class CaseInsensitiveDict(dict, UserDict.DictMixin):
    def __init__(self, *args, **kw):
        # Maps hash(lowercased key) to the key as the caller last wrote it.
        self.orig = {}
        super(CaseInsensitiveDict, self).__init__(*args, **kw)

    def items(self):
        # Pair each original-case key with its own value.
        return [(self.orig[k], v) for k, v in dict.items(self)]

    def __setitem__(self, k, v):
        hash_val = hash(k.lower())
        self.orig[hash_val] = k
        dict.__setitem__(self, hash_val, v)

    def __getitem__(self, k):
        return dict.__getitem__(self, hash(k.lower()))


somedict = CaseInsensitiveDict()
print somedict
somedict['Blah'] = "Boo"
somedict['blah'] = "Boo1"
print somedict['BLAH']
print somedict
print somedict.items()


This can be used for creating a case insensitive dictionary.
But there are tests failing in urllib2 if I use it directly. I think more methods than just items() need to be overridden for usage.
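
One way to avoid overriding the dict methods one by one is to build on the MutableMapping abstract base class, which derives get(), items(), update(), the `in` operator and the rest from five core methods. This is a swapped-in technique, not the class from the issue2275 patch; CaseInsensitiveMapping is a hypothetical name, and the import below is the Python 3 location (in 2.6 the class lives in collections).

```python
from collections.abc import MutableMapping


class CaseInsensitiveMapping(MutableMapping):
    """Mapping with case-insensitive string keys that remembers key casing.

    Hypothetical sketch; not the class from the issue2275 patch.
    """

    def __init__(self, *args, **kwargs):
        # lowered key -> (key as last stored, value)
        self._store = {}
        self.update(*args, **kwargs)  # inherited update() calls __setitem__

    @staticmethod
    def _lower(key):
        return key.lower() if isinstance(key, str) else key

    def __setitem__(self, key, value):
        self._store[self._lower(key)] = (key, value)

    def __getitem__(self, key):
        return self._store[self._lower(key)][1]

    def __delitem__(self, key):
        del self._store[self._lower(key)]

    def __iter__(self):
        return (original for original, _value in self._store.values())

    def __len__(self):
        return len(self._store)


somedict = CaseInsensitiveMapping({"Blah": "Boo"})
somedict["blah"] = "Boo1"  # same entry; the remembered casing is updated
print(somedict["BLAH"])
print("bLaH" in somedict, somedict.get("BLAH"))
print(list(somedict.items()))
```

The trade-off is that this is no longer a dict subclass, so any code that checks isinstance(x, dict) would need attention.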