Showing posts with label Task. Show all posts

urllib issues Roundup

I had a spreadsheet listing the bugs in urllib, but I could not use it as effectively as I wished. I decided to list the bugs on the blog itself so that I stay on top of my TODO items.


Feature Requests:


Bugs:


The following are quick fixes, as per my analysis.



The following will take days.



Low priority.

  • http://bugs.python.org/issue1285086
    urllib.quote is too slow

Fixed Issues: Yet to be closed

urllib and NTLM Authentication?

I don't think this is on my list of bug fixes, but I have to look into the
topic, as it was a requirement when developing certain apps at the office.
Yesterday, one of my friends brought it up as well.

urllib package

The first betas of Python 3.0 and Python 2.6 were scheduled for release on
June 11, but they have now been postponed to June 18.

There is a TODO task for packaging urllib, and it falls under my GSoC work as
well. The bug report had another developer assigned to it, and I have let them
know that I will give it a try.

The Standard Library reorganization follows PEP 3108. Most of the other
modules are already done, so the pattern to follow is set.

If I follow the example of httplib Reorganization, the following has already
taken effect.

Python 2.5         ||      Python 3.0/Python 2.6

                           http (package)
httplib            ------- http.client (client.py)
BaseHTTPServer     ------- http.server (server.py)
CGIHTTPServer      ------- http.server (server.py)
SimpleHTTPServer   ------- http.server (server.py)
                           (No naming conflicts should occur)
Cookie             ------- http.cookies (cookies.py)
cookielib          ------- http.cookiejar (cookiejar.py)
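
As a quick way to check the http renames on a Python 3 interpreter, here is a minimal sketch. The `HTTP_RENAMES` dict and the `resolve` helper are names made up for this illustration; they are not part of the standard library.

```python
import importlib

# Python 2 module name -> Python 3 module name, following the mapping above.
# (The Python 2 cookie module is spelled "Cookie".)
HTTP_RENAMES = {
    "httplib": "http.client",
    "BaseHTTPServer": "http.server",
    "CGIHTTPServer": "http.server",
    "SimpleHTTPServer": "http.server",
    "Cookie": "http.cookies",
    "cookielib": "http.cookiejar",
}


def resolve(old_name):
    """Import and return the Python 3 module that replaced old_name."""
    return importlib.import_module(HTTP_RENAMES[old_name])


if __name__ == "__main__":
    for old, new in HTTP_RENAMES.items():
        # Importing proves the new name exists; print the old/new pair.
        print(f"{old:18} -> {resolve(old).__name__}")
```

Three old modules collapse into http.server, which is why the naming-conflict check matters there.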

A similar reorganization is planned for urllib, and this will be my TODO
task.
From PEP 3108:

urllib2 -------- urllib.request ( request.py)
urlparse -------- urllib.parse ( parse.py)
urllib -------- urllib.parse, urllib.request

The current urllib module will be split into parse.py and request.py:
- quoting-related functionality will be added to parse.py
- URLopener and FancyURLopener will be added to request.py
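
To make the split concrete, here is a small sketch of how imports look once the reorganization is in place, assuming a Python 3 interpreter. The URL and the sample strings are placeholders chosen for the example, and nothing here touches the network.

```python
# Quoting and URL splitting live in urllib.parse
# (previously urllib.quote / urllib.unquote and urlparse.urlsplit).
from urllib.parse import quote, unquote, urlsplit

# Request building and opening live in urllib.request
# (previously urllib2.Request and urllib.urlopen / urllib2.urlopen).
from urllib.request import Request

encoded = quote("a b/c")      # space is percent-encoded, "/" is left alone
decoded = unquote(encoded)    # round-trips back to the original string

parts = urlsplit("http://www.python.org/doc/?q=1")

# Constructing a Request does not open a connection.
req = Request("http://www.python.org/")

print(encoded, decoded, parts.netloc, req.full_url)
```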

Other activities should include:

- Docs need to be updated.
- Tests need to be verified to run properly.
- No conflicts should occur.
- Python 3.0 - Testing needs to be done.
- Changes to other modules.

I shall set an internal target of June 16, with 4 hours per day devoted
exclusively to this task.

urlparse and port number

Bugs #2195 and #754016 both complain about urlparse not handling port numbers properly and often giving erroneous results for scheme, netloc and path.

Yes, it misbehaves when the netloc does not start with //. But for all practical purposes, when we use a URL without a scheme, we simply write the netloc part, like www.python.org.
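
The difference can be seen with a short sketch, written here against Python 3's `urllib.parse.urlsplit` (the successor of `urlparse.urlsplit`). The exact scheme and path reported for the bare form have varied between Python versions, so that result is printed rather than asserted; only the netloc behaviour is relied on.

```python
from urllib.parse import urlsplit

# With a leading "//", the host and port are recognized as the netloc.
with_slashes = urlsplit("//www.python.org:80/doc")
print(with_slashes.netloc, with_slashes.port, with_slashes.path)

# Without "//", nothing is treated as netloc, so host and port end up
# in other fields (which ones depends on the Python version).
bare = urlsplit("www.python.org:80/doc")
print(repr(bare.netloc))
print(bare)
```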

This requires a fix, and the following patch provides it.


@@ -143,7 +143,7 @@ def urlsplit(url, scheme='', allow_fragm
     if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
         clear_cache()
     netloc = query = fragment = ''
-    i = url.find(':')
+    i = url.find('://')
     if i > 0:
         if url[:i] == 'http': # optimize the common case
             scheme = url[:i].lower()
@@ -164,6 +164,9 @@ def urlsplit(url, scheme='', allow_fragm
             scheme, url = url[:i].lower(), url[i+1:]
     if scheme in uses_netloc and url[:2] == '//':
         netloc, url = _splitnetloc(url, 2)
+    else:
+        netloc, url = _splitnetloc(url)
+
     if allow_fragments and scheme in uses_fragment and '#' in url:
         url, fragment = url.split('#', 1)
     if scheme in uses_query and '?' in url:

1) The first change differentiates between the colon that precedes a port (:) and the one that ends a scheme (://).
2) The second change applies when no scheme is given: the URL is simply split into the netloc and the rest of the URL.
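
The intent of these two changes can be approximated without patching the library. The sketch below is not the patch itself: `urlsplit_assume_netloc` is a hypothetical wrapper written for illustration on top of Python 3's `urllib.parse.urlsplit`, and it assumes that any URL lacking "://" begins with its netloc. Under that assumption a URL such as mailto:someone@example.com would be misread, so it only suits inputs known to be host-based.

```python
from urllib.parse import urlsplit


def urlsplit_assume_netloc(url, default_scheme=""):
    """Hypothetical helper: split url, treating a scheme-less URL as
    starting with its netloc (host[:port]) rather than with a path."""
    # Only "://" marks a scheme here, so a bare ":" is left for the port.
    if "://" not in url and not url.startswith("//"):
        # Prefixing "//" makes urlsplit read the leading part as netloc.
        url = "//" + url
    return urlsplit(url, scheme=default_scheme)


if __name__ == "__main__":
    for sample in ("www.python.org:80/doc", "http://www.python.org:80/doc"):
        parts = urlsplit_assume_netloc(sample)
        print(sample, "->", parts.scheme, parts.netloc, parts.path)
```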

I still need to write the tests for it and submit it.

One general review comment is that urlparse.urlsplit is not written in a very composed way. There have been a lot of realizations (just like the one above), followed by patches and additions to fix them.
So we see, for example, a special condition for http being handled in its own block of code.
Those can be cleaned up.