Skip to content
This repository has been archived by the owner on Dec 24, 2019. It is now read-only.

Bugfix/issue 37 fix with non ascii #39

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ePaul
Copy link

@ePaul ePaul commented Aug 2, 2015

This is a try to fix #37.

I guess it needs some more documentation and some tests, but before I do that, I wanted to show what way I did go. (So please don't merge this yet. I also need to add a better commit comment.)

The problem was that some fix functions had dst.write(fixed.decode()) and later fix_file had fd.write(fixed.encode()), which use the system's default encoding, which in my system seems to be ASCII. Also, this way we got possibly multiple decode steps (one for each problem), and if we have after one of the decode steps a Jalopy run or similar, it gets even more unclear what's happening.

Basically the idea of my fix is that for each file-type, there is one rule which indicates the encoding. It is named after one of the encodings supported by the codecs package. (In the default configuration, this is just utf-8 (for most file types) and ascii (for .properties), there should be some test-cases for latin-1/iso-8859-1, at least, too.) (The function get_encoding_rule(rules) simply iterates over the rule names and passes each one to codecs.lookup, until one of them is successful. For this reason I needed to reorder the rules for the *.properties entry in the default config.)

That encoding is then (other than for the check that one actually can decode the file using this encoding) used for those fix functions which actually work on the content of the file by themselves and don't just call external programs (like Jalopy, pythontidy, etc.). For those methods (for now just _fix_notabs, _fix_nocr and _fix_notrailingws) I invented a decorator @needs_unicode, which wraps the function call into codecs.EncodedFile.

I needed to store the used encoding in a global dict ENCODING_BY_FILE, analogously to the VALIDATION_ERRORS list – please propose a better way to do this, if this is not the way to go.

I noted that for the added functionality it would be useful to have some automated tests (i.e. a bundle of config file, command line arguments, input file and expected output/expected fixed file). Is there some standard python way of doing this?

@ePaul ePaul force-pushed the bugfix/issue-37-fix-with-non-ascii branch from 3f0a770 to 6225e63 Compare August 2, 2015 20:51
This file can be checked with codevalidator (complains about
trailing white space, use of tabs, use of carriage return),
but with the current code it can't be fixed (will be overwritten
with an empty file).

hjacobs#37
@ePaul ePaul force-pushed the bugfix/issue-37-fix-with-non-ascii branch from 6225e63 to 31a2fcc Compare August 5, 2015 20:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: codevalidator -f crashes and leaves my file empty when using non-ASCII characters
1 participant