[ cegcc-Bugs-2912803 ] fopen() fails with Japanese filenames - encoding mismatch

SourceForge.net
Bugs item #2912803, was opened at 2009-12-11 18:35
Message generated for change (Settings changed) made by dannybackx
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=865514&aid=2912803&group_id=173455

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: CeGCC (arm-wince-cegcc)
Group: None
Status: Open
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Danny Backx (dannybackx)
Summary: fopen() fails with Japanese filenames - encoding mismatch

Initial Comment:
I've been trying to open files with Japanese characters in the filename using arm-wince-cegcc v0.55.
I've recompiled with --enable-newlib-mb to enable multi-byte support. I eventually succeeded, but I had to fix a 'bug' in the newlib library.
I can make a simplistic patch, but I need help with a proper fix.

I'm using UTF-8 filenames, and I've called setlocale(LC_CTYPE, "C-UTF-8"), which succeeds.

The problem seemed to occur in libc/sys/wince/cefixpath.c in the function XCEFixPathA(), which is called by fixpath().
Here's an extract from XCEFixPathA():

  MultiByteToWideChar(CP_ACP, 0, pathin, -1, wpathin, MAX_PATH);

  XCEFixPathW(wpathin, wpathout);

  WideCharToMultiByte(CP_ACP, 0,
              wpathout, -1,
              pathout, MAX_PATH,
              NULL, NULL);

It seems that the code page CP_ACP (the Windows ANSI default) can conflict with the encoding selected by setlocale(), because cefixpath.c and io.c use different multi-byte to wide-char conversion functions: cefixpath.c calls MultiByteToWideChar() with CP_ACP, while io.c calls mbstowcs() in _open_r(), which is called by fopen(). This mismatch causes my UTF-8 string to get mangled by the conversion to and back from multi-byte characters in XCEFixPathA().

My temporary fix has been to replace the code in XCEFixPathA() with a simple '/' to '\' replacement on the 8-bit string. Obviously this only works on ASCII or UTF-8 strings.

I include my sample source code, along with trace and log output from this program compiled against both a patched and an unpatched version of newlib.
Could somebody please take a look and suggest a better fix for this problem?

----------------------------------------------------------------------

>Comment By: Danny Backx (dannybackx)
Date: 2010-01-02 07:05

Message:
Please check out the fix I just checked in; it worked for me.

----------------------------------------------------------------------

Comment By: Adrian Skilling (adrianskilling)
Date: 2009-12-15 15:33

Message:
Sorry, this can't work: MultiByteToWideChar cannot accept a locale string;
it only accepts a numeric code page identifier, such as CP_ACP, CP_UTF7 or
CP_UTF8. MultiByteToWideChar has the advantage that it can be given a code
page, but that advantage is not used here because the code page is fixed
to CP_ACP.

I suggest replacing MultiByteToWideChar() with mbstowcs(), which would make
it consistent with the conversion used by fopen() (in _open_r(),
specifically). I'll try this on my version, but I can't be sure it will
work well for all languages. I'll get back to you.

----------------------------------------------------------------------

Comment By: Danny Backx (dannybackx)
Date: 2009-12-12 07:35

Message:
A trick I've seen used to figure out the locale is
 const char *xx = setlocale(LC_ALL, NULL);

Passing NULL as the second argument only queries the current locale; it
does not change it.
You can do this to figure out the locale in XCEFixPathA, and use xx
instead of CP_ACP.
Would that fix your problem?

----------------------------------------------------------------------

_______________________________________________
Cegcc-devel mailing list
https://lists.sourceforge.net/lists/listinfo/cegcc-devel