[ cegcc-Bugs (use Trac instead)-2912803 ] fopen() fails with Japanese filenames - encoding mismatch


SourceForge.net
Bugs (use Trac instead) item #2912803, was opened at 2009-12-11 09:35
Message generated for change (Comment added) made by pfalcon
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=865514&aid=2912803&group_id=173455

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: CeGCC (arm-wince-cegcc)
Group: None
>Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Danny Backx (dannybackx)
Summary: fopen() fails with Japanese filenames - encoding mismatch

Initial Comment:
I've been trying to open files with Japanese characters in the filename using arm-wince-cegcc v0.55.
I recompiled with --enable-newlib-mb to enable multi-byte support. I eventually succeeded, but only by working around a 'bug' in the newlib library.
I can put together a simplistic patch, but I need help with a proper fix.

My filenames are in UTF-8, and I call setlocale(LC_CTYPE, "C-UTF-8"), which succeeds.

The problem seemed to occur in libc/sys/wince/cefixpath.c in the function XCEFixPathA(), which is called by fixpath().
Here's an extract for XCEFixPathA().

  MultiByteToWideChar(CP_ACP, 0, pathin, -1, wpathin, MAX_PATH);

  XCEFixPathW(wpathin, wpathout);

  WideCharToMultiByte(CP_ACP, 0,
              wpathout, -1,
              pathout, MAX_PATH,
              NULL, NULL);

It seems that the code page CP_ACP (the Windows ANSI default) can conflict with the code page I set via setlocale(), because cefixpath.c and io.c use different multi-byte to wide-char conversions: MultiByteToWideChar() with CP_ACP in cefixpath.c, and mbstowcs() in _open_r(), which is called by fopen(). As a result, my UTF-8 string gets mangled by the round trip to wide chars and back in XCEFixPathA().
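To illustrate the kind of mismatch described above, here is a minimal, portable sketch. It does not call MultiByteToWideChar() or mbstowcs(); decode_single_byte() and decode_utf8() are hypothetical stand-ins that only show how the same bytes produce different wide strings under two different encoding assumptions. The real CP_ACP on a Japanese device would map bytes differently from this Latin-1 style stand-in, but the effect (a wide string that no longer matches the intended filename) is of the same nature.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for a single-byte "ANSI" decoder:
 * every input byte becomes exactly one wide character. */
size_t decode_single_byte(const char *in, wchar_t *out, size_t max)
{
    size_t n = 0;

    while (*in && n + 1 < max)
        out[n++] = (wchar_t)(unsigned char)*in++;
    out[n] = L'\0';
    return n;
}

/* Minimal UTF-8 decoder for 1- to 3-byte sequences (enough for
 * Japanese text in the Basic Multilingual Plane). It stops at the
 * first 4-byte or truncated sequence and does no further validation. */
size_t decode_utf8(const char *in, wchar_t *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p && n + 1 < max) {
        if (p[0] < 0x80) {
            out[n++] = (wchar_t)p[0];
            p += 1;
        } else if ((p[0] & 0xE0) == 0xC0 && p[1]) {
            out[n++] = (wchar_t)(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
            p += 2;
        } else if ((p[0] & 0xF0) == 0xE0 && p[1] && p[2]) {
            out[n++] = (wchar_t)(((p[0] & 0x0F) << 12) |
                                 ((p[1] & 0x3F) << 6) |
                                 (p[2] & 0x3F));
            p += 3;
        } else {
            break;
        }
    }
    out[n] = L'\0';
    return n;
}
```

For the UTF-8 bytes E6 97 A5 (the character U+65E5), decode_utf8() yields one wide character, while decode_single_byte() yields three unrelated ones, so a path converted one way and looked up the other way will not match.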

My temporary fix has been to replace the code in XCEFixPathA() with a simple / to \ replacement on the 8-bit string. Obviously this only works for ASCII or UTF-8 strings.
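The attached patch is not reproduced in this message, so the following is only a sketch of what such a byte-wise replacement might look like; the function name fix_separators_bytes() is made up for illustration. It is safe for UTF-8 because every byte of a UTF-8 multi-byte sequence has its high bit set, so the byte 0x2F ('/') can only ever be a real slash.

```c
#include <stddef.h>

/* Replace every '/' with '\\' in place, byte by byte.
 * Assumes the path is ASCII or UTF-8; no code page conversion is done. */
void fix_separators_bytes(char *path)
{
    for (; *path; ++path) {
        if (*path == '/')
            *path = '\\';
    }
}
```

Unlike the original routine, this sketch does nothing beyond swapping separators, so any other path fixing that XCEFixPathW() performs would be lost.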

I'm attaching my sample source code, along with trace and log output from the program built against both the patched and the unpatched newlib.
Could somebody take a look and advise me on a better fix, please?

----------------------------------------------------------------------

>Comment By: Paul Sokolovsky (pfalcon)
Date: 2011-02-05 07:04

Message:
Assumed fixed.


----------------------------------------------------------------------

Comment By: Danny Backx (dannybackx)
Date: 2010-01-01 22:05

Message:
Please check out the fix I just checked in; it worked for me.

----------------------------------------------------------------------

Comment By: Adrian Skilling (adrianskilling)
Date: 2009-12-15 06:33

Message:
Sorry, this can't work: MultiByteToWideChar cannot accept a locale string,
it only accepts a limited set of code pages such as CP_ACP, CP_UTF7 and
CP_UTF8. MultiByteToWideChar has the advantage that it can be given a code
page, but that advantage goes unused because the code page is fixed to
CP_ACP.

I suggest replacing MultiByteToWideChar() with mbstowcs(), which would make
it consistent with the conversion used in fopen() (in _open_r()
specifically). I shall try this on my version, but I can't be sure it would
work well for all languages. I'll get back.
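A rough sketch of that suggestion, assuming the conversions follow the current LC_CTYPE locale in both directions. The names fix_path_locale(), fix_path_wide() and PATH_MAX_CHARS are hypothetical; fix_path_wide() is a stand-in that only swaps separators and is not the real XCEFixPathW().

```c
#include <stdlib.h>
#include <wchar.h>

#define PATH_MAX_CHARS 260 /* assumption: mirrors MAX_PATH on Windows */

/* Hypothetical stand-in for XCEFixPathW(): only swaps separators. */
static void fix_path_wide(wchar_t *path)
{
    for (; *path; ++path) {
        if (*path == L'/')
            *path = L'\\';
    }
}

/* Convert with the locale-aware C library routines instead of a fixed
 * code page. Returns 0 on success, -1 if a conversion fails or the
 * result does not fit (including its terminator). */
int fix_path_locale(const char *in, char *out, size_t outsize)
{
    wchar_t wide[PATH_MAX_CHARS];
    size_t n;

    n = mbstowcs(wide, in, PATH_MAX_CHARS);
    if (n == (size_t)-1 || n >= PATH_MAX_CHARS)
        return -1; /* invalid sequence, or no room for the terminator */

    fix_path_wide(wide);

    n = wcstombs(out, wide, outsize);
    if (n == (size_t)-1 || n >= outsize)
        return -1; /* unrepresentable character, or output too small */

    return 0;
}
```

Because both directions use the same locale, whatever setlocale() selected applies consistently; whether that holds for every locale newlib supports is exactly the open question raised in the comment.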

----------------------------------------------------------------------

Comment By: Adrian Skilling (adrianskilling)
Date: 2009-12-15 03:10

Message:
Sorry, this can't work: MultiByteToWideChar cannot accept a locale string,
it only accepts a limited set of code pages such as CP_ACP, CP_UTF7 and
CP_UTF8. MultiByteToWideChar has the advantage that it can be given a code
page, but that advantage goes unused because the code page is fixed to
CP_ACP.

I suggest replacing MultiByteToWideChar() with mbstowcs(), which would make
it consistent with the conversion used in fopen() (in _open_r()
specifically). I shall try this on my version, but I can't be sure it would
work well for all languages. I'll get back.

----------------------------------------------------------------------

Comment By: Danny Backx (dannybackx)
Date: 2009-12-11 22:35

Message:
A trick I've seen used to figure out the locale is
 char *xx = setlocale(LC_ALL, NULL);

Passing NULL as the locale only queries the current setting and changes
nothing, so no second call is needed to restore it.
You can do this to figure out the locale in XCEFixPathA, and use xx
instead of CP_ACP.
Would that fix your problem?
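For reference, the standard C prototype is char *setlocale(int category, const char *locale), and a NULL locale argument queries the current setting without changing it. The sketch below uses a made-up helper name, current_ctype_locale(), and copies the result because a later setlocale() call may overwrite the returned string. Note that the value is a locale name (a string), whereas the first parameter of MultiByteToWideChar() is a numeric code page, so the name would still need to be mapped to a code page by some means not shown here.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of the current LC_CTYPE locale name,
 * or NULL on failure. The caller must free() the result. */
char *current_ctype_locale(void)
{
    const char *name = setlocale(LC_CTYPE, NULL); /* query only */
    char *copy;

    if (name == NULL)
        return NULL;

    copy = malloc(strlen(name) + 1);
    if (copy != NULL)
        strcpy(copy, name);
    return copy;
}
```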

----------------------------------------------------------------------

_______________________________________________
Cegcc-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/cegcc-devel