GnuCash
Bug 797127 - Company name and address in reports do not display properly; all Traditional Chinese (zh_TW) characters are garbled
Status: RESOLVED FIXED
Alias: None
Product: GnuCash
Classification: Unclassified
Component: Reports
Version: 3.4
Hardware: PC Windows
Importance: Normal major
Target Milestone: ---
Assignee: reports
QA Contact: reports
 
Reported: 2019-03-06 03:00 EST by Dean
Modified: 2019-04-30 02:59 EDT
CC List: 5 users

Attachments
Statement of Comprehensive Income (253.22 KB, image/jpeg)
2019-03-06 03:00 EST, Dean
Book Options: File → Properties → Business (帳本選項:檔案(F)→內容(T)→商務) (238.62 KB, image/jpeg)
2019-03-06 03:08 EST, Dean
Save as XML (4.36 KB, application/x-gnucash)
2019-03-07 07:52 EST, Dean
Save as Sqlite3 (232.00 KB, application/x-gnucash)
2019-03-07 07:53 EST, Dean
Operating Procedure and Co. Info slots w/i XML (1.12 MB, application/pdf)
2019-03-07 22:40 EST, Dean

Description Dean 2019-03-06 03:00:15 EST
Created attachment 373195
Statement of Comprehensive Income

In all reports, such as the Statement of Comprehensive Income (Reports → Income & Expense → Income Statement and Vendor Report; "報表(R)→收入&支出(I)→損益表&廠商報表" in the zh_TW UI), a company name, address, or contact name written in Traditional Chinese is not displayed properly. When I try to change these values under File → Properties → Business ("檔案(F)→內容(T)→商務"), the characters become garbled after I press Apply.
Comment 1 Dean 2019-03-06 03:08:05 EST
Created attachment 373196
Book Options: File → Properties → Business (帳本選項:檔案(F)→內容(T)→商務)
Comment 2 Christopher Lam 2019-03-06 03:50:29 EST
Thank you for the bug report.

It would be very useful to have further information.

It looks like the Chinese characters generated by the report itself are generally OK e.g. "損益表" corresponding to "Income Statement" is fine; but the characters obtained from the (F)->(T)->Company Details are wrong? This would suggest that the problem lies within the book-properties dialog box trying to receive Chinese characters?

Also I guess you're using Windows, Chinese (Traditional) locale?
Comment 3 Dean 2019-03-06 04:43:29 EST
(In reply to Christopher Lam from comment #2)
> Thank you for bug report.
> 
> It would be very useful to have further information.
> 
> It looks like the Chinese characters generated by the report itself are
> generally OK e.g. "損益表" corresponding to "Income Statement" is fine; but the
> characters obtained from the (F)->(T)->Company Details are wrong? This would
> suggest that the problem lies within the book-properties dialog box trying
> to receive Chinese characters?
> 
> Also I guess you're using Windows, Chinese (Traditional) locale?

Yes, my OS is Windows 10 and the locale is Chinese (Taiwan). I think your guess is right. When I press Apply in the book-properties dialog box, the address disappears while the other fields still show Chinese. But those fields are garbled when I reopen the dialog box. Perhaps the text is garbled because it is not written to SQL (or SQLite) as UTF-8 (it might be some other Unicode encoding or Big5), so broken characters are fetched back from the database.
Comment 4 Christopher Lam 2019-03-06 06:36:50 EST
Can you experiment with XML or the various SQL backends and report back? And perhaps write a sample sqlite3 or XML data file and attach it here?
Comment 5 Dean 2019-03-07 07:51:35 EST
(In reply to Christopher Lam from comment #4)
> you can experiment with XML or various SQL types and report back? And
> perhaps write a sample sqlite3 or xml datafile and attach here?

I've tried writing XML and sqlite3 files. It does not seem to be an SQL error... The Chinese characters are still garbled.
Comment 6 Dean 2019-03-07 07:52:58 EST
Created attachment 373198
Save as XML
Comment 7 Dean 2019-03-07 07:53:42 EST
Created attachment 373199
Save as Sqlite3
Comment 8 Christopher Lam 2019-03-07 08:38:37 EST
Thank you. Here are some experiments. 

File excerpt (XML uncompressed)
          <slot>
            <slot:key>Company Name</slot:key>
            <slot:value type="string">佳匯旅行社有限公司</slot:value>
          </slot>

ubuntu = recent maint, both LANG and LANGUAGE amended to
ja_JP.utf8 - normal
en_AU.utf8 - normal
fr_FR.utf8 - normal
en_PH.utf8 - normal
zh_TW.utf8 - normal
zh_CN.utf8 - normal

Win10 = v3.4
 default - UI in English - company name garbled "佳匯旅行社有�公�"
 LANG=zh_TW LANGUAGE=zh_TW - still garbled as "佳匯旅行社有�公�"

I'm not sure I'm the right one to fix this. Something in Windows Gtk?
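The file excerpt above can be checked independently of GnuCash. The following is a minimal sketch, assuming the `slot` prefix is bound to the namespace URI `http://www.gnucash.org/XML/slot`; the `<slots>` wrapper element and the `read_slot` helper are made up for illustration so that the excerpt parses on its own. If the value read back matches what was typed, the saved file holds valid UTF-8 and the garbling would have to happen elsewhere.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: namespace URI that GnuCash XML files bind to the "slot" prefix.
SLOT_NS = "http://www.gnucash.org/XML/slot"

# The excerpt from the comment, wrapped in a hypothetical <slots> element
# so that the "slot:" prefix is declared and the snippet parses standalone.
EXCERPT = f"""<slots xmlns:slot="{SLOT_NS}">
  <slot>
    <slot:key>Company Name</slot:key>
    <slot:value type="string">佳匯旅行社有限公司</slot:value>
  </slot>
</slots>"""


def read_slot(xml_bytes: bytes, key: str) -> Optional[str]:
    """Return the value of the first slot whose key matches, else None."""
    root = ET.fromstring(xml_bytes)
    ns = {"slot": SLOT_NS}
    for slot in root.iter("slot"):
        if slot.findtext("slot:key", namespaces=ns) == key:
            return slot.findtext("slot:value", namespaces=ns)
    return None


# Parse the UTF-8 bytes, as they would be read from an uncompressed file.
company = read_slot(EXCERPT.encode("utf-8"), "Company Name")
print(company)
```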
Comment 9 John Ralls 2019-03-07 10:24:52 EST
Something in libgnucash/backends/xml is decoding that string in the system locale code page instead of in utf8. My first guess would be the slot string handler. Does it work correctly if you put the Chinese string in other slots?
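As a rough illustration of this kind of failure (not the GnuCash code itself), the sketch below decodes the UTF-8 bytes of the company name with two Windows code pages. The choice of cp950 (Traditional Chinese) and cp1252 (Western European) is an assumption made for the example, and the output is not claimed to match the exact garbling reported here.

```python
ORIGINAL = "佳匯旅行社有限公司"

# The bytes as they are stored in a UTF-8 encoded data file.
utf8_bytes = ORIGINAL.encode("utf-8")


def decode_as(data: bytes, codec: str) -> str:
    """Decode bytes with the given codec, marking undecodable bytes with U+FFFD."""
    return data.decode(codec, errors="replace")


correct = decode_as(utf8_bytes, "utf-8")     # round-trips cleanly
as_cp950 = decode_as(utf8_bytes, "cp950")    # assumed zh_TW system code page
as_cp1252 = decode_as(utf8_bytes, "cp1252")  # assumed Western system code page

for label, text in [("utf-8", correct), ("cp950", as_cp950), ("cp1252", as_cp1252)]:
    print(f"{label:>7}: {text}")
```

Only the UTF-8 decode returns the original string; either code page produces different, unreadable text from the same bytes.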
Comment 10 Christopher Lam 2019-03-07 10:50:58 EST
Well like OP reported, in Windows:

(1) Chinese characters pasted into Company Name/ID/URL etc all get garbled when OK is clicked

(2) Any Chinese characters from XML get garbled upon report printout, or when presented in the Book-Properties dialog.

(3) Any Chinese characters entered into Transaction/Account descriptions, notes, etc. are fine.
Comment 11 John Ralls 2019-03-07 11:28:11 EST
I don't see that the OP reported any of those things. The OP reported that entering the business name and address fields in Chinese produced garbled strings after closing the dialog.

Pasted from where? If it's from a native Windows program then the characters will be in UTF16; if it's from a CMD or powershell window then they'll be encoded in the code page set for that shell. The Gtk/Windows clipboard code should take care of transcoding that but may get fooled by locale settings.

What about Chinese characters typed into the company name/id/url? Are those garbled after clicking OK?

Do Chinese characters in Transactions, Splits, or Accounts (which you say are "OK") get garbled in reports? That would point to Guile's encoding support, which is pretty badly broken in 2.0. I haven't yet managed to get 2.2 to build on Windows.
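To make the clipboard encodings mentioned above concrete, here is a small sketch showing the same text as UTF-16 (native Windows programs), as a code page (assumed here to be cp950, as a zh_TW shell might use), and as UTF-8, plus the transcoding step that the clipboard code is expected to perform. The `transcode` helper is hypothetical and only illustrates the idea.

```python
TEXT = "佳匯旅行社有限公司"

# The same nine characters in three encodings.
as_utf8 = TEXT.encode("utf-8")
as_utf16 = TEXT.encode("utf-16-le")  # what native Windows programs use
as_cp950 = TEXT.encode("cp950")      # assumed code page of a zh_TW shell


def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode bytes from a known source encoding into the target encoding."""
    return data.decode(source).encode(target)


for name, data in [("utf-8", as_utf8), ("utf-16-le", as_utf16), ("cp950", as_cp950)]:
    print(f"{name:>9}: {len(data):2d} bytes  {data[:6].hex(' ')} ...")

# Transcoding only works when the source encoding is identified correctly.
from_utf16 = transcode(as_utf16, "utf-16-le")
from_cp950 = transcode(as_cp950, "cp950")
```

If the source encoding is guessed wrong, for example because of the locale settings, the transcoded bytes no longer represent the original text.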
Comment 12 Dean 2019-03-07 12:28:10 EST
(In reply to Christopher Lam from comment #8)
> Thank you. Here are some experiments. 
> 
> File excerpt (XML uncompressed)
>           <slot>
>             <slot:key>Company Name</slot:key>
>             <slot:value type="string">佳匯旅行社有限公司</slot:value>
>           </slot>
> 
> I'm not sure I'm the right one to fix this. Something in Windows Gtk?

Those Chinese characters are correct. I tried to use a source code editor such as Brackets to find the problem, but the file can't be read like a normal XML file at all. Simply changing the filename extension doesn't make it readable as XML. I'm at the end of my tether. lol

Co : 佳匯旅行社有限公司 → malfunction
Add: 10430 臺北市中山區松江路207號11樓 → malfunction
Contact: 狄恩 → malfunction
Tel/Fax/Email/Website/ID → functional
Comment 13 John Ralls 2019-03-07 14:37:15 EST
Dean, is the file compressed? Preferences>General, first tick-box under "Files". If it is you can use 7zip to uncompress it or you can un-tick the box and save after which you should be able to open it in your editor.
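As an alternative to 7zip, a short script can read the file either way. This is a sketch under the assumption that a compressed GnuCash XML file is an ordinary gzip stream; the function names are made up for the example.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def decode_gnucash_bytes(raw: bytes) -> str:
    """Return the XML text, decompressing first if the data is gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # GnuCash XML is expected to be UTF-8; a decode error here means it is not.
    return raw.decode("utf-8")


def read_gnucash_xml(path: str) -> str:
    """Read a GnuCash XML data file, compressed or not, as text."""
    return decode_gnucash_bytes(Path(path).read_bytes())


# Self-check with in-memory data instead of a real book file.
sample = '<slot:value type="string">佳匯旅行社有限公司</slot:value>'
plain = decode_gnucash_bytes(sample.encode("utf-8"))
unpacked = decode_gnucash_bytes(gzip.compress(sample.encode("utf-8")))
print(plain == unpacked == sample)
```

Calling `read_gnucash_xml` on the saved book (the path is whatever the file was saved as) should then give text that any editor or XML tool can inspect.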
Comment 14 Christopher Lam 2019-03-07 17:55:17 EST
I've pasted from both web browsers and Notepad++

https://imgur.com/a/3ugdtpN should show sequence - something really wrong with slot output
Comment 15 Christopher Lam 2019-03-07 18:03:11 EST
*something wrong reading from slot
Comment 16 John Ralls 2019-03-07 18:25:01 EST
Are you pasting from a native Windows app directly into the GnuCash XML file in an edit window? That's guaranteed to not work as it will put UTF16 into what GnuCash expects to be encoded in UTF8.

To test this the safest way is to type the Chinese into the GnuCash UI after setting up one of the Windows input methods. I generally use Pinyin. That makes simplified Chinese characters but it shouldn't matter. Click "Apply". If the values in the boxes change to accented Latin characters then the problem is in the file properties dialog box code.

What Dean's screenshots show, though, isn't bad encoding, it's missing glyphs. That could be because the encoding is getting trashed but it's more likely to be because the font doesn't have glyphs for those codepoints. The solution to that is to find a different font that has the necessary glyphs--probably the one that Dean is using for everything else--and tell GnuCash to use it by setting it in %LOCALAPPDATA%\gtk-3.0\settings.ini, see https://wiki.gnucash.org/wiki/GTK3#Font_Size_in_Documents
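For the font suggestion above, the sketch below generates the text of a `settings.ini` with a `gtk-font-name` entry in the `[Settings]` group. The font name "Microsoft JhengHei 10" is only an assumed example of a font with Traditional Chinese glyphs, and `build_gtk_settings` is a hypothetical helper; the output would still need to be saved by hand to `%LOCALAPPDATA%\gtk-3.0\settings.ini`.

```python
import configparser
import io


def build_gtk_settings(font: str) -> str:
    """Return settings.ini text that sets the GTK 3 default font."""
    parser = configparser.ConfigParser()
    parser["Settings"] = {"gtk-font-name": font}
    buffer = io.StringIO()
    # Write "key=value" without spaces around the delimiter.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Assumed example font; any installed font with the needed glyphs would do.
ini_text = build_gtk_settings("Microsoft JhengHei 10")
print(ini_text)
```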
Comment 17 Christopher Lam 2019-03-07 18:36:35 EST
I didn't paste into XML at all. All XML characters are generated by pasting into dialog-box and saving.

I'm struggling to use an IME for now; it gives gibberish on my machine.

But pasting any extended characters from the clipboard into the dialog correctly modifies the slots, according to the saved XML.

It's the slot string reading that seems broken.
Comment 18 Dean 2019-03-07 22:40:46 EST
Created attachment 373201
Operating Procedure and Co. Info slots w/i XML
Comment 19 John Ralls 2019-03-07 22:54:15 EST
Comment on attachment 373201
Operating Procedure and Co. Info slots w/i XML

I suppose that there's a save step between "correct co. name..." and the second "read xml". Is there also a close/load between the first "read xml" and "correct co. name..."?
Comment 20 Christopher Lam 2019-03-07 23:09:04 EST
P.S. (to my own comment above) guile-2 does unicode natively, and I have not found any issues with extended unicode.

guile-1.8 was not using unicode. lilypond is stuck on guile-1.8 because they were having problems with guile-2.0 unicode and have refused to upgrade to 2.0 ("too hard" IIRC)

having said that I am aware the business options in book-properties do traverse some guile code, and I could add a tracer... but I don't have a windows build generator.

(re: last comment) I have repeated without save step on a new book and still gibberish. Now I think windows-dialog-box won't accept extended-chars.
Comment 22 John Ralls 2019-03-08 09:50:49 EST
Maybe the second one, see bug 796728.
Comment 23 Christopher Lam 2019-03-08 21:53:09 EST
I'd love to help but don't know how to generate an .exe
Comment 24 Christopher Lam 2019-03-10 04:07:11 EDT
P.S. The 佳匯旅行社有�公� that I was receiving is ANSI.
Comment 25 Geert Janssens 2019-03-19 07:13:50 EDT
Bug 796728 is my first suspect as well.
Comment 26 Geert Janssens 2019-03-19 07:15:25 EDT
(In reply to Christopher Lam from comment #14)
> I've pasted from both web browsers and Notepad++
> 
> https://imgur.com/a/3ugdtpN should show sequence - something really wrong
> with slot output

In the future please add the images directly in bugzilla. I don't know how long imgur keeps your screenshots so they may not be available in the future and hence make it harder to understand this bug.
Comment 27 Geert Janssens 2019-03-19 07:22:47 EDT
Comment on attachment 373201
Operating Procedure and Co. Info slots w/i XML

If I understand these steps correctly there may be two points of failure here:

1. When filling in company details in the new book assistant, this data is saved incorrectly into the book's slots as your brackets screenshot on page 10 suggests.

2. Re-entering this data after the new book wizard has run fixes the stored data (according to your bracket screenshot on page 12), but reopening the properties dialog shows garbled data. So here re-reading this data seems to fail.
Comment 28 Geert Janssens 2019-03-19 07:27:01 EDT
(In reply to Christopher Lam from comment #20)
> P.S. (to my own comment above) guile-2 does unicode natively, and I have not
> found any issues with extended unicode.
> 
It does so indeed on linux. The implementation likely has issues on Windows.

> guile-1.8 was not using unicode. lilypond is stuck on guile-1.8 because they
> were having problems with guile-2.0 unicode and have refused to upgrade to
> 2.0 ("too hard" IIRC)
> 
I can attest that migrating to guile-2.0 was indeed a challenge on gnucash as well. I wasn't aware though lilypond is still guile 1.8. That's not a good place to be in today.

> having said that I am aware the business options in book-properties do
> traverse some guile code, and I could add a tracer... but I don't have a
> windows build generator.
> 
Yes, I wanted to point that out as well. I have worked on these bits in the past when we made kvp access an engine-private thing.
Comment 29 John Ralls 2019-04-28 20:02:50 EDT
This turned out to be a problem with our workaround to a swig bug. It's fixed in git, please try the latest build at https://code.gnucash.org/builds/win32/maint.
Comment 30 Dean 2019-04-30 02:14:38 EDT
(In reply to John Ralls from comment #29)
> This turned out to be a problem with our workaround to a swig bug. It's
> fixed in git, please try the latest build at
> https://code.gnucash.org/builds/win32/maint.

I've installed "gnucash-3.5-2019-04-29-git-3.5-130-ga71149713+.setup.exe" on my 64-bit Win10. All Traditional Chinese characters work fine.
Comment 31 Geert Janssens 2019-04-30 02:59:47 EDT
Great!

Note that John made a few additional corrections that will appear in today's nightly build (2019-04-30). That should be available in an hour or so.
