Bug 796754 - Guile encoding certain strings not as UTF-8
Summary: Guile encoding certain strings not as UTF-8
Status: RESOLVED FIXED
Alias: None
Product: GnuCash
Classification: Unclassified
Component: User Interface General
Version: 3.2
Hardware: PC Windows
Importance: Normal normal
Target Milestone: ---
Assignee: ui
QA Contact: ui
URL:
Whiteboard:
Keywords:
Duplicates: 796804 797069
Depends on:
Blocks:
 
Reported: 2018-07-12 16:15 EDT by Aaron
Modified: 2019-05-22 16:39 EDT

Description Aaron 2018-07-12 16:15:14 EDT
Having an issue with character encoding in some places. 

To reproduce: 
File -> Properties -> Business -> Company address 
Enter text with accents, e.g. "á"
Click OK.
Open again and it becomes "Ã¡"
Also shows up elsewhere, such as custom reports.

In Windows 10 with en_US as locale.
Comment 1 Geert Janssens 2018-10-01 07:54:54 EDT
*** Bug 796804 has been marked as a duplicate of this bug. ***
Comment 2 Geert Janssens 2018-10-01 07:55:26 EDT
The duplicate bug has a screenshot visually illustrating the problem.
Comment 3 Geert Janssens 2019-01-10 14:49:39 EST
The root cause of this bug may be the same as the one eventually found in bug 796728:
Guile seems to want strings encoded in the system's locale, whereas GTK uses UTF-8 by default.
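
For illustration only (not part of the original comment): a minimal Python sketch of the mismatch described above, where UTF-8 bytes are interpreted in the locale's code page. It assumes the en_US Windows code page is cp1252; the helper name misread_utf8_as_locale is hypothetical and is not GnuCash or Guile code.

```python
# Sketch of the suspected root cause: text leaves GTK as UTF-8 bytes,
# but the reader interprets those bytes in the system locale's code page.
# Assumption: the Windows en_US locale code page is cp1252.

def misread_utf8_as_locale(text: str, locale_encoding: str = "cp1252") -> str:
    """Encode as UTF-8, then (wrongly) decode using the locale encoding."""
    return text.encode("utf-8").decode(locale_encoding)

original = "\u00e1"  # a-acute, as typed in the Company address field
garbled = misread_utf8_as_locale(original)

# One character turns into two: the UTF-8 bytes C3 A1 are shown as
# two separate cp1252 characters.
print(ascii(original), "->", ascii(garbled))
```

The same two-step misreading would explain why the corruption only affects non-ASCII characters: ASCII bytes mean the same thing in both encodings.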
Comment 4 John Ralls 2019-04-01 10:34:12 EDT
*** Bug 797069 has been marked as a duplicate of this bug. ***
Comment 5 Christopher Lam 2019-04-22 01:38:23 EDT
Experiments on Guile for Windows
--------------------------------
Gnucash v3.5
Uses Guile-2.0.14
Windows set to English (Australian)
environment - no LANG= setting

(In case bugzilla munges unicode I'll repost with unicode-char rewritten "#")

I run a report eg account-piecharts.scm, set Report Title to "Turkish Lira - ₺ Lira" and report-currency = TRY (symbol = ₺)

As we know, guile-2.0 munges Unicode in string-port functions; therefore (format #f ": ~a" str) will munge Unicode, as will (with-output-to-string ...) in html-string-sanitize.
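
As a rough illustration of how a "?" can end up in the output: if a string is pushed through an encoding that has no slot for a character, and the conversion substitutes rather than fails, the character is replaced. The Python sketch below assumes the locale encoding is cp1252 and that the conversion substitutes "?"; it models the symptom only and is not Guile's actual string-port code. The function name is hypothetical.

```python
# Sketch: lossy conversion through a locale encoding that lacks the character.
# Assumptions: locale encoding is cp1252; unencodable characters become "?".

def to_locale_with_substitution(text: str, locale_encoding: str = "cp1252") -> str:
    """Round-trip text through the locale encoding, substituting '?' on failure."""
    return text.encode(locale_encoding, errors="replace").decode(locale_encoding)

title = "Turkish Lira - \u20ba Lira"  # U+20BA is the Turkish lira sign
print(to_locale_with_substitution(title))
```

Characters that do exist in the locale encoding survive this round trip, which would be consistent with only some strings being affected.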

Consequences (current state of maint):
--------------------------------------
Report shows title as "Turkish Lira - ? Lira Assets: ?1,000"
The Tabbed window title shows "Turkish Lira - ₺ Lira"
saved-reports-2.8 (written by guile) shows "Turkish Lira \u20ba Lira"
book.gcm (written by C) shows "PageName=Turkish Lira ₺ Lira"
book.gcm (SchemeOptions) encodes title as "Turkish Lira \u20ba Lira"
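
The \u20ba form seen in the two guile-written files is an ASCII-safe escape of the code point, so no information is lost there. A small Python sketch of that kind of escaping and its reversal (helper names are hypothetical; this imitates the effect and is not the Guile writer itself):

```python
# Sketch: escape non-ASCII characters as \uNNNN and read them back.
# This mirrors the *effect* seen in saved-reports-2.8, not Guile's implementation.

def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a backslash escape."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")

def unescape(text: str) -> str:
    """Turn backslash escapes in an ASCII-only string back into characters."""
    return text.encode("ascii").decode("unicode_escape")

title = "Turkish Lira \u20ba Lira"
escaped = escape_non_ascii(title)
print(escaped)
print(unescape(escaped) == title)
```

Because the escaped text is pure ASCII, it reads back the same way regardless of which encoding the reader assumes.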

Bugfix Attempt 1
----------------
We could fix this report via (string-append ": " str) and (open-output-string) but I think this is the wrong approach because we'll need to hunt *every* string-port function to modify it. It side-steps the issue.

Otherwise this approach is harmless.

Mark Weaver Monkey-patch Bugfix attempt 2
-----------------------------------------
See http://lists.gnu.org/archive/html/guile-user/2019-04/msg00025.html, in which string-port functions are redefined to handle strings as UTF-8 instead of the locale encoding.

This has interesting consequences:

1. the existing saved-reports-2.8 and book.gcm, having encoded unicode as \uNNNN, are properly read back as extended chars.

2. the reports are fixed. Title is "Turkish Lira ₺ Lira (Balance ₺1,000)"
 the tabbed window title is still fine "Turkish Lira ₺ Lira"
 saving *again* into saved-reports-2.8 writes as "Turkish Lira ₺ Lira" (utf8)
 saving *again* into book.gcm writes (C part) PageName=Turkish Lira ₺ Lira
 but (scheme part) SchemeOptions= ... "Turkish Lira ₺ Lira"

3. relaunching GnuCash and reloading these UTF8 strings leads to:
 the title reads "Turkish Lira ₺ Lira", whereby ₺ is now (#xe2 #x201a #xba)
 otherwise report is still working well
 loading from saved-reports-2.8 is working perfectly well.
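
The three code points in item 3 are what one would get if the UTF-8 bytes of ₺ were decoded as cp1252 on the way back in. A Python sketch to check this, assuming cp1252 is the code page involved (helper names are hypothetical):

```python
# Sketch: check the code points reported in item 3 and show the damage is reversible.
# Assumption: the UTF-8 text is being decoded as cp1252 when read back.

def code_points_after_misread(ch: str, locale_encoding: str = "cp1252") -> list:
    """Return hex code points seen when UTF-8 bytes are decoded as the locale encoding."""
    garbled = ch.encode("utf-8").decode(locale_encoding)
    return [hex(ord(c)) for c in garbled]

def repair(garbled: str, locale_encoding: str = "cp1252") -> str:
    """Undo the misreading: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode(locale_encoding).decode("utf-8")

lira = "\u20ba"
print(code_points_after_misread(lira))
print(repair(lira.encode("utf-8").decode("cp1252")) == lira)
```

Since the misreading is reversible here, the bytes stored in the file would appear to be intact, and the corruption would be introduced at read time.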

Source code review
------------------
gnc-plugin-page-report.c writes PageName using g_value_set_string (gnc-plugin-page.c:604)
gnc-plugin-page-report.c writes SchemeOptions using g_key_file_set_value (gnc-plugin-page-report.c:862)
saved-reports-2.8 is written in gnc-report.c (various)

PageName is read back at (unsure where)
SchemeOptions is read back with g_key_file_get_value (gnc-plugin-page-report.c:923)
saved-reports-2.8 is read back with gfec_try_load (gnucash-bin.c:364)

Conclusion
----------
I don't think applying the monkey-patch is safe, due to the problems reading SchemeOptions, nor do I think hunting down all string-port functions to rewrite them is the right approach (it's a band-aid).

I think upgrading to guile-2.2 will fix all these issues.

Monkey-patch:
-------------
Paste the following somewhere general, e.g. utilities.scm:

(when (string=? (effective-version) "2.0")
  ;; When using Guile 2.0.x, use monkey patching to change the
  ;; behavior of string ports to use UTF-8 as the internal encoding.
  ;; Note that this is the default behavior in Guile 2.2 or later.
  (let* ((mod                     (resolve-module '(guile)))
         (orig-open-input-string  (module-ref mod 'open-input-string))
         (orig-open-output-string (module-ref mod 'open-output-string))
         (orig-object->string     (module-ref mod 'object->string))
         (orig-simple-format      (module-ref mod 'simple-format)))

    (define (open-input-string str)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (orig-open-input-string str)))

    (define (open-output-string)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (orig-open-output-string)))

    (define (object->string . args)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (apply orig-object->string args)))

    (define (simple-format . args)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (apply orig-simple-format args)))

    (define (call-with-input-string str proc)
      (proc (open-input-string str)))

    (define (call-with-output-string proc)
      (let ((port (open-output-string)))
        (proc port)
        (get-output-string port)))

    (module-set! mod 'open-input-string       open-input-string)
    (module-set! mod 'open-output-string      open-output-string)
    (module-set! mod 'object->string          object->string)
    (module-set! mod 'simple-format           simple-format)
    (module-set! mod 'call-with-input-string  call-with-input-string)
    (module-set! mod 'call-with-output-string call-with-output-string)

    (when (eqv? (module-ref mod 'format) orig-simple-format)
      (module-set! mod 'format simple-format))))
Comment 6 Christopher Lam 2019-04-22 01:42:10 EDT
Addendum to the Mark Weaver monkey-patch section above, in case it's not clear:

4. Therefore the SchemeOptions part of book.gcm, although it has been written properly as UTF-8, is not read back correctly by C, and is completely munged again (and this is not Guile's fault!)
Comment 7 Christopher Lam 2019-05-22 06:01:30 EDT
Aaron, please test a recent nightly from https://code.gnucash.org/builds/win32/maint - I think this bug is considered fixed for the next release.
Comment 8 Aaron 2019-05-22 16:39:42 EDT
I tested the build for 3.5 from 5/21 and the problem seems to be fixed. Thank you.
