Created attachment 372686 [details] QIF file containing Czech characters in transaction descriptions

After updating to GnuCash 3.1 from 2.6.x the following problem appeared: I have a UTF-8 encoded QIF file with transaction descriptions containing Czech characters (e.g. 'ř'; a sample file is in the attachment). The import proceeds without any error messages, but all the Unicode characters become corrupted; for example, 'Připsaný bonusový úrok' becomes 'PÅ™ipsaný bonusový úrok' (which looks like a bad ANSI conversion). Transactions imported before the update look fine, so it does not appear to be a database or display issue. My OS is Windows 10; please let me know if any additional details are needed.
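The corruption pattern reported above is classic mojibake: UTF-8 bytes decoded as a Windows ANSI code page. A small sketch (Python is used here only for illustration; the importer itself is Guile Scheme) reproduces the exact symptom:

```python
# 'ř' is encoded in UTF-8 as the two bytes 0xC5 0x99.
# Decoding those bytes as Windows-1252 yields 'Å' + '™',
# which matches the "PÅ™ipsaný"-style garbage reported above.
utf8_bytes = "Připsaný".encode("utf-8")
mangled = utf8_bytes.decode("cp1252")
print(mangled)  # PÅ™ipsanÃ½
```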
Same results even if the language of the GnuCash interface is changed to Czech.
Can you edit gnucash/import-export/qif-imp/qif-file.scm Line 132 to say (with-input-from-file #:guess-encoding #t path and see if that fixes it?
Em, sorry, that should be c:\Program Files (x86)\gnucash\share\gnucash\scm\qif-import\qif-file.scm. You'll need to run the editor with admin privs.
The fix you suggested unfortunately generates an error message during import (something like "there is a bug during import" or similar). However, after digging into the specs, I managed to make it work by changing line 519 instead to (line-loop))))) #:encoding "UTF-8"). Unfortunately, (line-loop))))) #:guess-encoding #t) has no effect for some reason.
Ah, right, after the thunk. Sorry. #:encoding "UTF-8" was my fallback, but I'm concerned that other sources may use other encodings. Does your QIF have a BOM?
Tried both with a BOM and without: if no encoding is explicitly specified (#:encoding "UTF-8"), the result is the same. I don't know exactly how smart the encoding-detection algorithm is, but maybe it's confused by the fact that not every transaction contains such non-English characters; in fact, only some of them do. Still, that should not matter when a BOM is present... I also experimented with converting to ANSI Windows-1250, which messes up some characters (while others are fine). Well, ANSI is a mess anyway, and I can hardly imagine anyone sane using it for the Czech alphabet these days.
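For what it's worth, whether a file carries a UTF-8 BOM can be checked directly on the raw bytes; the UTF-8 BOM is the three-byte sequence EF BB BF. A minimal sketch in Python (illustrative only; the function name is made up):

```python
def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == b"\xef\xbb\xbf"
```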
Guile's default encoding is CP1252 so it probably tried to use that to decode your CP1250 file resulting in misinterpreting some characters. Unfortunately I can easily see an ignorant programmer who's only ever worked with Microsoft products using an ANSI code page instead of UTF-8. Open up a CMD shell and type chcp. It's going to return 1250 unless you've changed the default setting.
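The CP1250-vs-CP1252 confusion is easy to demonstrate: the two code pages assign different characters to the same byte values above 0x7F, so a CP1250 (Central European) file read as CP1252 (Western European) silently swaps letters. A quick check against the published code page tables, in Python for illustration:

```python
# Byte 0xF8 is 'ř' in Windows-1250 (Central European)
# but 'ø' in Windows-1252 (Western European).
raw = b"\xf8"
assert raw.decode("cp1250") == "ř"
assert raw.decode("cp1252") == "ø"
```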
Further study of the thunk shows that non-UTF-8 input is already handled in the line-handling code, so I've pushed the #:encoding "UTF-8" fix. It will be in tomorrow's nightly and in GnuCash 3.2. Thanks!
Not sure this comment belongs here, so please be gentle. I haven't been able to find an answer, or indeed anyone else (except Mr. Zakharov) with this sort of issue.

I am importing QIFs generated by Quicken 2000. There is one French payee who shows up frequently in my accounts; let's call him Gérard. Note the accented "e": Quicken has no trouble handling it, and the QIF file retains the proper diacritic. But when imported into GnuCash (ver. 3.4, Windows 10), the register entries morph the name to G?rard, the question mark substituting for the indigestible accented character. The GnuCash font, which doesn't seem changeable in the GUI, is the default (Arial, I think), which should be able to display "é". I suspect the QIF file's code page is Windows-1252; it's not labeled.

There are not so many instances that I can't fix this manually, but I shouldn't have to, especially in a program that seems to set itself apart from similar software in its international compatibility. Is this a flaw in the QIF, i.e. the absence of a character-set flag? Or a problem with the QIF importer? I'm still clawing my way up the bottom slopes of the GnuCash learning curve, so I don't want to rule out user error either. Can you help, or point me in the right direction? Thanks. -Art-
The font is changeable, see https://wiki.gnucash.org/wiki/GTK3. But that's probably not the problem. User error is unlikely. Encoding seems likely, and if you can confirm that Quicken is emitting CP1252 instead of UTF8 that will nail it.
John, I don't know that I can confirm the encoding of the exported QIF. What limited documentation I have doesn't contain that information, and I couldn't cull it from any of the Quicken files I browsed through; nor was my web searching any help. The QIF itself doesn't identify the encoding.

I created a dummy transaction with a payee named "ÀýûøÈÂÐêñïé". (Somewhat difficult to pronounce; his friends call him "Àýû", but he brags about his really interesting family tree....) All those characters are included in 1252, though of course they are in other character sets as well. Quicken 2000 accepted them all and exported them intact to the QIF file, which reads as follows in its entirety:

!Type:Bank
D3/17'19
U-99.99
T-99.99
PÀýûøÈÂÐêñïé
^

So now, and sorry for my ignorance here: is this something that a not-very-clever user like me can fix, or is it an issue with the code that parses the QIF input, or something else? Thanks, -Art-
The way I check encoding is perhaps a bit technical: I open a file in an editor that can display the binary representations of the characters. UTF8 is quite distinctive so I can recognize it immediately, and what you pasted came through as UTF8. Unfortunately there are several bits of software in between that might have changed the encoding (2 browsers, webserver, Bugzilla) so I think it not conclusive. If you could attach the file (see the Add an Attachment link up top) then I can download it and be sure that no transcoding is taking place.
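The "distinctive" shape John refers to can also be checked mechanically: strict UTF-8 decoding fails on nearly any legacy-encoded text containing non-ASCII bytes, because single CP1252 accented letters rarely form valid UTF-8 multibyte sequences. A sketch of that heuristic, in Python for illustration:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: bytes that decode strictly as UTF-8 almost certainly are UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 'é' is 0xC3 0xA9 in UTF-8 but the single byte 0xE9 in CP1252;
# 0xE9 followed by the ASCII letter 'r' is not a valid UTF-8 sequence.
```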
Created attachment 373220 [details] Export QIF from Quicken 2000 to verify character set Uploading file as requested. File created by Quicken 2000, limited to one transaction, with Payee name created with a selection of characters having diacritical marks. SHA-256 570521c00f1e71f885438a2037a84f722f4a15d2d7ff286a8891fd99a5c4f03b
Created attachment 373221 [details] Export QIF from Quicken 2000 to verify character set - image

This system did not display the previous attachment (the original, downloaded QIF) the way the characters appear on my system. Therefore, this attachment includes screenshots of the QIF file as displayed in Notepad++ and a hexdump from Cygwin (combined into one image).
Yeah, I noticed, so I changed the type and downloaded it. It's CP-1252, and it seems that somehow Guile's encoding detection is failing to recognize it. Can you reverse the edit in comment 4 and see if it imports cleanly? Notepad++ will be perfect; note that you'll need to run it with admin privs.
John, if I understood your instructions correctly, the outcome remains unchanged: an apparently successful import, except that the accented characters don't get passed into GnuCash. Hope I understood you correctly, and I appreciate your taking the time on this. Line 519 changed (with my comment added) to:

(line-loop))))) #:guess-encoding #t) ;; changed from (line-loop))))) #:encoding "UTF-8") 2019-03-17

-Art-
You understood correctly. The default character set on my Win10 VM is CP437, the old IBM character set that has line-drawing characters where CP1252 has accented characters. There's a setting in Control Panel>Region>Administrative for changing the locale for non-Unicode programs; I tried a couple of those but got other IBM character sets, not CP1252.

Next I refreshed my memory of the code. It's not Guile's fault: the conversion is done in C using GLib, which responds to the CHARSET environment variable. Setting that to CP1252 in the environment file not only failed to get the CP1252 characters transcoded correctly, it also broke the XML backend so that it couldn't read the GnuCash file. It looks like we'll need to find a more general transcoding method, at least on Windows.

Notepad++ has a transcoding feature built in: http://docs.notepad-plus-plus.org/index.php/Convert_Or_Encode%3F. The quickest solution for you will be to use that to convert your files to UTF-8, then import them. That's not the same as this bug, so I've opened bug 797145 to track the problem.
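For anyone with many files to convert, the Notepad++ conversion described above can also be scripted. A minimal sketch in Python (the source encoding is an assumption; CP1252 matches the attached file, and the function name is made up):

```python
def transcode(src: str, dst: str, src_encoding: str = "cp1252") -> None:
    """Read src in the given legacy encoding and rewrite it as UTF-8.

    newline="" preserves the file's original line endings (QIF files
    from Windows tools typically use CRLF).
    """
    with open(src, "r", encoding=src_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```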