See screenshots. [A]ssociate header is not translated correctly. This header is visible on APAR registers as shown, or non-APAR registers under the 'Reconcile' header.
Created attachment 373353: english
Created attachment 373354: arabic
Created attachment 373355: hebrew
The question mark in a diamond indicates a codepoint for which your selected font lacks a glyph. That might be because there's a bad codepoint in the msgstr, but it's more likely that you need to select a different font. What file and line has the English string that's being translated? The only "A" in ar.po is for gnucash/import-export/import-main-matcher.c:501. Aside from that, there's not much point in filing bug reports against any translations except German, because apart from Mechtilde and Frank none of the translators see bug reports.
This should be coming from split-register-layout.c, line 711 and in the ar.po file it is 18666
Hmm, the msgstr is "اقرن ملف" (That's an LTR rendering). The UTF8 is d8a7 d982 d8b1 d986 20 d985 d984 d981, quite a bit more than what's displayed. Maybe the column width is truncating the string at a partial codepoint.
This is from ar.po with this translator comment, likewise for Reconciled:R and a couple of others...

    #. Translators: The abbreviation for 'Associate'
    #. in the header row of the register. Please only
    #. translate the portion after the ':' and
    #. leave the rest ("Associate:") as is.
    #: gnucash/register/ledger-core/split-register-layout.c:711
    #: gnucash/register/ledger-core/split-register-model.c:326
    msgid "Associate:A"
    msgstr "اقرن ملف"

so shouldn't the msgstr be...

    msgstr "Associate:اقرن ملف"

I have not tried changing this to see if it works. Looking through the po files there is wide variation; some look like they have also translated the Associate or Reconciled part.
Bob that's the right one. Many languages would need fixing.
- N_("Reconciled:R")
+ NC_("Abbreviation of 'Reconciled'","R")
would be a proper way, but I want to run a few tests to see how existing translations behave.

The problem: the corresponding _() should become
g_dpgettext2 (NULL, "some context", "a default message")

Finally, we have to pass --keyword=NC_:1c,2 to xgettext.
Just a note: This issue is also reported to the console as "Pango-WARNING **: <timestamp>: Invalid UTF-8 string passed to pango_layout_set_text()"
(In reply to Frank H. Ellenberger from comment #9)
> - N_("Reconciled:R")
> + NC_("Abbreviation of 'Reconciled'","R")
> would be a proper way, but I want run a few tests, how existing translations
> behave.
>
> The problem:
> The corresponding _() should become g_dpgettext2 (NULL, "some context", "a
> default message")
>
> Finally we have to pass --keyword=NC_:1c,2 to xgettext

char * val = NC_("context","value");

will be interpreted by the C preprocessor to set val to "value". That is, from a C point of view the context is completely ignored. However, by adding the keyword to xgettext, it will be picked up for translation, with context.

If you later want to refer to this string in a location where you do want to retrieve the actual translation (say while emitting a warning), it seems to me you have to

PWARN (C_("context", val));

Which means you have to pass the same context again to the C_ function to disambiguate the translation of val here.

To prevent typos it's probably best to define the context string once and reuse that definition, like so

#define SOME_CONTEXT "context for string whatever"
...
char* val = NC_(SOME_CONTEXT, "value");
....
PWARN (C_(SOME_CONTEXT, val));

As for adding --keyword=NC_:1c,2, you can do so in po/gnucash-pot.cmake. That's where we invoke xgettext. You'll probably want to add the same for C_ -> --keyword=C_:1c,2
static const char* would be better than #define if all uses of SOME_CONTEXT are in the same compilation unit.
Right. It was merely a demonstration of how to reuse a string. Yours is indeed better for local use.
ISTR the "compilation context" is mostly glade. :-(
Is that an issue ?
Not if somebody explains in the wiki how to use them in Glade.
This bug has so far been exclusively about message strings in C. From examining gtk/gtkbuilderparser.c, it appears that if a string with a translatable="yes" attribute also has a context="foo" attribute then it will be passed to g_dgettext() instead of to gettext(). The docs say:

> If the “translatable” attribute is set to a true value, GTK+ uses gettext()
> (or dgettext() if the builder has a translation domain set) to find a
> translation for the value. This happens before the value is parsed, so it can
> be used for properties of any type, but it is probably most useful for string
> properties. It is also possible to specify a context to disambiguate short
> strings, and comments which may help the translators.

https://developer.gnome.org/gladeui/stable/properties.html mentions `translatable` but not `context`, so it seems that that tag will have to be added in an editor later and we can only hope that reopening the file in Glade won't eat it.
I believe several widgets in glade allow you to set a context for properties that are considered translatable. For example for the label, if you set the label text, there's a button next to the text entry field. When clicked it will open up a dialog that has that same text entry and a new field for translation context.
Yes, I know how to add a constant string as context to a translatable label in Glade. But does anybody know a way to assign a named constant defined in a C file instead? While reviewing or improving translations, I often find strings that are written slightly differently and try to fix them by unifying them. Moving them from Glade to C files and using named constants seems to be overkill.
I agree, and when I wrote comment 11 I was not considering glade files. There is no reasonable way that I know of where you can use a similar constant between C and glade. That's not only so for translation context messages. I came across other situations where I had to essentially type the same strings once in a glade file and later in the sources. So if context strings are meant to be shared between C code and glade files, that's what I'd do: just type the string twice, and double check I typed it the same in both locations. Comment 11 is geared towards cases where the same context string is needed more than once in the sources. Which will typically be the case if you use NC_ in combination with C_ for a given string.
OK, back to the task. If possible, I would prefer to replace N_();..._() by _()/Q_() in the first occurrence. If the corresponding _() is in the same module, it is easy, but I started with "reconciled:R", which has 3 occurrences. The first is in a call of gnc_search_param_set_title. This in turn is called 13 times; all other cases have no need of a context. search-param.c has no calls of _(). Where are the search-params used?
I got a bit more context to comment 21 from irc: https://lists.gnucash.org/logs/2019/08/28.html#T13:19:22

Imagine we have the following function calls:

gnc_search_param_set_title (param, N_("Number/Action"))
gnc_search_param_set_title (param, N_("Reconciled:R") + 11)

The first uses an ordinary translatable string (marked for translation, but not to be translated on the spot, using N_).

The second is a particular form to prepend a context string ("Reconciled:") to the translatable string ("R"). This is complemented with an offset ('+ 11') to manually strip this context in the C preprocessor context. This is an old construct, perhaps from before gettext had proper support for context messages. It's certainly not how present day gettext promotes working with context messages.

Now assume we want to replace this obsolete context construct

N_("Reconciled:R") + 11

with the gettext supported

NC_("Reconciled","R")

Should we then also change

N_("Number/Action")

to

NC_("","Number/Action") ?

In other words, will changing from N_ to NC_ change the output of the macro in a way that affects the calling context? The answer is no.

N_("Reconciled:R") is a macro that will expand to "Reconciled:R". However, as we only want to show "R", we move the pointer ahead 11 bytes. So what gnc_search_param_set_title eventually sees is "R".

NC_("Reconciled", "R") is also a macro and it expands to "R", that is, the macro will simply drop the context part. So gnc_search_param_set_title will see the exact same thing.

N_("Number/Action") doesn't have a context prefix, and we want gnc_search_param_set_title to pick up "Number/Action", which is exactly what N_("Number/Action") will expand into. So there's no need to change anything here.

That's how the C preprocessor interprets these strings. The other half of the story is how xgettext interprets them. That's the tool responsible for scanning sources for translatable strings and collecting them in our gnucash.pot file.
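The preprocessor half of this can be checked with a minimal sketch. The two macros below are local stand-ins that assume N_ and NC_ both reduce to the bare message string, as described above; they are not copied from the GnuCash or GLib headers, and the helper functions are hypothetical names for illustration only.

```c
#include <string.h>

/* Stand-in marker macros (assumed definitions for this sketch): both
 * leave only the message string behind; NC_ drops its context argument. */
#define N_(String) (String)
#define NC_(Context, String) (String)

/* Old construct: the context is part of the string and is skipped by
 * hand. strlen("Reconciled:") is 11, hence the + 11. */
static const char *title_old_style(void)
{
    return N_("Reconciled:R") + 11;
}

/* Context-aware marker: the compiler only ever sees "R". */
static const char *title_with_context(void)
{
    return NC_("Reconciled", "R");
}

/* Plain string without any context: nothing to strip. */
static const char *title_plain(void)
{
    return N_("Number/Action");
}

/* Returns 1 when both constructs hand the caller the same text. */
static int titles_match(void)
{
    return strcmp(title_old_style(), title_with_context()) == 0;
}
```

Under these assumptions a caller such as gnc_search_param_set_title would receive "R" from either construct, so only the extraction side (xgettext) differs.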
When xgettext encounters N_("Reconciled:R") it sees a translatable string. The + 11 bit is ignored by xgettext. It's only interested in what's enclosed in N_( and ). As there's nothing particular about a ":" in a string, xgettext treats this as any other string and will put the complete thing up for translation. It is up to us to make sure only the "R" is translated, which invites (understandable) human mistakes such as this bug.

However, if xgettext encounters NC_("Reconciled","R"), it knows this is a translatable string with a context message. And it will write this in a different form to the pot file: the context goes into a context comment and only the translatable bit will be used as translatable string. So this is a much neater way of setting context strings.

For completeness there is a third form:

NQ_("context|msg")

This is a glib specific variation where context and msg are in one single string, separated by a |. Other than that it's expanded identically to NC_, in that the context is stripped off by the C preprocessor and xgettext will write the same thing to the pot file in both forms.

Now all that was said above with

N_("context:msg") + x
NC_("context", "msg")
NQ_("context|msg")

equally holds for

_("context:msg") + x
C_("context", "msg")
Q_("context|msg")

except in this case the translated message minus the context is substituted.

It's important to realize that for this delayed translation to work properly one should be careful to always pass context as expected. This may be a no-brainer from a distance, but the devil is in the details.

There are drawbacks though. Let's go back to N_. This essentially means "mark for translation but don't actually replace with translation here".
To use the translation, we have to wrap the same translatable string or a variable set to that string in _(). So if we have somewhere

const char* bla = N_("to translate");

we can use a translated version later in the code via

PWARN(_(bla));

That's exactly the same for NC_ and C_:

const char* blac = NC_("context","translate");
...
PWARN(C_("context", blac));

It's important to realize that you can only use combinations of N_ with _ and NC_ with C_. Defining a variable with NC_ and later trying to expand it with _ will very likely lead to unwanted results.

And here's where this limitation comes into play. Imagine we have a function like this:

void write_log (const char* value)
{
    PINFO(_(value));
}

This function would print a translation of value to the log files. Now different parts of the code may want to call this function:

write_log (N_("An account log message"));
...
write_log (NC_("This is for a vendor invoice", "An invoice log message"));
...
write_log (NC_("This is for a customer invoice", "An invoice log message"));

The example is slightly contrived, as we use the terms bills and invoices, and also, if we were passing the strings directly as here, we'd directly translate them. Let's just assume we need the strings to be passed untranslated for further processing before they are written to the log translated.

The problem here is that function write_log expects a translatable string without context, but in the last two cases we pass it a string with context. In those cases gettext will be searching its message catalog for a message with id "An invoice log message" without context, but likely won't find one. I have no idea how gettext will respond in that case. I suspect it will just return the untranslated version. Not what we intended.

So we can't write code like that. It's up to those working directly in the code to watch out for situations like this.
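The write_log problem can be made concrete with a toy model of a message catalog. This is only a sketch: the catalog entries and "translations" are made up, and toy_gettext / toy_pgettext are hypothetical stand-ins, not gettext functions. The one detail borrowed from gettext is that a message with context is stored under the key "context" + EOT byte (\004) + "msgid", and that a failed lookup yields the untranslated msgid.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy catalog entry. Messages with context are keyed as
 * "context\004msgid"; messages without context by msgid alone. */
struct toy_entry {
    const char *key;
    const char *translation;
};

/* Made-up sample data, for illustration only. */
static const struct toy_entry toy_catalog[] = {
    { "An account log message", "[xx] account message" },
    { "This is for a vendor invoice\004An invoice log message",
      "[xx] vendor message" },
    { "This is for a customer invoice\004An invoice log message",
      "[xx] customer message" },
};

/* Exact-match search; returns NULL when the key is not present. */
static const char *toy_find(const char *key)
{
    size_t i;
    for (i = 0; i < sizeof toy_catalog / sizeof toy_catalog[0]; i++) {
        if (strcmp(toy_catalog[i].key, key) == 0)
            return toy_catalog[i].translation;
    }
    return NULL;
}

/* Lookup without context, in the role of _(): falls back to msgid. */
static const char *toy_gettext(const char *msgid)
{
    const char *found = toy_find(msgid);
    return found ? found : msgid;
}

/* Lookup with context, in the role of C_(): builds the combined key. */
static const char *toy_pgettext(const char *context, const char *msgid)
{
    char key[256];
    int len = snprintf(key, sizeof key, "%s\004%s", context, msgid);
    const char *found;

    if (len < 0 || (size_t) len >= sizeof key)
        return msgid; /* key does not fit: treat as "not found" */
    found = toy_find(key);
    return found ? found : msgid;
}
```

In this model, toy_gettext("An invoice log message") finds no entry and hands back the English text, while the same msgid looked up through toy_pgettext with either context finds its own translation. That is the mismatch a context-unaware write_log would run into.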
This is not limited to NC_/C_ by the way - our old 'N_("context:msg") + x' is equally susceptible to this, as is the Q_("context|msg") variant.

However, both Q_("context|msg") and C_("context","msg") are definitely preferable over 'N_("context:msg") + x' for various reasons:
* the pointer manipulation (+ x) is prone to mistakes: one may miscalculate or forget to update the displacement in case the context string ever gets changed. Not to mention it's a construct requiring a fair bit of C pointer logic, which is a higher bar for translators that want to help fixing a string.
* the resulting message signature in the pot file is much less confusing in case of Q_ or C_: the context string is moved into a context comment, rather than still being part of the message ID.

In the end I have no strong preference for C_ or Q_, other than perhaps that C_ expands to pgettext (a gettext native function) where Q_ expands to g_dpgettext (a glib function). Sticking with gettext native functions may be a small advantage in terms of portability.
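The first bullet can be illustrated with a small sketch. Both helpers are hypothetical: strip_fixed mimics the '+ x' construct, and strip_prefix is shown only for contrast (it is not something proposed in this bug, where the preferred fix is C_ or Q_).

```c
#include <stddef.h>
#include <string.h>

/* Old construct: skip a hand-typed number of bytes, whatever the
 * string actually contains. The caller must keep offset <= strlen(msg). */
static const char *strip_fixed(const char *msg, size_t offset)
{
    return msg + offset;
}

/* Contrast case: derive the offset from the prefix and verify it is
 * really there; return the string unchanged when it is not. */
static const char *strip_prefix(const char *msg, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(msg, prefix, n) == 0 ? msg + n : msg;
}
```

With the original context, strip_fixed("Reconciled:R", 11) yields "R". If the context were ever shortened to "Reconcile:" without updating the number, strip_fixed("Reconcile:R", 11) would skip past the "R" and yield an empty string, with no warning from the compiler.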
Incidentally our xgettext invocation is not configured correctly to properly interpret the Q_ variant of context messages. I have a patch ready in my queue to fix this.
To keep it simple we could also replace the 11 occurrences of Q_("<ctx>|<id>") by C_("<ctx>","<id>").
> the pointer manipulation (+ x) is prone to mistakes: one may miscalculate or forget
> to update the displacement in case the context string ever gets changed.

Including the mistake that's the cause of this bug: the translator translated the whole string and the fixed offset landed in the middle of a UTF-8 sequence, creating an invalid character.
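This can be checked against the bytes quoted earlier for the Arabic msgstr. The sketch below assumes the offset applied in the source equals the length of "Associate:" (10 bytes); the actual source line is not quoted in this bug, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The Arabic msgstr as raw UTF-8 bytes:
 * d8a7 d982 d8b1 d986 20 d985 d984 d981 (15 bytes). */
static const char ARABIC_MSGSTR[] =
    "\xd8\xa7\xd9\x82\xd8\xb1\xd9\x86\x20\xd9\x85\xd9\x84\xd9\x81";

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_utf8_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Returns 1 when msg + offset would start in the middle of a
 * multi-byte character, 0 otherwise (or when offset is out of range). */
static int offset_splits_character(const char *msg, size_t offset)
{
    if (offset >= strlen(msg))
        return 0;
    return is_utf8_continuation((unsigned char) msg[offset]);
}
```

Byte 10 of the Arabic string is 0x85, the second half of the two-byte sequence d9 85, so a string starting there is not valid UTF-8. That would be consistent with the Pango warning reported above. For the untranslated "Associate:A" the same offset lands on the plain ASCII "A".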
My current problem: In file gnucash/register/ledger-core/split-register-layout.c, in gnc_split_register_layout_add_cells, gnc_register_add_cell (layout, ...) is called with N_() for abbreviated column headers and samples to determine the width. But I have no idea where the output with the corresponding _() appears.
The sample is not sent for output anywhere. It's just used to determine column width IIUC.
Correct, it is only used to set the default column width on register load.
This should be fixed in the commit series ending with commit bbcf19a. In theory one could still define static const char* ... but I see a benefit only for the "sample" patterns. And I might have missed a few obsolete translator comments.
commit 92bae3f added another layout file. N.B. Before the changes in this bug, the localized samples were never applied because of the usage of N_() [= gettext_noop()] on the column width.