GnuCash
Contact   Instructions
Bug 797522 - Focus after reconcile jumps to a different account
Summary: Focus after reconcile jumps to a different account
Status: RESOLVED FIXED
Alias: None
Product: GnuCash
Classification: Unclassified
Component: Register (show other bugs)
Version: 3.7
Hardware: PC Linux
: Normal normal
Target Milestone: ---
Assignee: ui
QA Contact: ui
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-12-11 03:45 EST by obscurans
Modified: 2020-02-17 08:26 EST (History)
6 users (show)

See Also:


Attachments
Crash log (23.50 KB, application/zip)
2020-01-25 18:15 EST, Mike Alexander
no flags Details
Mike's crash report unzipped. (81.81 KB, text/plain)
2020-01-25 18:32 EST, John Ralls
no flags Details
Crash report 2 (84.19 KB, text/plain)
2020-01-26 01:59 EST, Mike Alexander
no flags Details

Description obscurans 2019-12-11 03:45:29 EST
Conditions not fully known.

A lot of the time, but seemingly not 100%, after reconciling an account (possibly after other actions), the focus as displayed in the account window is on the date of the new ledger entry, but the actual focus is on *an entirely different account that is opened*.

From mild testing, this seems to reliably target a specific different account (i.e. it is not just the very first account window opened).

This causes quite a few hangups when trying to edit the date, as it is actually invisibly editing the date as selected on the entry in the different account window.

With the lack of undo functionality unintended data loss is possible (and I have been bitten by this by overwriting a date on an effectively arbitrary entry elsewhere).
Comment 1 obscurans 2019-12-11 03:47:55 EST
When this bug effect is active, attempting to select the date on the currently display account window fails and a double-click is necessary to force the focus back on the currently displayed account window.
Comment 2 obscurans 2019-12-11 03:56:01 EST
I also want to mention that my workflow after reconciliation consists of switching to my file manager and opening up a new statement, so that may also have been the triggering event.
Comment 3 John Ralls 2019-12-11 20:29:36 EST
What distro/version of Linux? Does your window manager display menus on each window or does it use a common menubar across the top of the screen like Macs do?

Let's check that I understand exactly what's happening. Is the following description correct?

You have several account registers open, each in its own window. One of them has keyboard focus, meaning that if you start typing it will change a field in the blank transaction at the bottom of the register (the normal first field would be the date).
You click either Actions>Reconcile or the Reconcile button on the toolbar, fill in the date and ending balance from the statement you're reconciling, tick off the cleared transactions until the balances match and click the finish button on the toolbar. At no time during the reconcile process do you leave the reconcile window. When the reconcile window closes and a different window from the one that you started from now has visual focus, meaning that its frame is darker than the rest and the cursor is blinking in the date field of the blank transaction. You start typing a date, but that date field doesn't change; instead the date field in a random completed transaction in a third window, neither the one that you reconciled nor the one that apparently has focus, is edited.
Comment 4 Christopher Lam 2019-12-12 04:22:08 EST
I confirm having seen this behaviour many times. May be after closing Reconcile Window, or some other window. Focus is set to a the register of a hidden tab. I can start typing, <TAB>, etc. Pressing <ENTER> will finalise the recording of a new txn. No screen change is visible apart from the status bar.

I've adapted by carefully watching the status bar prior to entering transactions.

Ubuntu 19.10, menu within GNC window.
Comment 5 John Ralls 2019-12-12 11:40:53 EST
Chris, by "hidden tab" you mean that if you have multiple tabs in a single window that typing edits a register that's different from the one displayed?

What does the status bar say that indicates the keystrokes are going to the wrong register?
Comment 6 Christopher Lam 2019-12-12 19:16:59 EST
(In reply to John Ralls from comment #5)
> Chris, by "hidden tab" you mean that if you have multiple tabs in a single
> window that typing edits a register that's different from the one displayed?

Yes

> What does the status bar say that indicates the keystrokes are going to the
> wrong register?

Status bar displays the usual prompts while editing a register entry - date, "Enter a reference, such as a check number" etc.
Comment 7 John Ralls 2019-12-12 19:55:23 EST
That confuses me: How does watching the status bar help then? Wouldn't not seeing the blank transaction being filled in be the indication that something had gone wrong?

I suppose that this must not happen every time or you'd know to double-click in the blank transaction to force focus there every time.
Comment 8 Christopher Lam 2019-12-12 20:09:10 EST
(In reply to John Ralls from comment #7)
> That confuses me: How does watching the status bar help then? Wouldn't not
> seeing the blank transaction being filled in be the indication that
> something had gone wrong?

Because the status bar showing activity and the displayed tab being frozen is an indicator that the software hasn't completely crashed...
Comment 9 Bob 2020-01-25 08:14:27 EST
I have pushed a fix for this to maint and will be in the next release.

Please test when available and reopen this bug if it is not fixed with any additional information you can provide.
Comment 10 Mike Alexander 2020-01-25 18:15:54 EST
Created attachment 373556 [details]
Crash log

I built GnuCash with this fix and got a crash which looks like it might be related.  I've attached the crash log.

I was reconciling an account and decided I had botched things up enough that I should start over.  I did this by using the recent files list in the File menu to open the same file that I was working on but telling GNuCash not to save changes.  After it opened the file and was restoring the window (one window with a lot of tabs) it crashed.

The crash occurred in an optimized Quartz version built with no symbols.  I built a debug version with symbols but couldn't reproduce the crash.
Comment 11 John Ralls 2020-01-25 18:32:08 EST
Created attachment 373557 [details]
Mike's crash report unzipped.

No need to zip MacOS crash logs, they're only 84K each.
Comment 12 John Ralls 2020-01-25 18:56:09 EST
This looks like it's a use-after-free of either current_plugin_page or register_plugin_page.
Comment 13 Mike Alexander 2020-01-26 01:59:04 EST
Created attachment 373559 [details]
Crash report 2

This is probably the same crash in slightly different circumstances.  This time I was just starting a reconcile (I think I had just dismissed the reconcile startup dialog).
Comment 14 Bob 2020-01-26 09:09:41 EST
Well that is disappointing, as usual I can not reproduce on my Linux VM.
Do the numbers after the function mean any thing?

I thought I was protecting those passed in values with this...

if (!current_plugin_page || !GNC_IS_PLUGIN_PAGE_REGISTER(current_plugin_page) ||
    !register_plugin_page || !GNC_IS_PLUGIN_PAGE_REGISTER(register_plugin_page))
        return;

I will see if I can reproduce on Windows after it finishes updating.
Comment 15 John Ralls 2020-01-26 11:48:46 EST
Bob, the crash is in that protection code, trying to dereference either current_plugin_page or register_plugin_page. Whichever one is crashing isn't null; it's either uninitialized or the address it's pointing to has been freed and reused.

The use of idle events *greatly* increases the likelihood of use-after-free because the object may be freed while the event is pending. One solution is to use weak pointers (GLib provides a weak ptr facility, see https://developer.gnome.org/gobject/stable/gobject-The-Base-Object-Type.html) for the pointer in the event closure.

There are several tools for finding use-after-free errors. I've recently become very fond of address sanitizer (https://en.wikipedia.org/wiki/AddressSanitizer) after it helped me find several double-frees in gtk-quartz after months of trying with the MacOS Malloc tools.
Comment 16 Bob 2020-01-27 06:34:58 EST
I am still trying to get my head round this...
From what I can see both reports do not involve the idle function yet, the first report stems from the reload and recreating the pages and then maybe finally switching to the current page, all done from a normal emit signal callback, not sure whether this was the first register page.

Mike,
How may tabs and did you see some being added to the main window before crash?
Are they all register tabs apart from the one Account tab?
Is this reproduceable, I have removed my symbols, got 15 register tabs and tried to reconcile one, cancelled and then reloaded same file with no problem.

If it is reproduceable, maybe you can add some g_print statements to that function and split the protection code in half to identify whether it is the current_plugin_page or the register_plugin_page?

This would be so much easier if I could get it to fail!!!
Comment 17 John Ralls 2020-01-27 12:20:54 EST
Bob,

Sorry, I didn't mean to suggest that these crashes were caused by the idle, though they are likely the result of a similar problem that I'll describe in more detail below. I added that because gnc_plugin_register_main_window_page_changed queues an idle event and the problem is more prevalent in idle events because they're asynchronous to the rest of the program flow.

When you register an event or a signal handler you create a closure that includes the pointer that you pass as the handler's data. If the memory pointed to by that pointer is no longer valid when the signal or event fires you get the kind of crash we're seeing in this bug. There are two ways to prevent this: With good object-lifetime management (automatic in C++, requires care in GObject that's notably missing in most of GnuCash) you can unregister the handler with g_idle_remove* or g_signal_disconnect* in the destructor (for GObject, the object's dispose or finalize, prefer the former) or by passing a weak pointer that will get nulled when the object is freed. In GObject that works only for memory allocated with g_object_new; in C++ only with std::shared_ptr or boost::shared_ptr. Since plugin_page *is* allocated with g_object_new it would work in this case.
Comment 18 John Ralls 2020-01-27 13:01:02 EST
Bob,

Never mind the object lifetime problem, there's a more fundamental one: gnc_main_window's page_changed signal is defined for one parameter, and only plugin_page is provided when gnc_plugin_register_main_window_page_changed is registered as the handler for the page_changed signal. But gnc_plugin_register_main_window_page_changed takes 2 parameters, current_page and plugin_page. The second parameter is uninitialized. If you're lucky or you have an overly-accommodating compiler it will be NULL and the function will return without doing anything. Otherwise it will crash as seen in Mike's first crash report.
Comment 19 John Ralls 2020-01-27 13:16:55 EST
This disassembly snippet might be helpful:
libgnc-gnome.dylib`gnc_plugin_register_main_window_page_changed:
libgnc-gnome.dylib[0x73970] <+0>:   pushq  %rbp
libgnc-gnome.dylib[0x73971] <+1>:   movq   %rsp, %rbp
libgnc-gnome.dylib[0x73974] <+4>:   pushq  %r15
libgnc-gnome.dylib[0x73976] <+6>:   pushq  %r14
libgnc-gnome.dylib[0x73978] <+8>:   pushq  %r12
libgnc-gnome.dylib[0x7397a] <+10>:  pushq  %rbx
libgnc-gnome.dylib[0x7397b] <+11>:  testq  %rsi, %rsi
libgnc-gnome.dylib[0x7397e] <+14>:  je     0x73ab2                   ; <+322> at gnc-plugin-page-register.c:1179:1
libgnc-gnome.dylib[0x73984] <+20>:  movq   %rdx, %r14
libgnc-gnome.dylib[0x73987] <+23>:  movq   %rsi, %rbx
libgnc-gnome.dylib[0x7398a] <+26>:  mfence
libgnc-gnome.dylib[0x7398d] <+29>:  cmpq   $0x0, 0x4eb53(%rip)       ; parent_class + 7
libgnc-gnome.dylib[0x73995] <+37>:  jne    0x739bb                   ; <+75> [inlined] gnc_plugin_page_register_get_type + 49 at gnc-plugin-page-register.c:1157
libgnc-gnome.dylib[0x73997] <+39>:  leaq   0x4eb4a(%rip), %rdi       ; gnc_plugin_page_register_get_type.g_define_type_id__volatile
libgnc-gnome.dylib[0x7399e] <+46>:  callq  0x99448                   ; symbol stub for: g_once_init_enter
libgnc-gnome.dylib[0x739a3] <+51>:  testl  %eax, %eax
libgnc-gnome.dylib[0x739a5] <+53>:  je     0x739bb                   ; <+75> [inlined] gnc_plugin_page_register_get_type + 49 at gnc-plugin-page-register.c:1157
libgnc-gnome.dylib[0x739a7] <+55>:  callq  0x6c310                   ; gnc_plugin_page_register_get_type_once at gnc-plugin-page-register.c:610
libgnc-gnome.dylib[0x739ac] <+60>:  leaq   0x4eb35(%rip), %rdi       ; gnc_plugin_page_register_get_type.g_define_type_id__volatile
libgnc-gnome.dylib[0x739b3] <+67>:  movq   %rax, %rsi
libgnc-gnome.dylib[0x739b6] <+70>:  callq  0x9944e                   ; symbol stub for: g_once_init_leave
libgnc-gnome.dylib[0x739bb] <+75>:  movq   0x4eb26(%rip), %rsi       ; gnc_plugin_page_register_get_type.g_define_type_id__volatile
libgnc-gnome.dylib[0x739c2] <+82>:  movq   (%rbx), %rcx
libgnc-gnome.dylib[0x739c5] <+85>:  testq  %rcx, %rcx
libgnc-gnome.dylib[0x739c8] <+88>:  je     0x739d1                   ; <+97> at gnc-plugin-page-register.c:1157:34
libgnc-gnome.dylib[0x739ca] <+90>:  movb   $0x1, %al
libgnc-gnome.dylib[0x739cc] <+92>:  cmpq   %rsi, (%rcx)
libgnc-gnome.dylib[0x739cf] <+95>:  je     0x739de                   ; <+110> at gnc-plugin-page-register.c:1158:10
libgnc-gnome.dylib[0x739d1] <+97>:  movq   %rbx, %rdi
libgnc-gnome.dylib[0x739d4] <+100>: callq  0x99574                   ; symbol stub for: g_type_check_instance_is_a
libgnc-gnome.dylib[0x739d9] <+105>: testl  %eax, %eax
libgnc-gnome.dylib[0x739db] <+107>: setne  %al
libgnc-gnome.dylib[0x739de] <+110>: testq  %r14, %r14
libgnc-gnome.dylib[0x739e1] <+113>: je     0x73ab2                   ; <+322> at gnc-plugin-page-register.c:1179:1
libgnc-gnome.dylib[0x739e7] <+119>: testb  %al, %al
libgnc-gnome.dylib[0x739e9] <+121>: je     0x73ab2                   ; <+322> at gnc-plugin-page-register.c:1179:1
libgnc-gnome.dylib[0x739ef] <+127>: mfence
libgnc-gnome.dylib[0x739f2] <+130>: cmpq   $0x0, 0x4eaee(%rip)       ; parent_class + 7
libgnc-gnome.dylib[0x739fa] <+138>: jne    0x73a20                   ; <+176> [inlined] gnc_plugin_page_register_get_type + 49 at gnc-plugin-page-register.c:1158
libgnc-gnome.dylib[0x739fc] <+140>: leaq   0x4eae5(%rip), %rdi       ; gnc_plugin_page_register_get_type.g_define_type_id__volatile
libgnc-gnome.dylib[0x73a03] <+147>: callq  0x99448                   ; symbol stub for: g_once_init_enter
libgnc-gnome.dylib[0x73a08] <+152>: testl  %eax, %eax
libgnc-gnome.dylib[0x73a0a] <+154>: je     0x73a20                   ; <+176> [inlined] gnc_plugin_page_register_get_type + 49 at gnc-plugin-page-register.c:1158
libgnc-gnome.dylib[0x73a0c] <+156>: callq  0x6c310                   ; gnc_plugin_page_register_get_type_once at gnc-plugin-page-register.c:610
libgnc-gnome.dylib[0x73a11] <+161>: leaq   0x4ead0(%rip), %rdi       ; gnc_plugin_page_register_get_type.g_define_type_id__volatile
libgnc-gnome.dylib[0x73a18] <+168>: movq   %rax, %rsi
libgnc-gnome.dylib[0x73a1b] <+171>: callq  0x9944e                   ; symbol stub for: g_once_init_leave
libgnc-gnome.dylib[0x73a20] <+176>: movq   0x4eac1(%rip), %rsi       ; gnc_plugin_page_register_get_type.g_define_type_id__volatile
libgnc-gnome.dylib[0x73a27] <+183>: movq   (%r14), %rax
libgnc-gnome.dylib[0x73a2a] <+186>: testq  %rax, %rax
libgnc-gnome.dylib[0x73a2d] <+189>: je     0x73a34                   ; <+196> at gnc-plugin-page-register.c:1158:35
libgnc-gnome.dylib[0x73a2f] <+191>: cmpq   %rsi, (%rax)
libgnc-gnome.dylib[0x73a32] <+194>: je     0x73a40                   ; <+208> [inlined] gnc_plugin_page_register_get_type at gnc-plugin-page-register.c:1161
libgnc-gnome.dylib[0x73a34] <+196>: movq   %r14, %rdi
libgnc-gnome.dylib[0x73a37] <+199>: callq  0x99574                   ; symbol stub for: g_type_check_instance_is_a
libgnc-gnome.dylib[0x73a3c] <+204>: testl  %eax, %eax
libgnc-gnome.dylib[0x73a3e] <+206>: je     0x73ab2                   ; <+322> at gnc-plugin-page-register.c:1179:1
libgnc-gnome.dylib[0x73a40] <+208>: mfence
libgnc-gnome.dylib[0x73a43] <+211>: cmpq   $0x0, 0x4ea9d(%rip)       ; parent_class + 7
libgnc-gnome.dylib[0x73a4b] <+219>: jne    0x73a71                   ; <+257> [inlined] gnc_plugin_page_register_get_type + 49 at gnc-plugin-page-register.c:1161
Comment 20 John Ralls 2020-01-27 13:31:39 EST
Note that there's a small offset from the addresses in Mike's crash reports, with the g_type_check_instance_is_a call at +199 instead of +204. I expect that's due to my using a newer version of clang.
Comment 21 Bob 2020-01-29 11:10:04 EST
(In reply to John Ralls from comment #18)
> Bob,
> 
> Never mind the object lifetime problem, there's a more fundamental one:
> gnc_main_window's page_changed signal is defined for one parameter, and only
> plugin_page is provided when gnc_plugin_register_main_window_page_changed is
> registered as the handler for the page_changed signal. But
> gnc_plugin_register_main_window_page_changed takes 2 parameters,
> current_page and plugin_page. The second parameter is uninitialized. If
> you're lucky or you have an overly-accommodating compiler it will be NULL
> and the function will return without doing anything. Otherwise it will crash
> as seen in Mike's first crash report.

I remember double checking this and looked again and I think the signal and function are correct, maybe I should of specified the second parameter as 'gpointer user_data' instead of GncPluginPage. I also do wonder if it would be better to redefine this signal so it only has a 'gpointer user_data' parameter as the current_page could easily be obtained with gnc_main_window_get_current_page.

Looking at your disassemble, not great at reading these but I see there are two occurrences of g_type_check_instance_is_a which ties up with that function and as the error was on the second I might assume it was the register_plugin_page at fault.

I'm looking at adding signal_disconnect's, probably in the destroy_widget function as that is one of the first calls for destruction.

There is also a possible problem when a page moves to a new window which I am also looking at.
Comment 22 John Ralls 2020-01-29 21:17:26 EST
> I think the signal and function are correct, maybe I should of specified the second parameter as 'gpointer user_data' instead of GncPluginPage

You're right, I'm getting rusty at this stuff. The signal itself has one parameter, the  handler gets two, the signal's paramater and the closure. The Gnome idiom is to declare the parameters according to the signal spect and cast them to their real types with the GObject macros:
  handle_main_window_changed(GtkWindow* window, GObject* object, gpointer user_data)
  { 
     GncPluginPage* current_page = GNC_PLUGIN_PAGE(object);
     GncPluginPage* plugin_page = GNC_PLUGIN_PAGE(user_data); 
     if (!current_page || plugin_page) return;
     ...
  }

Because that ensures that the parameters you got are the type that you want. The way you did it works too but might be jarring to some people because of the implicit casts. Either way calls g_type_check_instance_is_a and will crash if it gets a non-NULL invalid address.

There's some more guidance on managing closure memory at https://developer.gnome.org/gobject/stable/gobject-Signals.html#signal-memory-management.


> I also do wonder if it would be better to redefine this signal so it only has a 'gpointer user_data' parameter as the current_page could easily be obtained with gnc_main_window_get_current_page.

I don't think it would make a difference.

> I'm looking at adding signal_disconnect's, probably in the destroy_widget function as that is one of the first calls for destruction.

Since calling that is immediately followed by g_object_unref(page) in gnc_main_window_close_page that would work, but see next item:


>There is also a possible problem when a page moves to a new window which I am also looking at.

Yes, moving the page to a different window should disconnect it from the old window's page-changed signal and connect to the new window's signal. It looks to me that should be done in response to GncPluginPage's inserted and removed signals. gnc_main_window_close_page calls gnc_window_disconnect, which causes gnc_plugin_Page to emit the removed signal, before calling gnc_plugin_page_destroy_widget, so if you disconnect from page-changed when handling the removed signal you won't have to do it in destroy_widget or finalize.

A bit beyond scope for this bug, but it would be a better design to change destroy_widget to dispose. That way it would be run when the GncPluginPage's ref count went to 0 instead of having to be called explicitly.
Comment 23 Bob 2020-02-05 10:46:29 EST
Created PR 644 with my changes,
changed the callback setup to be from the page insert signal and the function definition to that of the signal spec. Also split the protection so if it does happen again the parameter involved should be easier to identify. I am recording the callback signal id so I can disconnect it from within gnc_main_window_cmd_window_move_page as early as possible so it does not get called when the pages start changing, there are also notebook signals happening,  and is used in the window disconnect to remove on close. Just to cover all my bases I have added it to the page destroy function but the signal should of been disconnected by then.
Comment 24 Bob 2020-02-13 09:41:04 EST
Mike, with the help of John I have pushed my changes to maint and so will be in the next release. If you have time can you try it and report back.
Comment 25 Mike Alexander 2020-02-13 17:57:42 EST
I built a version with PR 644 a week or so ago and have been using it without any problems.  It's a little hard to be sure, but this does seem to have fixed the focus problem.
Comment 26 Bob 2020-02-17 08:26:58 EST
Closing as PR seems to of fixed problem.

Note You need to log in before you can comment on or make changes to this bug.