Summary: If user attempts to save changes to the database, followed by loading an existing report tab where the report is required to render before the save has completed, Gnucash permanently hangs with no output in trace file. Steps to Reproduce: 1. Create a report in a tab (tested with Cash Flow, Income/Expense Bar, Portfolio Pie) 2. Move default focus on to any register 3. Save and close the file 4. Open the file 5. Create a transaction in the register 6. Click 'Save' 7. Click to focus on the report tab Expected Results: File saves and report renders. Actual Results: Report fails to render, transaction fails to save, UI provides colorful beach ball spinner, trace file receives no warnings or errors. Technical details: MBP 2012 - 8 GB RAM MacOS Catalina 10.15.7 (latest available for the hardware) Gnucash Version: 4.6 Build ID: 4.6+(2021-06-26) Finance::Quote: 1.47
It appears to be a deadlock. The main thread is waiting for the save's xml-thread to continue; the xml-thread is stuck in a read of its input file descriptor: * thread #32, name = 'xml_thread' frame #0: 0x00007fff203fccce libsystem_kernel.dylib`read + 10 libsystem_kernel.dylib`read: -> 0x7fff203fccce <+10>: jae 0x7fff203fccd8 ; <+20> 0x7fff203fccd0 <+12>: movq %rax, %rdi 0x7fff203fccd3 <+15>: jmp 0x7fff203fd6bd ; cerror 0x7fff203fccd8 <+20>: retq (lldb) up frame #1: 0x0000000101b37e01 libgnc-backend-xml-utils.dylib`gz_thread_func(params=0x0000000125d0ed60) at io-gncxml-v2.cpp:1440:21 [opt] 1437 { 1438 while (success) 1439 { -> 1440 bytes = read (params->fd, buffer, BUFLEN); 1441 if (bytes > 0) 1442 { 1443 if (gzwrite (file, buffer, bytes) <= 0) ( There are a bunch of WebKit threads sitting in g_main_loop waiting for an event to happen, that shouldn't be an issue. And there's a guile garbage collection thread also stuck in a read: (lldb) thr sel 21 * thread #21 frame #0: 0x00007fff203fccce libsystem_kernel.dylib`read + 10 libsystem_kernel.dylib`read: -> 0x7fff203fccce <+10>: jae 0x7fff203fccd8 ; <+20> 0x7fff203fccd0 <+12>: movq %rax, %rdi 0x7fff203fccd3 <+15>: jmp 0x7fff203fd6bd ; cerror 0x7fff203fccd8 <+20>: retq (lldb) up frame #1: 0x0000000100379a8c libguile-2.2.1.dylib`read_finalization_pipe_data + 28 libguile-2.2.1.dylib`read_finalization_pipe_data: -> 0x100379a8c <+28>: movq %rax, 0x8(%rbx) 0x100379a90 <+32>: callq 0x10042b41a ; symbol stub for: __error 0x100379a95 <+37>: movl (%rax), %eax 0x100379a97 <+39>: movl %eax, 0x10(%rbx) Mike Alexander, any ideas?
More testing with some strategic printfs in gz_thread_func. It's not actually blocking on the gc as I thought. Running the report takes over for a bit and ships off the report to the GtkWebKitWebView and the main thread resumes feeding bytes down the pipe which gz_thread_func dutifully sends off to gzwrite. With my test file the last chunk read off of the pipe is 64 bytes. If no report has run the next read gets 0 bytes terminates the thread, but if a report is run then gz_thread_func's read just sits waiting for more bytes.
Found it, with some help from Simon Arlott. WebKit forks & execs as well as firing off threads and the fork was getting a copy of the pipe's file descriptors keeping the pipe from closing. Applying FD_CLOEXEC on them causes the fork's copies to immediately close so the pipe can when the save completes.