Bug 797750 - SIGSEV in swig-engine.c
Summary: SIGSEV in swig-engine.c
Status: RESOLVED FIXED
Alias: None
Product: GnuCash
Classification: Unclassified
Component: Reports
Version: 3.10
Hardware: PC Linux
Importance: Normal critical
Target Milestone: ---
Assignee: reports
QA Contact: reports
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-12 18:00 EDT by Christian G.
Modified: 2020-05-15 15:38 EDT

See Also:


Description Christian G. 2020-05-12 18:00:08 EDT
I had a crash today due to a SIGSEGV in swig-engine.c when I tried to open the "Tax Report & XML Export" report.

This is the stack trace:

    _wrap_gnc_mktime() at swig-engine.c:17338 0x7ffff796c84f
    0x7ffff7de7dc7   
    scm_call_n() at 0x7ffff7dedb71   
    scm_call_3() at 0x7ffff7d6cb93   
    0x7ffff7de7dc7   
    scm_call_n() at 0x7ffff7dedb71   
    scm_call_1() at 0x7ffff7d6cb2c   
    gfec_eval_string() at gfec.c:74 0x7ffff7a82578   
    gnc_run_report() at gnc-report.c:163 0x7ffff6a4f0e8   
    gnc_run_report_id_string() at gnc-report.c:189 0x7ffff6a4f217   
    gnc_html_report_stream_cb() at window-report.c:273 0x7ffff6a71a7c   
    load_to_stream() at gnc-html-webkit2.c:503 0x7ffff69f874e   
    impl_webkit_show_url() at gnc-html-webkit2.c:895 0x7ffff69f98af   
    gnc_html_show_url() at gnc-html.c:370 0x7ffff69f587d   
    gnc_plugin_page_report_load_uri() at gnc-plugin-page-report.c:369 0x7ffff6a6b19c   
    g_main_context_dispatch() at 0x7ffff7c5578e   
    0x7ffff7c55b40   
    g_main_loop_run() at 0x7ffff7c55e33   
    gtk_main() at 0x7ffff715dc2d   
    gnc_ui_start_event_loop() at gnc-gnome-utils.c:671 0x7ffff7b4b083   
    inner_main() at gnucash-bin.c:680 0x55555555b950   
    0x7ffff7d851c1   
    0x7ffff7d66cfe   
    0x7ffff7de7dc7   
    scm_call_n() at 0x7ffff7dedb71   
    0x7ffff7ddb95d   
    0x7ffff7d672f9   
    scm_c_with_continuation_barrier() at 0x7ffff7d67399   
    0x7ffff7dda472   
    GC_call_with_stack_base() at 0x7ffff61f6ca5   
    scm_with_guile() at 0x7ffff7dda83c   
    scm_boot_guile() at 0x7ffff7d85376   
    main() at gnucash-bin.c:934 0x55555555be88   

It's the following code line in function _wrap_gnc_mktime():

t1.tm_sec = scm_to_int(SCM_SIMPLE_VECTOR_REF(tm, 0));

I have no idea how to debug this. I'm not familiar with Guile.

It could be a memory alignment problem. The pointer causing the SIGSEGV points at address 0x17fb95fc6, which is tm + 0x8 (tm is 0x17fb95fbe).

I'm working with GnuCash 3.10 (commit 4b8649f).
Comment 1 John Ralls 2020-05-12 19:04:41 EDT
Don't worry about Guile yet. Does 0x17fb95fc6 point to a reasonable place, or was that the address that segfaulted?
Comment 2 Christian G. 2020-05-13 16:29:17 EDT
0x17fb95fc6 was the address that caused the segmentation fault. I reproduced the crash, and according to the memory viewer this address is not in a valid memory range.

The exact assembly code line is

mov (%rax),%rax

with rax = 0x17fb95fc6.

And this address results from invalid tm == s_0.
Comment 3 John Ralls 2020-05-13 21:07:32 EDT
(In reply to Christian G. from comment #2)
> 0x17fb95fc6 was the address, that caused the segmentation fault. I
> reproduced the crash and this address doesn't seem to be in a valid memory
> range according to the memory viewer.

Yes, that's what segfault means, attempt to access an illegal memory location.

> And this address results from invalid tm == s_0.

What do you mean? tm is initialized to s_0 two lines before. The problem would seem to be that Guile is passing an invalid datevector to gnc-mktime. 

Assuming that it's the  German TXF report, this is the line:
https://github.com/Gnucash/gnucash/blame/maint/gnucash/report/locale-specific/us/taxtxf-de_DE.scm#L102

It should be
  (define tax-day (gnc-mktime (bdtm)))
because bdtm is a function, defined just above. You should be able to find that function in the installed location, fix it, and test.
Comment 4 Christopher Lam 2020-05-13 23:09:29 EDT
bdtm isn't a function... it is a gnc_localtime object.

It would be useful to sprinkle (pk ...) all over the .scm to detect where the segfault happens.

e.g. 

(define tax-day (gnc-mktime (pk "bdtm=" bdtm)))

will dump bdtm to stdout before passing its value to gnc_mktime.
Comment 5 Christian G. 2020-05-14 18:09:53 EDT
(In reply to John Ralls from comment #3)
> Yes, that's what segfault means, attempt to access an illegal memory
> location.
A segfault can mean either access to illegal memory or misaligned access to any memory. My first thought was misaligned access, but it is access to illegal memory.
 
> > And this address results from invalid tm == s_0.
> 
> What do you mean? tm is initialized to s_0 two lines before. The problem
> would seem to be that Guile is passing an invalid datevector to gnc-mktime.
I meant that this address results from invalid tm, which is set to invalid s_0. So invalid datevector s_0 results in an invalid address.

> Assuming that it's the  German TXF report, this is the line:
> https://github.com/Gnucash/gnucash/blame/maint/gnucash/report/locale-
> specific/us/taxtxf-de_DE.scm#L102
Yes, it's the German TXF report. But that doesn't seem to be the line that causes the segfault. When I add (pk ...) to dump bdtm, the output already appears at GnuCash startup, not when I try to open the tax report.

> 
> It should be
>   (define tax-day (gnc-mktime (bdtm)))
> because bdtm is a function, defined just above. You should be able to find
> that function in the installed location, fix it, and test.
I found the function, but as I already wrote, I know absolutely nothing about Guile. I don't know how to fix it.

Instead I tried to locate the source of the error in the Git history, but didn't succeed. I checked current maint as well as the tagged versions 3.9 and 3.8, and I cleaned and rebuilt the Guile cache each time. I always get the same crash.

Can any of the German developers try to reproduce this crash? It would be important to know whether this is a local problem on my computer or not.
Comment 6 John Ralls 2020-05-14 18:31:25 EDT
We've only got one German dev nowadays, and I just added him to the CC as a ping.

FWIW I did just test on MacOS with the US TXF report, which makes exactly the same call, and it didn't crash.

With Chris's pk mod, it reports 
  ;;; ("bdtm=" #(0 0 12 16 3 120 4 134 -1 -25200 "Unset"))

And yes, it does so at startup: it's called by define-report, which runs at startup. Since that's not when it crashes, that probably means the stack has gotten corrupted and this isn't really the source of the crash.

Try installing Guile's debug symbols. Maybe with that you can set a breakpoint a couple of frames back and single-step to the crash to find where it's really coming from.
Comment 7 Frank H. Ellenberger 2020-05-14 19:31:53 EDT
I am able to reproduce a crash in current maint with LANG=de_DE.utf-8 while opening the tax report.
In the tracefile, but probably unrelated, I have:
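* 01:19:22 ERROR <gnc.scm> report.scm error: Ein Bericht hat eine Identifikationsnummer (»report-guid«), die doppelt auftritt. Bitte prüfen Sie, ob folgende »report-guid« fälschlicherweise in den gespeicherten Berichten mehr als ein Mal auftritt: b64b8cbaa633472c93ab7d9a2424d157
(In English: A report has an identification number ("report-guid") that occurs twice. Please check whether the following "report-guid" mistakenly appears more than once in the saved reports: b64b8cbaa633472c93ab7d9a2424d157)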
Comment 8 Christopher Lam 2020-05-14 21:03:34 EDT
Try moving the whole

---
(define bdtm
  (let ((result (gnc-localtime today)))
    (set-tm:mday result 16)             ; 16
    (set-tm:mon result 3)               ; Apr
    (set-tm:isdst result -1)
    result))

(define tax-day (gnc-mktime bdtm))

(define after-tax-day (< tax-day today))
---

into the options generator:

---
(define (tax-options-generator)
  (define options (gnc:new-options))
  (define (gnc:register-tax-option new-option)
    (gnc:register-option options new-option))

(define bdtm
  (let ((result (gnc-localtime today)))
    (set-tm:mday result 16)             ; 16
    (set-tm:mon result 3)               ; Apr
    (set-tm:isdst result -1)
    result))

(define tax-day (gnc-mktime bdtm))

(define after-tax-day (< tax-day today))

  ;; date at which to report 
  (gnc:options-add-date-interval!
   options gnc:pagename-general 
   (N_ "From") (N_ "To") "a")

  (gnc:register-tax-option
   (gnc:make-multichoice-option
    gnc:pagename-general (N_ "Alternate Period"

....etc
-------
Comment 9 Christopher Lam 2020-05-14 21:05:34 EDT
Maybe not. The TXF reports are a special level of ugliness and I'll reluctantly try to fix them tonight.
Comment 10 John Ralls 2020-05-14 21:15:11 EDT
I don't think that the crash is really in _wrap_gnc_mktime. I can also reproduce it now, and while I do get the same crash result, the string being evaluated in frame 5 is gnc:report-run. Here's a stack trace with symbols:

(lldb) bt 16
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x17fbb5a06)
  * frame #0: 0x0000000103f9713a libgncmod-engine.dylib`_wrap_gnc_mktime(s_0=0x000000017fbb59fe) at swig-engine.c:17338:28
    frame #1: 0x00000001002ffa14 libguile-2.2.1.dylib`scm_apply_subr(sp=0x000000011c9fdc38, nslots=2) at gsubr.c:305:14
    frame #2: 0x000000010039c9e2 libguile-2.2.1.dylib`vm_regular_engine(thread=0x000000011cbc9e60, vp=0x000000011ceeef30, registers=0x00007ffeefbfa280, resume=0) at vm-engine.c:786:13
    frame #3: 0x000000010039a5d4 libguile-2.2.1.dylib`scm_call_n(proc=0x000000011d29f4a8, argv=0x00007ffeefbfa360, nargs=3) at vm.c:1260:11
    frame #4: 0x00000001002e2463 libguile-2.2.1.dylib`scm_call_3(proc=0x000000011d29f4a8, arg1=0x000000011daa45c0, arg2=0x000000011d1ce0e0, arg3=0x000000011d712b40) at eval.c:499:10
    frame #5: 0x000000010038d7c4 libguile-2.2.1.dylib`scm_eval_string_in_module(string=0x000000011daa45c0, module=0x000000011d712b40) at strports.c:382:10
    frame #6: 0x00000001002ffa3c libguile-2.2.1.dylib`scm_apply_subr(sp=0x000000011c9fdf00, nslots=3) at gsubr.c:307:14
    frame #7: 0x000000010039c9e2 libguile-2.2.1.dylib`vm_regular_engine(thread=0x000000011cbc9e60, vp=0x000000011ceeef30, registers=0x00007ffeefbfb610, resume=0) at vm-engine.c:786:13
    frame #8: 0x000000010039a5d4 libguile-2.2.1.dylib`scm_call_n(proc=0x000000011eb2a070, argv=0x00007ffeefbfb6c0, nargs=1) at vm.c:1260:11
    frame #9: 0x00000001002e2382 libguile-2.2.1.dylib`scm_call_1(proc=0x000000011eb2a070, arg1=0x000000011daa45c0) at eval.c:485:10
    frame #10: 0x0000000103c1bc46 libgncmod-app-utils.dylib`gfec_eval_string(str="(gnc:report-run 0)", error_handler=(libgncmod-report-system.dylib`error_handler at gnc-report.c:149)) at gfec.c:74:23
    frame #11: 0x00000001039e6b6a libgncmod-report-system.dylib`gnc_run_report(report_id=0, data=0x00007ffeefbfb8f8) at gnc-report.c:163:16
    frame #12: 0x00000001039e6d50 libgncmod-report-system.dylib`gnc_run_report_id_string(id_string="id=0", data=0x00007ffeefbfb8f8) at gnc-report.c:189:12
    frame #13: 0x0000000101a41d71 libgncmod-report-gnome.dylib`gnc_html_report_stream_cb(location="id=0", data=0x00007ffeefbfb8f8, len=0x00007ffeefbfb8f4) at window-report.c:273:10
    frame #14: 0x0000000101a6bb89 libgncmod-html.dylib`load_to_stream(self=0x000000010a017320, type="report", location="id=0", label=0x0000000000000000) at gnc-html-webkit1.c:488:27
    frame #15: 0x0000000101a6a122 libgncmod-html.dylib`impl_webkit_show_url(self=0x000000010a017320, type="report", location="id=0", label=0x0000000000000000, new_window_hint=0) at gnc-html-webkit1.c:924:13
Comment 11 Christopher Lam 2020-05-14 22:48:38 EDT
Jralls, let me know how to reproduce the crash, ideally on Linux?
Comment 12 Frank H. Ellenberger 2020-05-14 23:43:54 EDT
Chris, call
LANG=de_DE.utf-8 gnucash
Reports->Steuer-Bericht & ElStEr-Export

The funny thing: if I also add LANGUAGE=C, I get no crash. If I then restart with LANG=de_DE.utf-8, the US report is still open, but useless for me.
Comment 13 John Ralls 2020-05-15 00:19:49 EDT
(In reply to Frank H. Ellenberger from comment #12)
> Chris call
> LANG=de_DE.utf-8 gnucash
> Reports->Steuer-Bericht & ElStEr-Export
> 
> The funny thing: if I also add LANGUAGE=C, I get no crash. If I then restart
> with LANG=de_DE.utf-8, the US report is still open, but useless for me.

That's interesting. I set LANG=de_DE to get the German TXF report--and the crash--but left LANGUAGE unset so that I could navigate more quickly.
Comment 14 John Ralls 2020-05-15 01:29:32 EDT
I did a bit more debugging. I set a breakpoint on scm_eval_string_in_module and continued through it until I got to the gnc:report-run, then started stepping. It took me a few minutes of stepping through loops in vm.c to realize that I was running the compiled Scheme for the report and wasn't going to see anything useful, so I set a break on _wrap_gnc_mktime. The first 3 calls were fine, then I got to the bad address. I established that by trying to read its contents instead of letting it crash, so I still have an intact stack.

Chris, can you find out how to print out the Scheme stack from here?
Comment 15 Christopher Lam 2020-05-15 05:39:40 EDT
Well, unfortunately you'll have to attach a .gnucash file for me.
Comment 16 Christopher Lam 2020-05-15 05:56:48 EDT
I can't load the de_DE report.

Try wrapping all (gnc-mktime xxxx) into (gnc-mktime (pk 'alpha xxxx)) to dump the xxxx in the various calls. It'll reveal which gnc-mktime is barfing.

See the gnc-mktime in taxtxf-de_DE.scm#L566

                          (case alt-period
                            ((1st-est 1st-last) ; Mar 31
                             (set-tm:mon bdtm 2))
                            ((2nd-est 2nd-last) ; May 31
                             (set-tm:mon bdtm 4))
                            ((3rd-est 3rd-last) ; Aug 31
                             (set-tm:mon bdtm 7))
                            ((4th-est 4th-last last-year) ; Dec 31
                             (set-tm:mon bdtm 11)) 
                            (else (set! bdtm (gnc-mktime to-value))))

All the previous cases modify bdtm with (set-tm:mon bdtm ...), whereas the else clause calls (gnc-mktime to-value). This seems very wrong. I cannot understand the logic at all. I don't use the TXF reports.
Comment 17 Christopher Lam 2020-05-15 05:59:25 EDT
(In reply to John Ralls from comment #14)
> 
> Chris, can you find out how to print out the Scheme stack from here?

Not sure. c-interface.scm will print the stack happily iff guile catches the crash.
Comment 18 John Ralls 2020-05-15 15:38:52 EDT
The answer I was looking for is scm_backtrace(), see https://www.gnu.org/software/guile/manual/html_node/Pre_002dUnwind-Debugging.html.

The crashing backtrace finishes with
In report.scm:
   780:24  3 (_)
   756:25  2 (gnc:report-render-html #<<report> type: 758b125c05e54…> …)
In taxtxf-de_DE.scm:
   580:40  1 (generate-tax-or-txf "Taxable Income / Deductible Expe…" …)
In unknown file:
           0 (gnc-mktime 1609487999)

The code at fault:
541         (to-value (gnc:time64-end-day-time
542                    (let ((bdtm from-date))
543                      (if (member alt-period 
544                                  '(last-year 1st-last 2nd-last
545                                              3rd-last 4th-last))
546                          (set-tm:year bdtm (- (tm:year bdtm) 1)))
...
579                            (else 
580                             (set! bdtm (gnc-mktime to-value)))))

gnc-mktime takes a struct tm and returns a time64, but this is passing a time64 and expecting a struct tm. 

The commit that made the bug:
https://github.com/Gnucash/gnucash/commit/fada13e456076e44a7b83c823f42fd3a913fc7ea
Oddly, the same code in taxtxf.scm (line 2081) correctly uses gnc-localtime. It was changed in the same commit.

A nice illustration of why dynamically typed languages like Scheme aren't really suitable for large-scale development.

Fix pushed to maint.
