GnuCash
Contact   Instructions
Bug 795753 - Initial save into MySQL should use bulk inserts
Summary: Initial save into MySQL should use bulk inserts
Status: NEW
Alias: None
Product: GnuCash
Classification: Unclassified
Component: Backend - SQL (show other bugs)
Version: 2.7.x
Hardware: Other Linux
: Normal normal
Target Milestone: future
Assignee: core
QA Contact: core
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-02 09:22 EDT by Jonathan Kamens
Modified: 2018-05-04 13:13 EDT (History)
5 users (show)

See Also:


Attachments

Description Jonathan Kamens 2018-05-02 09:22:39 EDT
I just told GnuCash to save my data into a new MySQL database over the network. It was previously in a 32MB XML file. I've been staring at it waiting for it to finish the save for 20 minutes now.

When I had previously saved into a local MySQL database, the total save time was well under a minute.

The problem here (well, one of the problems) is that GnuCash is using individual INSERT statements rather than bulk inserts. The save over the network could be orders of magnitude faster if GnuCash were inserting multiple rows per statement.

(Also, I hope you are turning off autocommit during the bulk load, so that the server doesn't flush logs to disk between every statement.)
Comment 1 Jonathan Kamens 2018-05-02 09:56:00 EDT
Actually, this is worse than I thought. After saving the data into the MySQL database on the network, which took over a half hour, I immediately exited from gnucash and then restarted it to see how long it would take to load the data.

I've now been waiting for over 13 minutes.

So it would seem that in addition to saving data over the network extremely slowly, GnuCash also loads data over the network extremely slowly. Loading the same data from a local MySQL instance takes well under a minute.

So, basically, whatever GnuCash is doing to save to and load from MySQL is completely unusable with a MySQL server on the network.
Comment 2 John Ralls 2018-05-03 13:39:33 EDT
Are the 1 minute save and load times on the local machine to an XML file or to a local MySQL server? Is the 32MB XML size compressed? If so, what's the uncompressed size?

Please test with 3.1: Bug 792195 and bug 794482 reported performance problems with the SQL backend and were fixed in that latest release.
Comment 3 Jonathan Kamens 2018-05-04 10:31:39 EDT
>Are the 1 minute save and load times on the local machine to an XML file or to a local MySQL server?

I answered this question already:

"When I had previously saved **into a local MySQL database**, the total save time was well under a minute."

>Is the 32MB XML size compressed? If so, what's the uncompressed size?

Uncompressed.

>Please test with 3.1

Time to save to a local MySQL database: 17 seconds.
Time to load from a local MySQL database: 5 seconds.

Time to save to a network MySQL database: I gave up after 12 minutes.

>Bug 792195 and bug 794482 reported performance problems with the SQL backend and were fixed in that latest release.

Bug 792195 doesn't seem to have anything to do with SQL or for that matter about Gnucash performance. Perhaps you were referring to bug 792105 (for which, see below).

Bug 794482 doesn't say anything at all about what was wrong, but it links to bug 792105, which refers to performance problems on Windows. I'm on Linux. It says that at the top of this bug report.

You're wasting my time here. You asked me questions I already answered, suggested that I might be encountering issues that aren't relevant to the platform that I'm using, and asked me to spend more time testing something with no legitimate reason to believe that the issue I reported was fixed.

Not to mention that I **explained in my bug report** what it is that is causing the performance issue, and you could have confirmed that yourself without further involvement from me.

Speaking as someone who has maintained and continues to maintain many open source projects, I would never treat someone who reported a bug this way.

How about some respect for the people who are trying to make your software better by taking the time to report legitimate issues about it?
Comment 4 Geert Janssens 2018-05-04 13:13:02 EDT
Time to cool down. The fact you're maintaining several open source projects doesn't give you the right to bully uss around.

We're trying to be helpful as much as we can. But at the same time there's much to deal with at once so please show a little patience for us as well.
Comment 5 Jonathan Kamens 2018-05-04 13:15:38 EDT
Pointing out that one has been mistreated is not bullying.

But whatever, you do you.

Out.
Comment 6 John Ralls 2018-05-04 14:26:48 EDT
> Not to mention that I **explained in my bug report** what it is that is causing the 
> performance issue, and you could have confirmed that yourself without further 
> involvement from me.

Which was:
> The problem here (well, one of the problems) is that GnuCash is using individual 
> INSERT statements rather than bulk inserts. The save over the network could be orders 
> of magnitude faster if GnuCash were inserting multiple rows per statement.
>
> (Also, I hope you are turning off autocommit during the bulk load, so that the server 
> doesn't flush logs to disk between every statement.)

Neither of which is going to affect a network-connected server and not a local one. That's not saying that those aren't good recommendations, they're just not likely to be relevant to your problem, which on the face of it would seem to be more likely to be either a server or network configuration problem, not something to do with GnuCash.
Comment 7 Jonathan Kamens 2018-05-04 15:07:50 EDT
The reason why I put the autocommit comment in parentheses is because I agree that it wouldn't help this particular problem; it was more a generic observation about performance when bulk-loading a file into the database.

However, I'm flummoxed by your assertion that bulk inserts would not make a difference with a network-connected server vs. a local one. When you do a huge number of transactions with a network-connected server, every transaction adds network round-trip time. When the server is local, the network round-trip time is essentially zero.

Here's how long it takes to do 10,000 inserts into a network-connected database separately vs. in a single insert statement:

Separate inserts: 122.62 seconds
Bulk inserts: 0.25 seconds (0.2%)

Here's how long it takes against _the same database_ when the script is run locally on the server the database is on:

Separate inserts: 8.29 seconds
Bulk inserts: 0.43 seconds (5.2%)

The "Separate inserts" time varies by a few seconds in either direction and the "Bulk inserts" time varies between around 0 and 1 seconds, but that doesn't affect the orders of magnitude involved or change the obvious conclusion that separate inserts are much slower on a network-connected database whereas bulk inserts are, comparatively speaking, really freaking fast whether local or network-connected.

Consider that there are over 44,000 <trn:split> objects in my XML file. Given the numbers above, if those are inserted one by one into a remote database, then inserting just those objects, ignoring all the other XML objects in the file, will take almost ten minutes. If they were all inserted in a single bulk statement or even a few bulk statements, the inserts would take a few seconds.
Comment 8 Jonathan Kamens 2018-05-04 15:09:26 EDT
You're welcome to check my work. Here's the script I used to produce the numbers above:

```
#!/usr/bin/env perl

use DBI;
use Time::HiRes 'time';

$hostname = shift @ARGV;
$dbname = shift @ARGV;
$username = shift @ARGV;
$password = shift @ARGV;

$iterations = 10000;

$dsn = "DBI:mysql:database=$dbname;host=$hostname";
$dbh = DBI->connect($dsn, $username, $password) or die;

$dbh->do("CREATE TEMPORARY TABLE test1 (value FLOAT);") or die;

$start = time();

for (1..$iterations) {
    $dbh->do("INSERT INTO test1 (value) VALUES (" . rand() . ");") or die;
}

$end = time();

$separate_time = $end - $start;

printf("Separate inserts: %.2f seconds\n", $separate_time);

$dbh->do("DROP TABLE test1;") or die;

$dbh->do("CREATE TEMPORARY TABLE test1 (value FLOAT);") or die;

$statement = "INSERT INTO test1 (value) VALUES ";
for (1..$iterations) {
    $statement .= "(" . rand() . "),";
}
$statement =~ s/.$/;/;

$start = time();

$dbh->do($statement) or die;

$end = time();

$bulk_time = $end - $start;

printf("Bulk inserts: %.2f seconds (%.1f%%)\n", $bulk_time,
       100 * $bulk_time / $separate_time);
```
Comment 9 John Ralls 2018-05-08 15:04:07 EDT
OK, I see your point. I didn't get quite the same differences but I do see variation ranging from 1.25x (i.e. 5 sec vs. 4 sec) to 12x (46 sec vs 4 sec) and that the ping time between the various machines is proportional: 300 microseconds on the fast case (two VMs on the same machine) to the longest of about 3.5 milliseconds.

Interestingly I had to test with two databases: My first trials were with my household server, an old Mac Mini running Debian 8 and MySQL 5.5 on which the local time was ~419 seconds and over the network (from the laptop) took ~480 seconds. The times above were using a Debian 9 VM on a Mac Pro.

It's significant, I think, that even on the slower server the single-query time is around 170 ms, showing that running too many queries results in a pretty serious performance hit on MySQL.

Note You need to log in before you can comment on or make changes to this bug.