GnuCash
Contact   Instructions
Bug 798298 - Re-imported transactions no longer ignored
Summary: Re-imported transactions no longer ignored
Status: RESOLVED FIXED
Alias: None
Product: GnuCash
Classification: Unclassified
Component: Import - OFX (show other bugs)
Version: git-maint
Hardware: PC Linux
: Normal normal
Target Milestone: ---
Assignee: import
QA Contact: import
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-27 23:40 EDT by Benjamin Gordon
Modified: 2021-09-06 23:16 EDT (History)
8 users (show)

See Also:


Attachments

Description Benjamin Gordon 2021-08-27 23:40:24 EDT
Previously, I could repeatedly re-import the same ofx or qfx file (or more commonly, a fresh file generated with a partially overlapping date range).  This was fine because transactions that had already been matched were ignored, so I only had to look at new and skipped transactions.

Now re-importing the same file causes all of the previously matched transactions to show up again.  Nearly all of them just show up as a match for an existing transaction, but it's still difficult to try to figure out which transactions are actually new.  Some banks default to downloading the last 12 or 18 months of transactions, which can leave me double-checking hundreds of matches.

I did a git bisect and found that this behavior started with 3f1e24991f7e78bed7a96be6929e5b6c0bda86e (Bug 798205 Do not exclude from import a transaction that has an FITID).  That bug seems to be about the opposite behavior; is this the new intended behavior or a mistake in the other fix?
Comment 1 Chris Good 2021-08-28 02:25:27 EDT
Hi Benjamin,
This is an intended change as part of the fix for Bug 798205.
We cannot silently drop transactions from import because they match an existing transaction with the same FITID, because it could be a coincidence that they have the same FITID. Therefore we have to add the new transactions into the matcher window, and they should show as already matched so do not need any action.
Are you sure your bank doesn't have an option to select transactions by date range? My banks do, although it is a little difficult to find.
Failing that, you can easily edit the .ofx file to remove transactions for dates already imported.
Another easy option would be to use my utility https://github.com/goodvibes2/IngAusOfxFixLinux. After you enter the input file, it reads it and shows the transaction 'from; and 'to' date range. You can modify the date range to restrict the transactions that are output.
Comment 2 Benjamin Gordon 2021-08-28 12:59:50 EDT
Thanks for the quick reply.

As you suggested, all of my banks have a way to set the date range to be downloaded.  But setting it explicitly requires an extra 5-10 clicks *per account* at most banks.  Not only that, but if I want to minimize the overlap, now I have to open each account in gnucash first to see when I last imported it.  Previously, I could just log into each bank, download whatever chunk of transactions was most convenient, and then import a bunch of files into gnucash all at once.  The new multi-import feature that was introduced in 4.0 made this workflow even easier. Given how much easier it is to repeatedly re-import compared to tracking dates, I suspect I'm not going to be the only person who has a workflow like this.

I get that some banks are generating duplicate FITID values and that was causing transactions to be silently dropped.  For those users, this change will help.  But for everybody else, this is a big regression.  Can we have an option somewhere to ignore previously imported transactions when importing?  For me, just putting a simple checkbox the settings would be sufficient.  To account for people who might have 1/N badly behaved banks, maybe it would be nicer to set it per-ORG or per-FID value?

If you want to point me in the right direction, I'd be happy to take a stab at adding a checkbox to the settings.
Comment 3 Chris Good 2021-08-28 20:33:33 EDT
Benjamin, I'm in favour of a new option. The mods for Bug 798205, apart from the removal of unused code, were very small - just the removal of 2 'if' statements.
Maybe your concern, before the mods even became live, will convince the devs to override their understandable reticence for adding new options?
Comment 4 Jean Laroche 2021-08-30 21:41:51 EDT
What if we didn't show transactions whose FITID already exist AND whose amount match that of the transaction that's already in the register?
I believe this would solve the problem without having to have an option:
- If you re-import the same ofx file, we won't show any transaction from the ofx file because the FITID are already in the register, with the same amounts as in the OFX.
- If your bank reuses an FITID, as long as the amount isn't the same, the transaction will be shown.

We could also check that the dates match, but I can think of cases where the dates wouldn't necessarily match (for example the date you entered manually does not match that of the OFX transaction, yet you matched them upon import and in that case I'm not sure whether the date is overwritten with that in the OFX transaction).
J
Comment 5 Jean Laroche 2021-08-30 21:42:41 EDT
As an aside, I am very much in the favor of hiding transactions that have been previously imported and matched. In my personal opinion, the new behavior is a bit of a regression.
Comment 6 Chris Good 2021-08-30 21:59:45 EDT
(In reply to Jean Laroche from comment #4)
> What if we didn't show transactions whose FITID already exist AND whose
> amount match that of the transaction that's already in the register?
> I believe this would solve the problem without having to have an option:
> - If you re-import the same ofx file, we won't show any transaction from the
> ofx file because the FITID are already in the register, with the same
> amounts as in the OFX.
> - If your bank reuses an FITID, as long as the amount isn't the same, the
> transaction will be shown.
> 

Jean, It could still be a coincidence (albeit unlikely) that the amounts match and IMHO dropping a transaction (almost) silently should never occur. You do get a count at the end of the import but it's not obvious what it means.

> We could also check that the dates match, but I can think of cases where the
> dates wouldn't necessarily match (for example the date you entered manually
> does not match that of the OFX transaction, yet you matched them upon import
> and in that case I'm not sure whether the date is overwritten with that in
> the OFX transaction).
> J

Agree date is not a good check - even if I transfer amounts between my accounts at the same bank, a transaction can appear in the 'to' account the day after it left the from account.
Comment 7 Chris Good 2021-08-30 22:09:29 EDT
(In reply to Jean Laroche from comment #5)
> As an aside, I am very much in the favor of hiding transactions that have
> been previously imported and matched. In my personal opinion, the new
> behavior is a bit of a regression.

Jean, I agree it is bit of regression, but it is not hard to limit the export to a unique date range. And when you suddenly see lots of transactions being matched, it is clear you have imported a duplicate date range and can cancel the import if you wish.

If we only checked for FITID on splits for the import account, then we could drop them, but then (I think?) transfers between accounts would not match.
Comment 8 Chris Good 2021-08-30 22:28:54 EDT
I was avoiding showing a list of dropped transactions because I thought that would be too hard, but maybe we could drop them, and show a list of them above the buttons in the dialog at the end that shows the counts?
Comment 9 Benjamin Gordon 2021-08-31 01:36:09 EDT
I think my original suggestion of a new global option is probably not ideal, because some people may have both a problem bank and other banks.

The suggestion from bug 798205 seems maybe doable, but not ideal to ask the user to set a new field on all their accounts.

However, what about a variation of comment 4: instead of checking that the fitid, amount, and date match the gnucash transaction, what about storing more info in the online_id field?  You could do something like v2:hash(date|amount|name|memo):fitid.  This would be similar to what IngAusOfxFixWin does with FITID, but applied to everything.  Keeping the fitid outside the hash makes it possible to still go search for records in the OFX file given an online_id field.  If the v2: tag isn't present, it could match against just the fitid like the current behavior.  That avoids requiring a field day where every single previous online_id suddenly doesn't match anything.

With this scheme, it would be ok if you record a different date or change other fields in the register because the re-import would recreate the same hash using the original value from the OFX file.  You could still get a false match if you had two transactions posted on the same day with all identical fields and your bank generated the same FITID for them, but at that point I don't think a human could tell them apart either.

Thoughts on whether that might be workable?
Comment 10 David Reiser 2021-08-31 16:49:10 EDT
(In reply to Chris Good from comment #1)
If the bank is following the OFX standard, then there will only ever be one transaction in any account with any FITID.

I do dozens of downloads a month. If I had to stop and figure out the exact date I downloaded last, it would add a tremendous amount of hassle to using gnucash.

I don't think requiring those of us who have banks that comply with the OFX spec should be subject to radically increased inconvenience in gnucash for the sake of people whose banks do not follow the spec.

This "fix" is a serious regression.
Comment 11 Jean Laroche 2021-09-01 01:08:28 EDT
(In reply to Benjamin Gordon from comment #9)
> I think my original suggestion of a new global option is probably not ideal,
> because some people may have both a problem bank and other banks.
> 
> The suggestion from bug 798205 seems maybe doable, but not ideal to ask the
> user to set a new field on all their accounts.
> 
> However, what about a variation of comment 4: instead of checking that the
> fitid, amount, and date match the gnucash transaction, what about storing
> more info in the online_id field?  You could do something like
> v2:hash(date|amount|name|memo):fitid.  This would be similar to what
> IngAusOfxFixWin does with FITID, but applied to everything.  Keeping the
> fitid outside the hash makes it possible to still go search for records in
> the OFX file given an online_id field.  If the v2: tag isn't present, it
> could match against just the fitid like the current behavior.  That avoids
> requiring a field day where every single previous online_id suddenly doesn't
> match anything.
> 
> With this scheme, it would be ok if you record a different date or change
> other fields in the register because the re-import would recreate the same
> hash using the original value from the OFX file.  You could still get a
> false match if you had two transactions posted on the same day with all
> identical fields and your bank generated the same FITID for them, but at
> that point I don't think a human could tell them apart either.
> 
> Thoughts on whether that might be workable?

I see, you're suggesting to keep the original data from the OFX, so if we re-import it will match for sure. A collision would require date+amount+FITID to all match, and you're right that in that case the transactions are indistinguishable.
I don't know the details but I think adding a field to the database is a benign operation. Not sure devs would agree that this is a good enough reason for it.
Jean
Comment 12 Chris Good 2021-09-01 01:37:35 EDT
Hi David Reiser,
Thanks for raising your concern - it's good to know this is increasing the work too much for some people, maybe we can come up with a better solution before the next release.

However, I think you are misrepresenting the facts.
The problem in Bug 798205 is not that the bank is not following the OFX spec,
but that GnuCash looks for FITID's in all splits for transactions that have a split for the account being imported, not just the splits for the account being imported. The OFX spec says:
"FITIDs must be unique within the scope of an account".
That's not terribly specific, but I assume they mean:
"FITIDs must be unique within an account".

In the transactions for Bug 798205, the FITID's *are* unique within the account.

Hi Benjamin Gordon,
Your Comment 9 suggestion sounds possible but overly complicated to me.

The problem from Bug 798205 was that some transactions were silently dropped.
My solution in Comment 8 overcomes that so we could go back to the previous FITID processing.
Comment 13 John Ralls 2021-09-01 11:42:17 EDT
> I don't know the details but I think adding a field to the database is a benign operation. Not sure devs would agree that this is a good enough reason for it.

It's more or less benign to add entries to the Key-Value Properties (aka KVP and slots). That's where the current online-id lives. There's no problem adding another one. Since augmenting the FITID in the online-id slot would confuse an older version of GnuCash I'd prefer that the hash go to a new slot. That's anyway clearer in code than parsing a delimited field. The hash might include the <DTPOSTED>, <AMOUNT>, and <NAME> or <PAYEE> tags.

I can see one potential issue with this, one that the current matching code occasionally has trouble with: Several transactions on the same day for the same amount to the same payee. I see this most often with Apple Music but there are no doubt other sources.

The XML parser isn't tolerant of any elements it doesn't know about so adding first-class fields to any data structure creates an immediate backward incompatibility. We only do that for major releases and haven't done it at all since 2.4.
Comment 14 Benoit Grégoire 2021-09-03 16:55:25 EDT
I am horribly busy currently, so this will be shorter than it sould be.  Do ask any clarification questions however, I'll try to answer them.

1)  This is a major regression.  I initially wrote libofx and ofx support in gnucash two decades ago to avoid exactly this 
2) I agree with everyone's interpretation that the file provided for bug 798205 does not violate the spec.
3) My understanding of the discussion in bug 798205 was that from now on we would only match the online id of the split in the account it's imported into (the source account) against the splits in that account.  I agree with this, so did not worry:
 a) it's now possible (online id were moved from accounts to splits a while ago)
 b) it would solve 798205
 c) it's essentially the behaviour suggested by the spec (prepending the account_id), just implicitly
 d) as we don't prepend the FITID in  a hardcoded fashion, we could still use matching online id's to increase the matching probability for banks that use the same id for transfers in the future (instead of silently matching in this case as was done in the past).  This would have to be written if justified by real world files (files that have matching online_ids for transfers, but do not use ofx transfers).
4) The problem is that looking at the code in pull https://github.com/Gnucash/gnucash/pull/1098 it does NOT seem to do the above.  It seems to drops support for split online_ids completely (except storing them)!  It may have just been a mistake (that code was admittedly horribly unclear).  But once again, supporting this was the reason I wrote the importer in the first place.

To reword, the patch should have been to consider the (entire) transaction already imported IF a split in the source account matches the online id of the transaction being imported.  NOT (previous behaviour), if any split matches the transaction being imported NOR (current behaviour subject of this bug) to completely ignore the online_id. 

I think the pull request should be reverted.

I hate doing this for outside contributions, I'm willing to properly review any further version of that patch as fast as possible, but unfortunately cannot work on this myself at present.
Comment 15 Jean Laroche 2021-09-03 20:30:56 EDT
OK I'm going to take a stab at this. I've been holding off because it wasn't very clear how to proceed, there are two different but related bugs.
Here's the plan:
- I'll revert the PR 1098.
- I'll modify the code so the matcher only considers online ids of splits that correspond to the account to which the OFX transaction is imported.

This should fix the issue of bug 798205 because the matching online id that belongs to another account will be ignored.

The issue related to transfers between two accounts appearing in a single ofx file is related but separate. I'll take a look at this later.

Let me know if you disagree with this plan.

J.
Comment 16 Christopher Lam 2021-09-03 21:09:34 EDT
There's bug 798304 as well if it's possible to take into account as well.
Comment 17 Chris Good 2021-09-04 04:14:07 EDT
Benoit,
Sounds good to me.

Jean,
We should ensure my mods for Bug 798205 PR 1098 are reverted before 4.7 is released 2021-09-26 (string freeze 2021-09-12).
This is maint commit
https://github.com/Gnucash/gnucash/commit/be6fb1abe2b7fac27c4aefc4b32415bd1c73ab92

I'll do that tomorrow if you like?
Comment 18 Jean Laroche 2021-09-04 13:15:40 EDT
I was going to do that this morning. We have a bit of time before the freeze.
Comment 19 Jean Laroche 2021-09-05 23:38:51 EDT
PR https://github.com/Gnucash/gnucash/pull/1129 should now address this bug. I reverted the changes that were made and add code to fix bug https://bugs.gnucash.org/show_bug.cgi?id=798205
Comment 20 John Ralls 2021-09-06 16:55:17 EDT
Jean's patch is merged.
Comment 21 Chris Good 2021-09-06 20:47:05 EDT
Note the issue with "transfers between two accounts appearing in a single ofx file" Jean mentions in Comment 15 is Bug 105334.
Comment 22 John Ralls 2021-09-06 23:16:17 EDT
Chris, I don't think Jean's change will help with that; it might even make it worse because the prior import with the same FITID to another account will no longer be recognized. Noticing that the transaction has already been recorded will fall back on the generic matcher noticing that there's already a transaction on the same date with the same description and amount. Unfortunately with modern micro-transactions like the ones generated by Apple Music that's no longer a reliable way to detect duplicates. The only safe recourse is to present all of the splits remaining after FITID processing to the user and ask them to sort it out.

Note You need to log in before you can comment on or make changes to this bug.