WIP: use args.grouping #200

Closed - wants to merge 20 commits into master from grouping_fix

Conversation

@leej3 (Contributor) commented Apr 26, 2018

Fixes crawling/grouping mentioned by @yarikoptic in #198

@yarikoptic (Member):

oh, so it wasn't passed through? Thanks @leej3!
Ideally we should accompany this with a dedicated test so things do not get broken again in refactoring.
Note that it already breaks the tests (I guess it changes the current/default behavior).

@leej3 (Contributor Author) commented Apr 26, 2018 via email

@leej3 (Contributor Author) commented May 18, 2018

Many scans I encounter do not pass the assertion that studyUID == file_studyUID, so using accession_number for grouping is the only way I can get this working.

I'm under the impression that this is still the default behaviour: "grouping" is None either way. It's only when one changes grouping to accession_number that the behaviour is altered.

I've rebased this onto master.

EDIT: I've since determined that all of these cases arise when I receive two studies tarred up together. Perhaps not something that needs to be addressed. Although, if grouping is studyUID instead of None and the heuristics file is able to deal with a dict of seqinfo objects, heudiconv can still handle such tars.

@leej3 changed the title from "use args.grouping" to "WIP: use args.grouping" on May 22, 2018
@leej3 force-pushed the grouping_fix branch 2 times, most recently from 0660dd0 to 05de1d8 on May 23, 2018
@leej3 (Contributor Author) commented May 24, 2018

@yarikoptic, perhaps I can get your help with this. I think what was causing confusion for me is that for

args = gen_heudiconv_args(datadir, outdir, subject, heuristic)
runner(args) # run conversion

in test_regression.py, the args object is not getting passed to runner() as I expect it to be. args is not the same before and after parse_args(argv). Perhaps you could confirm this?

I changed the keyword argument to main() to be args to prevent this. I'm now passing the tests that I was previously failing, but now failing other tests.

I think it's because the object returned by group_dicoms_into_seqinfos changes depending on whether grouping is None or not: it alternates between (a) a dict with seqinfo objects as keys and files as values, and (b) a dict of the aforementioned dicts. Is there a reason for this behaviour, or could we make it uniform?
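
For reference, a minimal sketch of the two shapes described above; the names and the normalization helper are illustrative assumptions, not heudiconv's actual API:

# Shape (a): grouping disabled -- {seqinfo: [dicom_files]}
# Shape (b): grouping enabled  -- {grouping_key: {seqinfo: [dicom_files]}}

def as_grouped(seqinfos, grouping_key=None):
    """Wrap a flat shape-(a) dict into shape (b) under a single key,
    so downstream code can always assume one level of grouping."""
    first = next(iter(seqinfos.values()), None)
    if isinstance(first, dict):        # already shape (b)
        return seqinfos
    return {grouping_key: seqinfos}    # (a) -> (b) with one group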

@yarikoptic (Member):

Thanks for pushing this @leej3. I will look at it with a fresh eye tomorrow morning.
Cheers

@leej3 (Contributor Author) commented May 24, 2018

Great, thanks. I've pushed an example of turning everything into the aforementioned (b) and then, from there, to (a), which is currently used downstream in a few places. It seems to be better at passing the tests.

@codecov-io commented May 24, 2018

Codecov Report

Merging #200 into master will decrease coverage by 0.09%.
The diff coverage is 81.25%.


@@            Coverage Diff            @@
##           master     #200     +/-   ##
=========================================
- Coverage   67.81%   67.72%   -0.1%     
=========================================
  Files          30       31      +1     
  Lines        2411     2429     +18     
=========================================
+ Hits         1635     1645     +10     
- Misses        776      784      +8
Impacted Files           Coverage Δ
heudiconv/convert.py     78.2% <100%> (+1.07%) ⬆️
tests/test_grouping.py   100% <100%> (ø)
heudiconv/parser.py      87.2% <100%> (+0.46%) ⬆️
heudiconv/cli/run.py     74.05% <100%> (+0.16%) ⬆️
heudiconv/dicoms.py      56.48% <64%> (-3.01%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ff09b1...09e8850.

-parser.add_argument('-g', '--grouping', default='studyUID',
-                    choices=('studyUID', 'accession_number'),
+parser.add_argument('-g', '--grouping', default='None',
+                    choices=('studyUID', 'accession_number', 'None'),
@yarikoptic (Member):

@leej3 - what do you want to say/achieve when grouping is 'None'? Do you want

  • to convert all provided DICOMs as part of a single study, as if their studyUID was the same? (then I would call it "all" or something like that)
  • to convert every sequence as if it was a study on its own? (that is where None makes sense to me "semantically")

Also, I would prefer to keep studyUID as the default for a number of reasons:

  • consistent behavior with previous versions
  • it is the most typical case AFAIK

@leej3 (Contributor Author):

Yes, a default of studyUID sounds good. And agreed, 'all' might be more intuitive; I'm not sure what I would instinctively interpret 'None' to mean.

@yarikoptic (Member) left a comment:

OK, I think you wanted "to convert all provided DICOMs as part of a single study, as if their studyUID was the same". Let me try to push (and revert where needed) forward a bit and see where I get.

Re unified output of group_dicoms_into_seqinfos -- indeed, that would make sense... I guess I kept it returning the original structure whenever there is no grouping, for compatibility or something like that. But making the output uniform should indeed (but who knows) simplify the code. Let me see.

print("Trying to construct Seqinfo object:",
"{ob}".format(ob=SeqInfo.__doc__),
"\n However the arguments supplied were:")
print([repr(v) for v in seqinfo_args])
@yarikoptic (Member):

Use lgr.error, not plain print.
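
For instance, a minimal sketch using the module-level logger (assuming lgr is the logging.Logger instance already used in this module):

lgr.error("Trying to construct SeqInfo object: %s\n"
          "However the arguments supplied were: %s",
          SeqInfo.__doc__, [repr(v) for v in seqinfo_args])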

@leej3 (Contributor Author):

Yes, multiple accession_numbers/studyUIDs being converted as a single study, i.e. the scanner crashes and two "unique sessions" should be merged to prevent needless complication.

@leej3 (Contributor Author):

Yep, it seems easy enough to always return the uniform object. The only problem arises when you create filegroup_json: if you have duplicate series_ids it will fail, which is undesirable for a large number of DICOMs passed in that are trivially separable by the combination of series_id and grouping variable (or even just their seqinfo object). We could alter the filegroup json object to have an additional layer of nesting, fail with an assertion like I've done, or perhaps something more inventive.

@yarikoptic (Member):

re: "scanner crashes and two "unique sessions" should be merged" -- if accession was the same, wouldn't current --groupping=accession_number be sufficient? (just trying to get a fuller picture)

@yarikoptic (Member):

re: "problem ... filegroup_json. If you have duplicate series_ids " ""unique sessions" should be merged" ... "undesirable for a large number of dicoms passed in". Let me ouline

  1. I think all DICOMs should be passed in and no magical "merge" (or "substitute") should happen automagically
  2. indeed it is a problem (and I think we guard for it) if duplicate series_id comes in since filegroup*.json contains a dict. We should make them "non-conflicting" (indeed question how and when?)
  3. "by combination of series_id and grouping variable", "additionaly layer of nesting" -- too complex imho and then would not be uniform with already existing setup. May be we should just disambiguate by adding a "+1", "+2" or alike whenever conflict arises (and issue a warning). The question is how early such analysis should happen

Still digging... I will need to switch to other things soon, though. Question: do you have (or could you generate, e.g. using nifti2dicom) any sample data you could possibly share to demonstrate your use case?

@yarikoptic (Member):

@satra @mgxd -- I just want to make sure: for the use case when a dicom path template is provided, should there be any additional splitting into sessions/studies within each of those subject(/session) pairs, or should all matching dicoms just belong to that subject(/session)? What would happen with duplicates (canceled, restarted sessions)?

@leej3 (Contributor Author):

re: "wouldn't current --groupping=accession_number be sufficient", the accession_numbers are different too.

re: "(indeed question how and when?)", not sure I know enough about the different ways in which the pipeline can be run. For example an especially challenging condition is if two different tars, targeted to the same output directory, run at the same time. I guess when filegroup_file is written (save_json(filegroup_file, filegroup)) we could check if it already exists and append to it. This would require a check for duplication and modification of the seqinfo object series_ids too. May be a little bit too challenging. A simpler case to accommodate is the one I outlined where it is all contained in the same tar.

re: "sample data", I was given this data so I don't know for sure about how it was created. From what I can tell 2 study dicom tars were untarred and the two directories were used to create a new tar. So this process would reasonably replicate the data (provided they have some matching series_ids).

lgr.warning("Nothing to be done - displaying usage help")
parser.print_help()
sys.exit(1)
argv = sys.argv[1:]
@yarikoptic (Member):

It is not yet clear to me why we need to overload those, and why it should all make a difference. args should be the arguments to be parsed by the parser, whether they come from the tests or from the cmdline (via sys.argv); main shouldn't get args as a structure to be overwritten later on, etc.

@leej3 (Contributor Author):

OK, I may be entirely wrong and those changes may not be required. It's all a bit confusing when debugging the regression test test_conversion (I've modified it slightly; posted below). Perhaps it could be changed simply to avoid that. Here's what I observe:

  • args exists as an object from pytest at the start of the test. Printing this object fails.
  • After args = gen_heudiconv_args(datadir, outdir, subject, heuristic) is executed, print now works and outputs a list.
  • Stepping into runner(args), args is now:

argv = ['-d', '/tmp/pytest-of-root/QA/sourcedata/{subject}/*/*/*.tgz', '-c', 'dcm2niix', '-o', '/tmp/pytest-of-root/out', '-s', 'sub-sid000143', '-f', '/src/heudiconv/heudiconv/heuristics/reproin.py', '--bids']

Running parser.parse_args(argv) outputs the namespace object we want, but running

args = parser.parse_args(argv)

does not assign the result to args. Subsequent tests using its attributes are not really going to work; for example, bool(args.grouping) at this point will yield True.

Do you observe this?

test_conversion:

"""Testing conversion with conversion saved on datalad"""
import json
from glob import glob

import pytest

have_datalad = True
try:
    from datalad import api # to pull and grab data
    from datalad.support.exceptions import IncompleteResultsError
except ImportError:
    have_datalad = False

import heudiconv
from heudiconv.cli.run import main as runner
# testing utilities
from .utils import fetch_data, gen_heudiconv_args
import py
import os
import shutil

@pytest.mark.parametrize('tmpdir', [py._path.local.LocalPath('/tmp/pytest-of-root')])
@pytest.mark.parametrize('datadir', ['pytest-3'])
@pytest.mark.parametrize('subject', ['sub-sid000143'])
@pytest.mark.parametrize('heuristic', ['reproin.py'])
@pytest.mark.skipif(not have_datalad, reason="no datalad")
def test_conversion(tmpdir, subject, heuristic, datadir):
    tmpdir.chdir()
    if os.path.exists('out'):
        shutil.rmtree('out')
    if os.path.exists(datadir):
        outdir = os.path.join(tmpdir.strpath, 'out')
    else:
        try:
            datadir = fetch_data(tmpdir.strpath, subject)
        except IncompleteResultsError as exc:
            pytest.skip("Failed to fetch test data: %s" % str(exc))
        outdir = tmpdir.mkdir('out').strpath

    import pdb; pdb.set_trace()
    args = gen_heudiconv_args(datadir, outdir, subject, heuristic)
    print(args)
    # import pdb;pdb.set_trace()
    runner(args) # run conversion

    # verify functionals were converted
    assert glob('{}/{}/func/*'.format(outdir, subject)) == \
           glob('{}/{}/func/*'.format(datadir, subject))

    # compare some json metadata
    json_ = '{}/task-rest_acq-24mm64sl1000tr32te600dyn_bold.json'.format
    orig, conv = (json.load(open(json_(datadir))),
                  json.load(open(json_(outdir))))
    keys = ['EchoTime', 'MagneticFieldStrength', 'Manufacturer', 'SliceTiming']
    for key in keys:
        assert orig[key] == conv[key]

@leej3 (Contributor Author):

OK, part of the confusion is that args is a pdb command. That may solve it.

@leej3 (Contributor Author):

Naming it something other than args would be helpful for any future pdb debugging sessions; I forgot args was a pdb command. But yes, provided you aren't stepping through the code with pdb, it all makes sense.
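
For anyone hitting the same trap: inside pdb, args (alias a) is a built-in command that prints the current function's arguments, shadowing any local variable of that name, and prefixing a statement with ! forces pdb to execute it as Python. A minimal session:

(Pdb) args                              # pdb built-in: prints the current function's arguments
(Pdb) !args = parser.parse_args(argv)   # '!' executes this as a Python statement
(Pdb) p args.grouping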

@yarikoptic (Member):

I didn't even know about the args pdb command ;-)

leej3 and others added 9 commits June 7, 2018 19:12
this will fail if a tar contains multiple uids.
don't think this is required
* gh-leej3/grouping_fix:
  remove assertion about unique series_ids
  restore default grouping to None
  report grouping
  add grouping=all option
  use original group_map order
  make run order the same as the archive
  tidy print
  change back code causing nipype error
  provide some help upon SeqInfo object construction failure
  fix args parsing for running heudiconv from cmdline

 Conflicts:
	heudiconv/cli/run.py - took my version, TODO: rename args
	heudiconv/convert.py - OrderedDict, took leej3's version + my multilining
	heudiconv/dicoms.py - minor doc conflict
* origin/master:
  BF: for datalad 0.10.0  - check/populate .gitattributes correctly
  BF+ENH: set dist-restrictions metadata correctly
@yarikoptic mentioned this pull request Jun 21, 2018
@leej3 (Contributor Author) commented Jun 21, 2018

Before this pull request, the use cases (and their implementation) were:

  • Template-based handling

    • Groups can be specified by -s and -ss. The default grouping here was None (because args.grouping was not passed through). When -ss was not specified, the tarball defines the session. If multiple tarballs are globbed by the dicom template, currently a warning is logged if a "bids"-session is explicitly specified (with -ss). This should just fail, which will be a change in behaviour.

  • Files-based handling

    • By default, grouping is studyUID.
    • grouping is passed through to group_dicoms_into_seqinfos.
    • The default is a single session unless the heuristic changes this.
    • Any tarball grouping is ignored.

We need to implement tests to confirm that this is the logic we want. For this, an automagic IO context manager for the files used in tests would help. We also want to cache the fetched test data (done with a fixture; a sketch follows below).
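
A hedged sketch of such a caching fixture: session-scoped so the data is fetched only once, with fetch_data being the helper from tests/utils used elsewhere in this thread, and the subject ID taken from the test above:

import pytest

from .utils import fetch_data  # test helper, as used in test_conversion above

@pytest.fixture(scope="session")
def cached_datadir(tmp_path_factory):
    """Fetch the test dataset once per pytest session and reuse it."""
    basedir = tmp_path_factory.mktemp("data")
    return fetch_data(str(basedir), "sub-sid000143")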

@yarikoptic (Member):

Hi @leej3 ... so did -g accession_number --files work for you?

@leej3 (Contributor Author) commented Jun 26, 2018 via email

leej3 added 2 commits July 8, 2018 11:39
…_fix

* commit '2ff09b1a1b77282c9a10a74799d280db1bc4f85b':
  rel: bump to next version
  rel: proper release day
  Fix links to heuristcs in README.md
  doc+enh: update docker dcm2niix, add entry to changelog
  doc: changelog + version bump
  fix: do not load phasediff until exists
  BF: workaround for elderly buggy python 2.7 exec
@leej3 (Contributor Author) commented Jul 9, 2018

@yarikoptic, just to sync back up:

I have a tar containing two different studies (2 studyUIDs and 2 unique accession numbers; my center does that last bit differently from yours). I would like to be able to pass this single tar to heudiconv and get a single session as output. Currently this works given certain input parameters (I believe erroneously). I can pass --grouping accession_number for the template usage or --files usage and get back a single session. When using the template I explicitly define the session (-ss session_tag). I'm not sure if I'm using --files correctly, but I just got whatever session I defined with infotoids in the heuristics file.

We talked about this previously, but just to reiterate: to move this pull request forward, shall we drop the template interface and focus on consolidating what is in --files (and I'll learn to use that better)?

This would mean that one can use the --files flag and obtain the following behaviours (a few hypothetical invocations are sketched after this list):

  • Specifying subject: this could be specified with the -s argument, or by using infotoids.

  • Specifying session:

    • specify with a -ss flag? Not sure about this option. It would be a simple way to define the session for all globbed files, and would be particularly useful with grouping=all.
    • grouping=all: all globbed files are merged into a single session, specified by the -ss flag or an arbitrary first session name.
    • grouping=tar (or file source): this behaviour would match what Satra described for the template usage: glob some tars, and each tar becomes a new session, with the session number incrementing each time.
    • grouping=studyUID (or grouping=accession_number): sessions are defined by the studyUID (or the accession number, respectively).
    • grouping=override: not sure what to name this state, but Michael Hanke described using the -a flag to prevent heudiconv from altering the grouping and have it defined externally instead.
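
For concreteness, hypothetical invocations of the proposals above (the grouping values all and tar are proposals from this discussion, not released options; the other flags appear earlier in this thread):

heudiconv --files archive.tgz -s sid01 -ss ses01 -g all -f reproin.py --bids
heudiconv --files tars/*.tgz -s sid01 -g tar -f reproin.py --bids
heudiconv --files archive.tgz -s sid01 -g studyUID -f reproin.py --bids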

I've pushed a set of proposed tests for this behaviour. I wasn't sure how to use AutomagicIO; could you help with that?

Do you think this is a useful way forward? I think if we can pass all these tests, we won't remove any functionality that we know someone is using, but will consolidate the interface (by removing the template usage) to reduce the complexity of the code.

@yarikoptic (Member):

Thanks @leej3! I will try to get back to it in the upcoming days. I also still have some changes I was trying to meld into yours while we went through all that logic (or, more appropriately, "confusion") analysis at OHBM.

@mgxd mentioned this pull request Jan 4, 2019
@mgxd mentioned this pull request Apr 16, 2019
@mgxd mentioned this pull request Jul 9, 2019