WIP: use args.grouping #200
Conversation
oh, so it wasn't passed through? thanks @leej3
ideally we should accompany this with a dedicated test so things do not get broken again in RF
Note that it already breaks the tests (I guess it changes the current/default behavior)

yep, apologies. That was a quick hack because I needed to use it. I can't vouch for that being the desired behavior though!
Many scans I encounter do not pass the assertion that studyUID == file_studyUID, so using accession_number for grouping is the only way that I can get this working. I'm under the impression that this is still the default behaviour: "grouping" is None either way. It's only when one changes grouping to accession_number that it alters the behaviour. I've rebased this onto master. EDIT: I've since determined that all of these encounters are when I received two studies tarred up together. Perhaps not something that needs to be addressed. Although, if grouping is studyUID instead of None and the heuristics file is able to deal with a dict of seqinfo objects, heudiconv can still handle the tars.
Force-pushed from 0660dd0 to 05de1d8
@yarikoptic perhaps I can get your help with this. I think what was causing confusion for me is that in test_regression.py the args object is not getting passed to runner() as I expect it to be: args is not the same before and after parse_args(argv). Perhaps you could confirm this? I changed the keyword argument to main() to be args to prevent this. I'm now passing the tests that I was previously failing. I'm now failing other tests though. I think it's because the object returned by group_dicoms_into_seqinfos changes depending on whether grouping is None or not: it alternates between (a) a dict with seqinfo objects as keys and files as values, and (b) a dict of the aforementioned dicts. Is there a reason for this behaviour or could we make it uniform? (An illustration of the two shapes follows below.)
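For concreteness, a hedged illustration of the two shapes described above (this SeqInfo is a two-field stand-in for heudiconv's much larger namedtuple, and all values are invented):

from collections import namedtuple

# Stand-in for heudiconv's SeqInfo namedtuple, reduced to two fields.
SeqInfo = namedtuple('SeqInfo', ['series_id', 'protocol_name'])
s1 = SeqInfo('2-anat', 'T1w')
s2 = SeqInfo('3-func', 'rest_bold')

# (a) a flat dict: SeqInfo -> list of files
flat = {s1: ['001.dcm', '002.dcm'],
        s2: ['003.dcm']}

# (b) a dict of (a)-style dicts, keyed by the grouping value
grouped = {'study-uid-1': {s1: ['001.dcm', '002.dcm']},
           'study-uid-2': {s2: ['003.dcm']}}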
Thanks for pushing this @leej3. I will look at it with a fresh eye tomorrow morning.
Great, thanks. I've pushed an example of turning everything into the aforementioned (b) and then, from there, to (a), which is currently used downstream in a few places. It seems to be better at passing the tests.
Codecov Report
@@ Coverage Diff @@
## master #200 +/- ##
=========================================
- Coverage 67.81% 67.72% -0.1%
=========================================
Files 30 31 +1
Lines 2411 2429 +18
=========================================
+ Hits 1635 1645 +10
- Misses 776 784 +8
Continue to review full report at Codecov.
heudiconv/cli/run.py
Outdated
-    parser.add_argument('-g', '--grouping', default='studyUID',
-                        choices=('studyUID', 'accession_number'),
+    parser.add_argument('-g', '--grouping', default='None',
+                        choices=('studyUID', 'accession_number', 'None'),
@leej3 - what do you want to say/achieve when grouping is 'None'? Do you want
- to convert all provided DICOMs as part of a single study, as if their studyUID was the same? (then I would call it "all" or something like that)
- to convert every sequence as if it were a study on its own? (that is where None makes sense to me "semantically")
Also I would prefer to keep studyUID as the default, for a number of reasons:
- consistent behavior with previous versions
- it is the most typical case AFAIK
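(For concreteness, the 'all' suggestion might look like this in argparse terms; this is a sketch, not the actual diff in this PR:)

import argparse

parser = argparse.ArgumentParser()
# studyUID stays the default; 'all' replaces the proposed 'None'
parser.add_argument('-g', '--grouping', default='studyUID',
                    choices=('studyUID', 'accession_number', 'all'),
                    help='how to group dicoms for conversion')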
Yes, a default of studyUID sounds good. And agreed, 'all' might be more intuitive; I'm not sure what I would instinctively interpret 'None' to mean.
ok, I think you wanted "to convert all provided DICOMs as a part of the single study as if their studyUID was the same". Let me try to push/reverse (where needed) forward a bit and see where I get.
Re the unified output of group_dicoms_into_seqinfos -- indeed it would make sense... I guess I kept it returning the original one, whenever there is no grouping, for compatibility or something like that. But making it uniform should (but who knows) simplify the code. Let me see.
heudiconv/dicoms.py
Outdated
print("Trying to construct Seqinfo object:", | ||
"{ob}".format(ob=SeqInfo.__doc__), | ||
"\n However the arguments supplied were:") | ||
print([repr(v) for v in seqinfo_args]) |
use lgr.error, not plain print.
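A sketch of what that change could look like (assuming lgr is the module-level logger already defined in heudiconv/dicoms.py, with SeqInfo and seqinfo_args as in the snippet above):

lgr.error("Trying to construct SeqInfo object: %s\n"
          "However the arguments supplied were: %s",
          SeqInfo.__doc__, [repr(v) for v in seqinfo_args])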
Yes: multiple accession_numbers/studyUIDs being converted as a single study, i.e. the scanner crashes and two "unique sessions" should be merged to prevent needless complication.
yep, it seems easy enough to always return the uniform object. The only problem arises when you create filegroup_json: if you have duplicate series_ids it will fail, which is undesirable for a large number of dicoms passed in that are trivially separable by the combination of series_id and grouping variable (or even just their seqinfo object). We could alter the filegroup json object to have an additional layer of nesting, fail with an assertion like I've done, or perhaps something more inventive.
re: "scanner crashes and two "unique sessions" should be merged" -- if accession was the same, wouldn't current --groupping=accession_number
be sufficient? (just trying to get a fuller picture)
re: "problem ... filegroup_json. If you have duplicate series_ids " ""unique sessions" should be merged" ... "undesirable for a large number of dicoms passed in". Let me ouline
- I think all DICOMs should be passed in and no magical "merge" (or "substitute") should happen automagically
- indeed it is a problem (and I think we guard for it) if duplicate
series_id
comes in sincefilegroup*.json
contains a dict. We should make them "non-conflicting" (indeed question how and when?) - "by combination of series_id and grouping variable", "additionaly layer of nesting" -- too complex imho and then would not be uniform with already existing setup. May be we should just disambiguate by adding a "+1", "+2" or alike whenever conflict arises (and issue a warning). The question is how early such analysis should happen
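A minimal sketch of that suffix-based disambiguation idea (disambiguate_series_id is a hypothetical helper, not heudiconv's actual API):

import logging

lgr = logging.getLogger(__name__)

def disambiguate_series_id(series_id, filegroup):
    # Return series_id, suffixed with "+1", "+2", ... if it already
    # exists as a key in the filegroup mapping, warning on conflict.
    if series_id not in filegroup:
        return series_id
    suffix = 1
    while '%s+%d' % (series_id, suffix) in filegroup:
        suffix += 1
    new_id = '%s+%d' % (series_id, suffix)
    lgr.warning("Duplicate series_id %r encountered; renaming to %r",
                series_id, new_id)
    return new_id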
still digging... will need to switch to other things soon though... Question: do you have (or could you generate, e.g. using nifti2dicom) any sample data you could possibly share to demonstrate your usecase?
@satra @mgxd -- I just want to make sure: for the usecase when a dicom path template is provided, should there be additional splitting into sessions/studies within each of those subject(/session) pairs by anything, or should all matching dicoms just belong to that subject(/session)? What would happen with duplicates (canceled, restarted sessions)?
re: "wouldn't current --groupping=accession_number be sufficient", the accession_numbers are different too.
re: "(indeed question how and when?)", not sure I know enough about the different ways in which the pipeline can be run. For example an especially challenging condition is if two different tars, targeted to the same output directory, run at the same time. I guess when filegroup_file is written (save_json(filegroup_file, filegroup)) we could check if it already exists and append to it. This would require a check for duplication and modification of the seqinfo object series_ids too. May be a little bit too challenging. A simpler case to accommodate is the one I outlined where it is all contained in the same tar.
re: "sample data", I was given this data so I don't know for sure about how it was created. From what I can tell 2 study dicom tars were untarred and the two directories were used to create a new tar. So this process would reasonably replicate the data (provided they have some matching series_ids).
heudiconv/cli/run.py
Outdated
    lgr.warning("Nothing to be done - displaying usage help")
    parser.print_help()
    sys.exit(1)
argv = sys.argv[1:]
not yet clear to me why we need to overload those and why it all should make a difference. args should be the args to be parsed by the parser, whether they come from the tests or the cmdline (via sys.argv). main shouldn't get the args as a structure to be overwritten later on, etc.
Ok, I may be entirely wrong and those changes are not required. It's all a bit confusing when debugging the regression test test_conversion (I've modified it slightly; posted below). Perhaps it could be changed simply to avoid that. Here's what I observe:
- args exists as an object from pytest at the start of the test. Printing this object fails.
- After args = gen_heudiconv_args(datadir, outdir, subject, heuristic) is executed, print now works and outputs a list.
- Stepping into runner(args), args is now:
argv = ['-d', '/tmp/pytest-of-root/QA/sourcedata/{subject}/*/*/*.tgz', '-c', 'dcm2niix', '-o', '/tmp/pytest-of-root/out', '-s', 'sub-sid000143', '-f', '/src/heudiconv/heudiconv/heuristics/reproin.py', '--bids']
- Running parser.parse_args(argv) outputs the namespace object we want, but running args = parser.parse_args(argv) does not assign the result to args. Subsequent tests using its attributes are not really going to work; for example, bool(args.grouping) at this time will yield True.
Do you observe this?
test_conversion:
"""Testing conversion with conversion saved on datalad"""
import json
from glob import glob
import pytest
have_datalad = True
try:
from datalad import api # to pull and grab data
from datalad.support.exceptions import IncompleteResultsError
except ImportError:
have_datalad = False
import heudiconv
from heudiconv.cli.run import main as runner
# testing utilities
from .utils import fetch_data, gen_heudiconv_args
import py
import os
import shutil
@pytest.mark.parametrize('tmpdir', [py._path.local.LocalPath('/tmp/pytest-of-root')])
@pytest.mark.parametrize('datadir', ['pytest-3'])
@pytest.mark.parametrize('subject', ['sub-sid000143'])
@pytest.mark.parametrize('heuristic', ['reproin.py'])
@pytest.mark.skipif(not have_datalad, reason="no datalad")
def test_conversion(tmpdir, subject, heuristic,datadir):
tmpdir.chdir()
if os.path.exists('out'):
shutil.rmtree('out')
if os.path.exists(datadir):
outdir = os.path.join(tmpdir.strpath, 'out')
else:
try:
datadir = fetch_data(tmpdir.strpath, subject)
except IncompleteResultsError as exc:
pytest.skip("Failed to fetch test data: %s" % str(exc))
outdir = tmpdir.mkdir('out').strpath
import pdb; pdb.set_trace()
args = gen_heudiconv_args(datadir, outdir, subject, heuristic)
print(args)
# import pdb;pdb.set_trace()
runner(args) # run conversion
# verify functionals were converted
assert glob('{}/{}/func/*'.format(outdir, subject)) == \
glob('{}/{}/func/*'.format(datadir, subject))
# compare some json metadata
json_ = '{}/task-rest_acq-24mm64sl1000tr32te600dyn_bold.json'.format
orig, conv = (json.load(open(json_(datadir))),
json.load(open(json_(outdir))))
keys = ['EchoTime', 'MagneticFieldStrength', 'Manufacturer', 'SliceTiming']
for key in keys:
assert orig[key] == conv[key]
Ok, part of the confusion is that args is a pdb command. That may solve it.
Naming it something other than args would be helpful for any future pdb debugging sessions. I forgot args was a pdb command. But yes, provided you aren't stepping through the code with pdb, it all makes sense.
I didn't even know about the args pdb command ;-)
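For reference, this is the pdb behavior being described: args (short form a) is a debugger command that prints the current function's arguments, so at the (Pdb) prompt it shadows any local variable of the same name:

# (Pdb) args                              # pdb command: prints the function's arguments
# (Pdb) p args                            # prints the local variable args instead
# (Pdb) !args = parser.parse_args(argv)   # the ! prefix forces statement execution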
this will fail if a tar contains multiple uids.
don't think this is required
* gh-leej3/grouping_fix:
  remove assertion about unique series_ids
  restore default grouping to None
  report grouping
  add grouping=all option
  use original group_map order
  make run order the same as the archive
  tidy print
  change back code causing nipype error
  provide some help upon SeqInfo object construction failure
  fix args parsing for running heudiconv from cmdline
Conflicts:
  heudiconv/cli/run.py - took my version, TODO: rename args
  heudiconv/convert.py - OrderedDict, took leej3's version + my multilining
  heudiconv/dicoms.py - minor doc conflict
* origin/master:
  BF: for datalad 0.10.0 - check/populate .gitattributes correctly
  BF+ENH: set dist-restrictions metadata correctly
Before this pull request the following usecases (implementations) were:
Need to implement tests to confirm that this is the logic we are implementing. For this, an automagic IO context manager for files used in tests. Also want to cache (done with a fixture).
Hi @leej3 ... so did -g accession_number --files work for you?
Have been away from a computer. Should get a chance over the next couple of days.
John
…_fix
* commit '2ff09b1a1b77282c9a10a74799d280db1bc4f85b':
  rel: bump to next version
  rel: proper release day
  Fix links to heuristcs in README.md
  doc+enh: update docker dcm2niix, add entry to changelog
  doc: changelog + version bump
  fix: do not load phasediff until exists
  BF: workaround for elderly buggy python 2.7 exec
@yarikoptic, just to sync back up: I have a tar containing two different studies (2 studyUIDs and 2 unique accession numbers; my center does that last bit differently from yours). I would like to be able to pass this single tar to heudiconv and get a single session as output. Currently this works given certain input parameters (I believe erroneously). I can pass --grouping accession_number for the template usage or the --files usage and get back a single session. When using the template I explicitly define the session (-ss session_tag). I'm not sure if I'm using --files correctly, but I just got whatever session I defined with infotoids in the heuristics file. We talked about this previously but just to reiterate: to move this pull request forward, shall we drop the template interface and focus on consolidating what is in --files (and I'll learn to use that better)? This would mean that one could use the --files flag and obtain the following behaviours:
I've pushed a set of proposed tests for this behaviour. I wasn't sure how to use AutomagicIO; could you help with that? Do you think this is a useful way forward? I think if we can pass all these tests we don't remove any functionality that we know someone is using, but we consolidate the interface (by removing the template usage) to reduce the complexity of the code.
thanks @leej3! I will try to get back to it in the upcoming days. Also, I still have some changes I was trying to meld into yours while we went through all that logic (or, to say it more appropriately, "confusion") analysis at OHBM.
Fixes crawling/grouping mentioned by @yarikoptic in #198