-
Notifications
You must be signed in to change notification settings - Fork 533
[MAINT] Cleaning / simplify Node #2325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
nipype/pipeline/engine/nodes.py
Outdated
result, _, _ = self._load_resultfile(cwd) | ||
return result | ||
# Cache first | ||
if not self._result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't want to do this i think. this will increase memory consumption. we want results to be loaded on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll see how to improve the process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, no self._result
anymore.
@@ -189,12 +188,11 @@ def interface(self): | |||
|
|||
@property | |||
def result(self): | |||
if self._result: | |||
return self._result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this was a legacy from before. we should check when self._result
is actually not None.
nipype/pipeline/engine/nodes.py
Outdated
for hf in hashfiles: | ||
os.remove(hf) | ||
|
||
if updatehash and len(hashfiles) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think the second clause is necessary or the os.remove statement.
the intent of updatehash==True
was to always update the hash and return true. not just when there was only one hashfile. it was intended to be used cautiously.
nipype/utils/filemanip.py
Outdated
return path | ||
|
||
|
||
def emptydirs(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstring
@djarecka, @mgxd another maintenance PR for you to review, sorry about that. I don't have more of these for now... if it is of any help. The idea of these two refactorings is making things ready for the code rodeo and to scrub some rust off. Here I'm changing some of the functionality of |
nipype/pipeline/engine/nodes.py
Outdated
if op.exists(outdir): | ||
# Find previous hashfiles | ||
hashfiles = glob(op.join(outdir, '_0x*.json')) | ||
if len(hashfiles) > 1: # Remove hashfiles if more than one found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or (hashfiles and hashfiles[0] != hashfile)
this would ensure that two hashfiles are never left in the directory.
also we should check for any unfinished hashfiles in the directory and raise an exception. i.e. tell the user that you can update a hashfile in an unfinished hashfile situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the second idea: are you sure we want to error? We don't want to just dismiss any viable hashfile and clear up the folder (if an unfinished hashfile was found)?
I'm pushing a commit for this, with error. Let me know if we just want to clear up the folder and return no hashfiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW @satra, what do you think about passing the responsibilities of hashing/caching on to the interface?
The run()
of the interface would have a force_run
argument (so it can be done dynamically) and a updatehash
to also support that option.
It'd be great to have the interfaces cache themselves.
@oesteban - passing hashing to interfaces is a bigger discussion of merging node and interface into a common baseclass. let's leave that out of this and talk about separately. @djarecka has finally gotten back to the engine revision. let's review the base class question after that's completed. the dichotomy for caching has often been around whether we should be caching or users do it. i think making memcache be the default way interfaces run would be a good thing, but that's a drastic change, so we leave that for 2.0. |
Oh, sure thing. I wasn't proposing that for this PR :) Have you thought about rising an error or just go on when an unfinished hashfile is found in the output directory (when it shouldn't)? |
nipype/pipeline/engine/nodes.py
Outdated
except OSError: | ||
# Changing back to cwd is probably not necessary | ||
# but this makes sure there's somewhere to change to. | ||
cwd = os.path.split(outdir)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can unify and use op
everywhere
nipype/pipeline/engine/nodes.py
Outdated
@@ -304,143 +354,109 @@ def run(self, updatehash=False): | |||
updatehash: boolean | |||
Update the hash stored in the output directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you didn't change this, but I'm really missing longer explanation of updatehash
or suggestions when users should use it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If @satra agrees, we can remove it. I think it is an ancient feature.
The idea being that, when you run with updatehash=True
, even if the hash of some input changed, the interface will not be run again and reuse the old output as output.
For instance, you forgot to mark with nohash=True
some of your inputs (say the number of threads), but you don't want to rerun interfaces that are already cached. Then you would change the number of threads and run with updatehash=True
. When nipype finds some of your interfaces with this problem cached, they are not run again and the hash of this input gets updated.
I'm not sure the example is a use case but I understand the feature that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's true that this is rarely used currently, but when people were using interactive workflows and moving around the working directory this became a very useful thing. examples are:
- you run a workflow and move the working directory from one filesystem to another. all the hashes will change
- you go into the working directory to debug something and messed things up. you can quickly rerun an update hash that updates your cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explanations! if we decide to leave it, can we extend the docstring?
nipype/pipeline/engine/nodes.py
Outdated
if not force_run and str2bool(self.config['execution']['stop_on_first_rerun']): | ||
raise Exception('Cannot rerun when "stop_on_first_rerun" is set to True') | ||
|
||
# Hashfile while running, remove if exists already |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably more abut the comment placing: I understand that you don't want to remove unfinished hashfile here..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check.
@@ -1156,8 +1037,9 @@ def _make_nodes(self, cwd=None): | |||
base_dir=op.join(cwd, 'mapflow'), | |||
name=nodename) | |||
node.plugin_args = self.plugin_args | |||
node._interface.inputs.trait_set( | |||
node.interface.inputs.trait_set( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change necessary? just want to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general, it is preferable to access public members rather than private members (marked generally with underscore) from outside the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, yes, it is for node
, not for self
...
# Values in common keys would differ quite often, | ||
# so we need to join the messages together | ||
for k in new_keys.intersection(old_keys): | ||
same = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you sure? how would you modify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's too late in Europe to be sure about anything, but regardless what happens in this loop, same
should be defined/changed either in try
part or except
part, so looked to me that it doesn't have to be defined before try/except
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to #2320. Even though this PR is changing a lot already, I'd leave this catch for a new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -904,7 +1142,7 @@ def _standardize_iterables(node): | |||
fields = set(node.inputs.copyable_trait_names()) | |||
# Flag indicating whether the iterables are in the alternate | |||
# synchronize form and are not converted to a standard format. | |||
synchronize = False | |||
# synchronize = False # OE: commented out since it is not used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is these comments are valid for node.synchronize
instead of synchronize
. would be good to clean it so it's clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to remove it since the comment above seemed relevant. But I agree that once we clarify the intent of this synchronize, we should clean up these comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was my guess that the these comments are relevant also for node.synchronize
but wasn't sure if entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to #2320
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@effigies my concerns were more about the comments explaining synchronize
. Does it still apply to node.synchronize
after all changes? I don't fully understand the comments and never used it, so if you can confirm (or suggest changes) that would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry. It does read to me that there was intent to switch synchronize behavior with this flag, and that the two lines above only apply to that case. I think I would remove them, because the comment below seems to accurately describe the operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the clean up - at first glance functionality should remain the same - LGTM
nipype/pipeline/engine/nodes.py
Outdated
return op.abspath(op.join(outputdir, self.name)) | ||
|
||
self._output_dir = op.abspath(op.join(outputdir, self.name)) | ||
return self._output_dir | ||
|
||
def set_input(self, parameter, val): | ||
""" Set interface input value""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excess space
nipype/pipeline/engine/nodes.py
Outdated
if result and result.outputs: | ||
val = getattr(result.outputs, parameter) | ||
return val | ||
return getattr(self.result.outputs, parameter, None) | ||
|
||
def help(self): | ||
""" Print interface help""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space
nipype/pipeline/engine/nodes.py
Outdated
|
||
def _copyfiles_to_wd(self, outdir, execute, linksonly=False): | ||
def _copyfiles_to_wd(self, execute=True, linksonly=False): | ||
""" copy files over and change the inputs""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space
**deepcopy(self._interface.inputs.get())) | ||
node.interface.resource_monitor = self._interface.resource_monitor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.interface.resource_monior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would opt for self._interface
. I've made sure no self.interface
remain.
nipype/pipeline/engine/utils.py
Outdated
from copy import deepcopy | ||
from glob import glob | ||
from distutils.version import LooseVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from ... import LooseVersion
nipype/pipeline/engine/utils.py
Outdated
|
||
|
||
def write_report(node, report_type=None, is_mapnode=False): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just make this one line if not expanding
if level > len(colorset) - 2: | ||
level = 3 # Loop back to blue | ||
level = 3 # Loop back to blue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, apparently there is some PEP somewhere advising to have 2 spaces before the inline comments
nipype/pipeline/engine/nodes.py
Outdated
outdir = op.join(outdir, '_tempinput') | ||
makedirs(outdir, exist_ok=True) | ||
|
||
for info in self.interface._get_filecopy_info(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._interface
and self.interface
should be equal - but as an added layer of security wouldn't it be better to keep the usage consistent?
I'm seeing this from time to time:
Will let you know when I find why this is happening. |
Alright, this is looking good. The |
@oesteban - while using this PR
|
perhaps we should move towards |
This PR uses some code I had under the hood before 0.14.0 that simplifies
Node
. These are the changes:nipype.pipeline.engine.utils.make_output_dir
tonipype.utils.filemanip.makedirs
and change the function signature to be consistent with the os.makedirs of Python >=3.3. The behavior has been barely modified (ifexists_ok=False
falls back to the traditional os.makedirs; runs former nipype code otherwise).output_dir()
.hash_exists
- should implement the same in a hopefully more reliable way (less branches)Node.run
- again, reducing code branches.Node._run_interface
, moving the chdirs toNode.run
. Seems to fix [BUG] FileNotFoundError: [Errno 2] No such file or directory nipreps/fmriprep#868, however I haven't checked deep enough.