Skip to content

Bug fix for Issue 2639 #2646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

annette987
Copy link

When using the RF min depth filter with a method other than "md" (i.e.
"vh" or "vh.vimp"), an error is generated at line 298:
Error in -im[, 1L]: invalid argument to unary operator.
This happens because im = NULL.
From code: im = (result of RF var.select)$md.obj$order
From randomForestSRC documentation: md.obj - Minimal depth object. NULL
unless method = "md".
Therefore the object md.obj cannot be used for methods "vh" and "vh.vimp".
The solution is to use object varselect instead, which is a matrix
containing the tree depth and relative frequency for each variable. Note
also that object topvars contains the names of the top variables.

P.S. This is my first PR and I don;t have access to the coding guidelines or other docs. Could I be added please?

When using the RF min depth filter with a method other than "md" (i.e.
"vh" or "vh.vimp"), an error is generated at line 298:
	Error in -im[, 1L]: invalid argument to unary operator.
This happens because im = NULL.
From code im = (result of RF var.select)$md.obj$order
From randomForestSRC documentation: md.obj - Minimal depth object. NULL
unless method = "md".
Therefore the object md.obj cannot be used for methods "vh" and "vh.vimp".
The solution is to use object varselect instead, which is a matrix
containing the tree depth and relative frequency for each variable. Note
also that object topvars contains the names of the top variables.
@annette987
Copy link
Author

Sorry, I have never done this before and I don't really know what to do next. Could I have some guidance please?

@@ -276,8 +276,8 @@ rf.min.depth = makeFilter(
supported.features = c("numerics", "factors", "ordered"),
fun = function(task, nselect, method = "md", ...) {
im = randomForestSRC::var.select(getTaskFormula(task), getTaskData(task),
method = method, verbose = FALSE, ...)$md.obj$order
setNames(-im[, 1L], rownames(im))
method = method, verbose = FALSE, ...)$varselect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change the indentation.

@larskotthoff
Copy link
Member

Thanks for the PR, looks good apart from some minor issues I raised in the review. You can find our contributor guidelines in the wiki at https://github.com/mlr-org/mlr3/wiki/pr-guidelines

For bug fixes, we ask that you write a unit test that checks whether the bug is fixed in addition to the fix itself. This is to make sure that we don't accidentally reintroduce the same bug in future changes. Could you do that please? Thanks!

@pat-s
Copy link
Member

pat-s commented Oct 7, 2019

The solution is to use object varselect instead, which is a matrix

Was there an upstream change in randomForestSRC?
Is the change robust and applies to all methods, i.e. will everything else work as before with added benefits?

Reading your initial post is not easy, please take a few more minutes next time to format it nicely. Thanks.

@pat-s
Copy link
Member

pat-s commented Oct 7, 2019

It's a bit more complicated.

First, @annette987 is right that md.obj only exists if method = "md". If the user passed a different method via more.args an error will occur because 'md.obj` does not exist.

Hence, so far we only allowed for method = "md" implicitly internally without telling the user.

However, to make all methods available with a robust behavior, we cannot simply use the $varselect slot of the returned object. This is because the slot always looks different depending on the chosen method.

I added support for method = "vh" now in PR XY.
AFAICS we cannot support method = "vh.imp" as this method does not return any importance score in the returned object. However it seems that this method is just a fast variant of "vh" and is probably not a good choice.

I'll close here since #2649 should solve the issue.
Thanks for contributing, your digging brought up a bug in the package 👍

@pat-s pat-s closed this Oct 7, 2019
@annette987 annette987 deleted the fix-rf_vh-bug branch November 13, 2019 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants