- Consistent naming for dummy feature encoding of variables with different levels count #2847

Merged
pat-s merged 1 commit into main from fix-2814 on Aug 17, 2022

@pat-s (Member) commented Aug 14, 2022

fix #2814

@mllg I don't see an immediate downside of this change and all tests seem to pass as well. OK to merge?

library(mlr)
#> Loading required package: ParamHelpers
#> Warning message: 'mlr' is in 'maintenance-only' mode since July 2019.
#> Future development will only happen in 'mlr3'
#> (<https://mlr3.mlr-org.com>). Due to the focus on 'mlr3' there might be
#> uncaught bugs meanwhile in {mlr} - please consider switching.
d <- structure(list(
  a = structure(c(2L, 1L, 1L, 1L, 3L, 2L), .Label = c("1", "2", "3"), class = "factor"),
  b = structure(c(2L, 1L, 1L, 1L, 1L, 2L), .Label = c("1", "2"), class = "factor"),
  target = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("zero", "one"), class = "factor")
),
row.names = c(NA, -6L),
class = "data.frame"
)

mlr::createDummyFeatures(d, "target", method = "reference")
#>   target a.2 a.3 b.2
#> 1   zero   1   0   1
#> 2   zero   0   0   0
#> 3   zero   0   0   0
#> 4    one   0   0   0
#> 5    one   0   1   0
#> 6    one   1   0   1

Created on 2022-08-14 by the reprex package (v2.0.1)
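
In the output above, the two-level factor `b` is encoded as `b.2`, following the same `<feature>.<level>` pattern as `a.2` and `a.3`. As a rough illustration of that naming convention only, here is a base R sketch. The helper `dummy_reference()` is hypothetical and is not mlr's implementation; it assumes the first factor level is the reference level.

```r
# Hypothetical helper, NOT part of mlr: builds reference-encoded dummy
# columns named "<feature>.<level>" for every non-reference level.
# Assumption: the first factor level is treated as the reference.
dummy_reference <- function(x, name) {
  stopifnot(is.factor(x))
  lvls <- levels(x)[-1L] # drop the reference level
  cols <- lapply(lvls, function(l) as.integer(x == l))
  names(cols) <- paste(name, lvls, sep = ".")
  as.data.frame(cols, check.names = FALSE)
}

# Same factor values as columns a and b in the reprex above
a <- factor(c("2", "1", "1", "1", "3", "2"), levels = c("1", "2", "3"))
b <- factor(c("2", "1", "1", "1", "1", "2"), levels = c("1", "2"))

res <- cbind(dummy_reference(a, "a"), dummy_reference(b, "b"))
colnames(res)
#> [1] "a.2" "a.3" "b.2"
```

Because the column name is always built from the feature name and the level, a factor with two levels gets the same kind of name as one with three or more.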

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Monterey 12.5
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Zurich
#>  date     2022-08-14
#>  pandoc   2.19 @ /opt/homebrew/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version     date (UTC) lib source
#>  assertthat     0.2.1       2019-03-21 [1] CRAN (R 4.2.0)
#>  backports      1.4.1       2021-12-13 [1] CRAN (R 4.2.0)
#>  BBmisc         1.12        2022-03-10 [1] CRAN (R 4.2.0)
#>  checkmate      2.1.0       2022-04-21 [1] CRAN (R 4.2.0)
#>  cli            3.3.0       2022-04-25 [1] CRAN (R 4.2.0)
#>  colorspace     2.0-3       2022-02-21 [1] CRAN (R 4.2.0)
#>  data.table     1.14.2      2021-09-27 [1] CRAN (R 4.2.0)
#>  DBI            1.1.3       2022-06-18 [1] CRAN (R 4.2.0)
#>  digest         0.6.29      2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr          1.0.9       2022-04-28 [1] CRAN (R 4.2.0)
#>  evaluate       0.16        2022-08-09 [1] CRAN (R 4.2.1)
#>  fansi          1.0.3       2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.0       2021-01-25 [1] CRAN (R 4.2.0)
#>  fastmatch      1.1-3       2021-07-23 [1] CRAN (R 4.2.0)
#>  fs             1.5.2       2021-12-08 [1] CRAN (R 4.2.0)
#>  generics       0.1.3       2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2        3.3.6       2022-05-03 [1] CRAN (R 4.2.0)
#>  glue           1.6.2       2022-02-24 [1] CRAN (R 4.2.0)
#>  gtable         0.3.0       2019-03-25 [1] CRAN (R 4.2.0)
#>  highr          0.9         2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.3       2022-07-18 [1] CRAN (R 4.2.0)
#>  knitr          1.39        2022-04-26 [1] CRAN (R 4.2.0)
#>  lattice        0.20-45     2021-09-22 [1] CRAN (R 4.2.1)
#>  lifecycle      1.0.1       2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3       2022-03-30 [1] CRAN (R 4.2.0)
#>  Matrix         1.4-1       2022-03-23 [1] CRAN (R 4.2.0)
#>  mlr          * 2.19.0.9001 2022-08-14 [1] local
#>  munsell        0.5.0       2018-06-12 [1] CRAN (R 4.2.0)
#>  parallelMap    1.5.1       2021-06-28 [1] CRAN (R 4.2.0)
#>  ParamHelpers * 1.14.1      2022-07-04 [1] CRAN (R 4.2.0)
#>  pillar         1.8.0       2022-07-18 [1] CRAN (R 4.2.0)
#>  pkgconfig      2.0.3       2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr          0.3.4       2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache        0.16.0      2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3    1.8.2       2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo           1.25.0      2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils        2.12.0      2022-06-28 [1] CRAN (R 4.2.0)
#>  R6             2.5.1       2021-08-19 [1] CRAN (R 4.2.0)
#>  reprex         2.0.1       2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang          1.0.4       2022-07-12 [1] CRAN (R 4.2.0)
#>  rmarkdown      2.14        2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi     0.13        2020-11-12 [1] CRAN (R 4.2.0)
#>  scales         1.2.0       2022-04-13 [1] CRAN (R 4.2.0)
#>  sessioninfo    1.2.2       2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi        1.7.8       2022-07-11 [1] CRAN (R 4.2.0)
#>  stringr        1.4.0       2019-02-10 [1] CRAN (R 4.2.0)
#>  styler         1.7.0.9001  2022-08-08 [1] Github (r-lib/styler@a3b69e4)
#>  survival       3.3-1       2022-03-03 [1] CRAN (R 4.2.0)
#>  tibble         3.1.8       2022-07-22 [1] CRAN (R 4.2.0)
#>  tidyselect     1.1.2       2022-02-21 [1] CRAN (R 4.2.0)
#>  utf8           1.2.2       2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs          0.4.1       2022-04-13 [1] CRAN (R 4.2.0)
#>  withr          2.5.0       2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun           0.32        2022-08-10 [1] CRAN (R 4.2.1)
#>  yaml           2.3.5       2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/pjs/Library/R/arm64/4.2/library
#>  [2] /opt/R/4.2.1-arm64/Resources/site-library
#>  [3] /opt/R/4.2.1-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@mllg (Member) commented Aug 14, 2022

LGTM

@pat-s changed the title from "Consistent naming for dummy feature encoding of variables with different levels count" to "- Consistent naming for dummy feature encoding of variables with different levels count" on Aug 17, 2022
@pat-s merged commit 2ebca8e into main on Aug 17, 2022
@pat-s deleted the fix-2814 branch on August 17, 2022 07:42
Development

Successfully merging this pull request may close these issues.

createDummyFeatures gives wrong and unconsistant names used with two factor levels
2 participants