Commit 9f7fbc4

Change the default dtype of get_dummies to bool
1 parent 707a222 commit 9f7fbc4

12 files changed, +117 -157 lines changed
doc/source/user_guide/reshaping.rst

-8
Original file line numberDiff line numberDiff line change
@@ -608,7 +608,6 @@ values, can derive a :class:`DataFrame` containing ``k`` columns of 1s and 0s us
608608
:func:`~pandas.get_dummies`:
609609

610610
.. ipython:: python
611-
:okwarning:
612611
613612
df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})
614613
@@ -618,7 +617,6 @@ Sometimes it's useful to prefix the column names, for example when merging the r
618617
with the original :class:`DataFrame`:
619618

620619
.. ipython:: python
621-
:okwarning:
622620
623621
dummies = pd.get_dummies(df["key"], prefix="key")
624622
dummies
@@ -628,7 +626,6 @@ with the original :class:`DataFrame`:
628626
This function is often used along with discretization functions like :func:`~pandas.cut`:
629627

630628
.. ipython:: python
631-
:okwarning:
632629
633630
values = np.random.randn(10)
634631
values
@@ -645,7 +642,6 @@ variables (categorical in the statistical sense, those with ``object`` or
645642

646643

647644
.. ipython:: python
648-
:okwarning:
649645
650646
df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})
651647
pd.get_dummies(df)
@@ -654,7 +650,6 @@ All non-object columns are included untouched in the output. You can control
654650
the columns that are encoded with the ``columns`` keyword.
655651

656652
.. ipython:: python
657-
:okwarning:
658653
659654
pd.get_dummies(df, columns=["A"])
660655
@@ -672,7 +667,6 @@ the prefix separator. You can specify ``prefix`` and ``prefix_sep`` in 3 ways:
672667
* dict: Mapping column name to prefix.
673668

674669
.. ipython:: python
675-
:okwarning:
676670
677671
simple = pd.get_dummies(df, prefix="new_prefix")
678672
simple
@@ -686,7 +680,6 @@ variable to avoid collinearity when feeding the result to statistical models.
686680
You can switch to this mode by turn on ``drop_first``.
687681

688682
.. ipython:: python
689-
:okwarning:
690683
691684
s = pd.Series(list("abcaa"))
692685
@@ -697,7 +690,6 @@ You can switch to this mode by turn on ``drop_first``.
697690
When a column contains only one level, it will be omitted in the result.
698691

699692
.. ipython:: python
700-
:okwarning:
701693
702694
df = pd.DataFrame({"A": list("aaaaa"), "B": list("ababc")})
703695

doc/source/whatsnew/v0.13.0.rst

-1
Original file line numberDiff line numberDiff line change
@@ -501,7 +501,6 @@ Enhancements
501501
- ``NaN`` handing in get_dummies (:issue:`4446`) with ``dummy_na``
502502

503503
.. ipython:: python
504-
:okwarning:
505504
506505
# previously, nan was erroneously counted as 2 here
507506
# now it is not counted at all

doc/source/whatsnew/v0.15.0.rst

-1
Original file line numberDiff line numberDiff line change
@@ -1007,7 +1007,6 @@ Other:
10071007
left untouched.
10081008

10091009
.. ipython:: python
1010-
:okwarning:
10111010
10121011
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
10131012
'C': [1, 2, 3]})

doc/source/whatsnew/v0.19.0.rst

-1
Original file line numberDiff line numberDiff line change
@@ -431,7 +431,6 @@ The ``pd.get_dummies`` function now returns dummy-encoded columns as small integ
431431
**New behavior**:
432432

433433
.. ipython:: python
434-
:okwarning:
435434
436435
pd.get_dummies(["a", "b", "a", "c"]).dtypes
437436

doc/source/whatsnew/v0.23.0.rst

-1
Original file line numberDiff line numberDiff line change
@@ -366,7 +366,6 @@ Function ``get_dummies`` now supports ``dtype`` argument
366366
The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtype for the new columns. The default remains uint8. (:issue:`18330`)
367367

368368
.. ipython:: python
369-
:okwarning:
370369
371370
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
372371
pd.get_dummies(df, columns=['c']).dtypes

doc/source/whatsnew/v0.24.0.rst

-1
Original file line numberDiff line numberDiff line change
@@ -833,7 +833,6 @@ then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was retur
833833
Now, the return type is consistently a :class:`DataFrame`.
834834

835835
.. ipython:: python
836-
:okwarning:
837836
838837
type(pd.get_dummies(df, sparse=True))
839838
type(pd.get_dummies(df[['B', 'C']], sparse=True))

doc/source/whatsnew/v1.5.0.rst

+1-2
Original file line numberDiff line numberDiff line change
@@ -932,7 +932,6 @@ Other Deprecations
932932
- Deprecated unused arguments ``encoding`` and ``verbose`` in :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` (:issue:`47912`)
933933
- Deprecated the ``inplace`` keyword in :meth:`DataFrame.set_axis` and :meth:`Series.set_axis`, use ``obj = obj.set_axis(..., copy=False)`` instead (:issue:`48130`)
934934
- Deprecated producing a single element when iterating over a :class:`DataFrameGroupBy` or a :class:`SeriesGroupBy` that has been grouped by a list of length 1; A tuple of length one will be returned instead (:issue:`42795`)
935-
- Deprecated ``np.uint8`` as the default ``dtype`` for :func:`get_dummies` - in a future version, it will be changed to ``bool`` (:issue:`45848`)
936935
- Fixed up warning message of deprecation of :meth:`MultiIndex.lesort_depth` as public method, as the message previously referred to :meth:`MultiIndex.is_lexsorted` instead (:issue:`38701`)
937936
- Deprecated the ``sort_columns`` argument in :meth:`DataFrame.plot` and :meth:`Series.plot` (:issue:`47563`).
938937
- Deprecated positional arguments for all but the first argument of :meth:`DataFrame.to_stata` and :func:`read_stata`, use keyword arguments instead (:issue:`48128`).
@@ -1192,7 +1191,6 @@ Groupby/resample/rolling
11921191
- Bug in :meth:`DataFrameGroupBy.resample` raises ``KeyError`` when getting the result from a key list which misses the resample key (:issue:`47362`)
11931192
- Bug in :meth:`DataFrame.groupby` would lose index columns when the DataFrame is empty for transforms, like fillna (:issue:`47787`)
11941193
- Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` with ``dropna=False`` and ``sort=False`` would put any null groups at the end instead the order that they are encountered (:issue:`46584`)
1195-
-
11961194

11971195
Reshaping
11981196
^^^^^^^^^
@@ -1210,6 +1208,7 @@ Reshaping
12101208
- Bug in :meth:`concat` when ``axis=1`` and ``sort=False`` where the resulting Index was a :class:`Int64Index` instead of a :class:`RangeIndex` (:issue:`46675`)
12111209
- Bug in :meth:`wide_to_long` raises when ``stubnames`` is missing in columns and ``i`` contains string dtype column (:issue:`46044`)
12121210
- Bug in :meth:`DataFrame.join` with categorical index results in unexpected reordering (:issue:`47812`)
1211+
- Bug in :func:`get_dummies` ``np.uint8`` being the default ``dtype``, changed to ``bool`` (:issue:`45848`)
12131212

12141213
Sparse
12151214
^^^^^^

doc/source/whatsnew/v1.5.1.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ Fixed regressions
2323

2424
Bug fixes
2525
~~~~~~~~~
26-
-
26+
- Bug in :func:`get_dummies` with default ``dtype`` being ``uint8`` - the default ``dtype`` is now changed to ``bool`` (:issue:`45848`)
2727
-
2828

2929
.. ---------------------------------------------------------------------------

pandas/core/reshape/encoding.py

+32-40
@@ -1,16 +1,13 @@
 from __future__ import annotations

 from collections import defaultdict
-import inspect
 import itertools
 from typing import Hashable
-import warnings

 import numpy as np

 from pandas._libs.sparse import IntIndex
 from pandas._typing import Dtype
-from pandas.util._exceptions import find_stack_level

 from pandas.core.dtypes.common import (
     is_integer_dtype,
@@ -66,7 +63,7 @@ def get_dummies(
     drop_first : bool, default False
         Whether to get k-1 dummies out of k categorical levels by removing the
         first level.
-    dtype : dtype, default np.uint8
+    dtype : dtype, default bool
         Data type for new columns. Only a single dtype is allowed.

     Returns
@@ -89,50 +86,50 @@ def get_dummies(
     >>> s = pd.Series(list('abca'))

     >>> pd.get_dummies(s)
-       a  b  c
-    0  1  0  0
-    1  0  1  0
-    2  0  0  1
-    3  1  0  0
+           a      b      c
+    0   True  False  False
+    1  False   True  False
+    2  False  False   True
+    3   True  False  False

     >>> s1 = ['a', 'b', np.nan]

     >>> pd.get_dummies(s1)
-       a  b
-    0  1  0
-    1  0  1
-    2  0  0
+           a      b
+    0   True  False
+    1  False   True
+    2  False  False

     >>> pd.get_dummies(s1, dummy_na=True)
-       a  b  NaN
-    0  1  0    0
-    1  0  1    0
-    2  0  0    1
+           a      b    NaN
+    0   True  False  False
+    1  False   True  False
+    2  False  False   True

     >>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
     ...                    'C': [1, 2, 3]})

     >>> pd.get_dummies(df, prefix=['col1', 'col2'])
        C  col1_a  col1_b  col2_a  col2_b  col2_c
-    0  1       1       0       0       1       0
-    1  2       0       1       1       0       0
-    2  3       1       0       0       0       1
+    0  1    True   False   False    True   False
+    1  2   False    True    True   False   False
+    2  3    True   False   False   False    True

     >>> pd.get_dummies(pd.Series(list('abcaa')))
-       a  b  c
-    0  1  0  0
-    1  0  1  0
-    2  0  0  1
-    3  1  0  0
-    4  1  0  0
+           a      b      c
+    0   True  False  False
+    1  False   True  False
+    2  False  False   True
+    3   True  False  False
+    4   True  False  False

     >>> pd.get_dummies(pd.Series(list('abcaa')), drop_first=True)
-       b  c
-    0  0  0
-    1  1  0
-    2  0  1
-    3  0  0
-    4  0  0
+           b      c
+    0  False  False
+    1   True  False
+    2  False   True
+    3  False  False
+    4  False  False

     >>> pd.get_dummies(pd.Series(list('abc')), dtype=float)
          a    b    c
@@ -236,16 +233,11 @@ def _get_dummies_1d(
     codes, levels = factorize_from_iterable(Series(data))

     if dtype is None:
-        warnings.warn(
-            "In a future version of pandas the default dtype will change from "
-            "'uint8' to 'bool', please specify a dtype to silence this warning",
-            FutureWarning,
-            stacklevel=find_stack_level(inspect.currentframe()),
-        )
-        dtype = np.dtype(np.uint8)
+        dtype = bool
     # error: Argument 1 to "dtype" has incompatible type "Union[ExtensionDtype, str,
     # dtype[Any], Type[object]]"; expected "Type[Any]"
-    dtype = np.dtype(dtype)  # type: ignore[arg-type]
+    else:
+        dtype = np.dtype(dtype)  # type: ignore[arg-type]

     if is_object_dtype(dtype):
         raise ValueError("dtype=object is not a valid dtype for get_dummies")
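
Note (illustration only, not part of the commit): a minimal sketch of what the new default means for callers. With this change applied, calling get_dummies without a dtype should give boolean columns, while passing dtype=np.uint8 explicitly should reproduce the earlier integer output. The series is the one used in the docstring example in this file; the variable names default_dummies and uint8_dummies are made up for this sketch.

```python
import numpy as np
import pandas as pd

s = pd.Series(list("abca"))

# No dtype given: the result uses whatever default the installed pandas has
# (bool once this commit is applied, uint8 in older releases).
default_dummies = pd.get_dummies(s)

# Explicit dtype: independent of the default, so code that relies on
# integer 0/1 columns keeps working across versions.
uint8_dummies = pd.get_dummies(s, dtype=np.uint8)

print(default_dummies.dtypes)
print(uint8_dummies.dtypes)
```

Pinning the dtype at the call site is the conservative choice when downstream code does arithmetic on the dummy columns, since the values are the same either way and only their type differs.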

pandas/tests/frame/indexing/test_getitem.py

+1-3
@@ -52,9 +52,7 @@ def test_getitem_list_of_labels_categoricalindex_cols(self):
         # GH#16115
         cats = Categorical([Timestamp("12-31-1999"), Timestamp("12-31-2000")])

-        expected = DataFrame(
-            [[1, 0], [0, 1]], dtype="uint8", index=[0, 1], columns=cats
-        )
+        expected = DataFrame([[1, 0], [0, 1]], dtype="bool", index=[0, 1], columns=cats)
         dummies = get_dummies(cats)
         result = dummies[list(dummies.columns)]
         tm.assert_frame_equal(result, expected)

pandas/tests/frame/methods/test_sort_values.py

+1-1
@@ -19,7 +19,7 @@ def test_sort_values_sparse_no_warning(self):
         # GH#45618
         # TODO(2.0): test will be unnecessary
         ser = pd.Series(Categorical(["a", "b", "a"], categories=["a", "b", "c"]))
-        df = pd.get_dummies(ser, dtype=np.uint8, sparse=True)
+        df = pd.get_dummies(ser, dtype=np.uint8, sparse=True)

         with tm.assert_produces_warning(None):
             # No warnings about constructing Index from SparseArray
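
Note (illustration only, not part of the commit): the test above now passes dtype=np.uint8 together with sparse=True, presumably so that it does not depend on the new default. A small sketch of what that call produces, reusing the series from the test; the inspection code is an assumption about how one might check the result, not something the test itself does.

```python
import numpy as np
import pandas as pd
from pandas import Categorical

ser = pd.Series(Categorical(["a", "b", "a"], categories=["a", "b", "c"]))

# Sparse dummy columns with an explicitly pinned integer subtype.
df = pd.get_dummies(ser, dtype=np.uint8, sparse=True)

# Each column carries a SparseDtype; its subtype is the dtype passed above.
for name, col_dtype in df.dtypes.items():
    print(name, col_dtype)
```

The unused category "c" still gets its own column, because the dummies are built from the declared categories rather than from the observed values.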
