dask/dask

`.str.split` coerces `dtype` to `object`

Open

#11884 opened on Apr 11, 2025

Labels: dataframe, good first issue

Description

Describe the issue:

The `.str.split` method coerces the dtype from `string[pyarrow]` to `object`. This is also inconsistent with pandas, which preserves `string[pyarrow]` for the same call (see the example below).

Minimal complete verifiable example:

Code

import dask.dataframe as dd
import pandas as pd

data = {"c": ["a,b,c", "d,e,f", "g,h,i"]}

# `pandas`
df = pd.DataFrame(data, dtype="string[pyarrow]")
print(df.dtypes)
print(df["c"].str.split(",", n=1, expand=True).dtypes)

# `dask`
ddf = dd.from_pandas(df)
print(ddf.dtypes)
print(ddf["c"].str.split(",", n=1, expand=True).dtypes)

Output

c    string[pyarrow]
dtype: object
0    string[pyarrow]
1    string[pyarrow]
dtype: object
c    string[pyarrow]
dtype: object
0    object
1    object
dtype: object

Environment:

  • Dask version: 2025.3.0
  • Python version: 3.10.16
  • Operating System: Ubuntu 24.04.2 LTS
  • Install method (conda, pip, source): pip
