Labels: dataframe, good first issue
Description
Describe the issue:
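On a Dask Series, `.str.split(..., expand=True)` changes the reported dtype of the resulting columns from `string[pyarrow]` to `object`. pandas keeps `string[pyarrow]` for the same call, so the two libraries are inconsistent here (if that matters).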
Minimal complete verifiable example:
Code
import dask.dataframe as dd
import pandas as pd

data = {"c": ["a,b,c", "d,e,f", "g,h,i"]}

# `pandas`
df = pd.DataFrame(data, dtype="string[pyarrow]")
print(df.dtypes)
print(df["c"].str.split(",", n=1, expand=True).dtypes)

# `dask`
ddf = dd.from_pandas(df)
print(ddf.dtypes)
print(ddf["c"].str.split(",", n=1, expand=True).dtypes)
Output
c    string[pyarrow]
dtype: object
0    string[pyarrow]
1    string[pyarrow]
dtype: object
c    string[pyarrow]
dtype: object
0    object
1    object
dtype: object
Environment:
- Dask version: 2025.3.0
- Python version: 3.10.16
- Operating System: Ubuntu 24.04.2 LTS
- Install method (conda, pip, source): pip