Add TimeSeriesCV and HomogeneousTimeSeriesCV · scikit-learn/scikit-learn#6322

23 comments · 0 reactions · 0 assignees · Python · 27,020 forks

Labels: Moderate, New Feature, help wanted, module:model_selection

Repository metrics

Stars: 66,084
PR merge metrics: average time to merge 10 days; 90 PRs merged in the last 30 days

Description

I get asked about this roughly once a day, so I think we should just add it. Many people work with time series, and adding cross-validation for them would be really easy. The standard strategy is described, for example, here.

There are basically two cases: homogeneous time series (one sample every X seconds / days), or heterogeneous time series, where each sample has a time stamp.

For the homogeneous case, we can just put the first `n_samples // n_folds` samples in the first fold, and so on, so it's a very simple variation of `KFold`. Fixed in #6586.
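A minimal plain-Python sketch of the homogeneous case, assuming the usual forward-chaining scheme in which each test fold is evaluated using only the folds that precede it in time (scikit-learn's `TimeSeriesSplit` follows a similar scheme). This is not the code from #6586; the function name `homogeneous_time_series_splits` is hypothetical, and samples are assumed to be equally spaced and already in time order.

```python
def homogeneous_time_series_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs using forward chaining.

    Hypothetical helper: samples are assumed equally spaced and already in
    time order, so sample i is simply index i.
    """
    if n_folds < 2 or n_folds > n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    fold_size = n_samples // n_folds
    # Contiguous folds: the first fold_size samples go to fold 0, and so on;
    # the last fold also absorbs the remainder n_samples % n_folds.
    starts = [k * fold_size for k in range(n_folds)] + [n_samples]
    for k in range(1, n_folds):
        train = list(range(0, starts[k]))               # all earlier folds
        test = list(range(starts[k], starts[k + 1]))    # the next fold
        yield train, test


for train, test in homogeneous_time_series_splits(10, 3):
    print(train, test)
# [0, 1, 2] [3, 4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7, 8, 9]
```

Unlike plain `KFold`, the first fold is never used as a test set, because nothing precedes it to train on.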

For the heterogeneous case, we need to take a `labels` array and split accordingly. If we cast that to integers, people could actually provide pandas time series, and they would be handled correctly (they will be converted to nanoseconds).
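One possible way to split on a per-sample timestamp array, sketched in plain Python. It assumes the timestamps have already been cast to integers (for pandas datetimes, the issue suggests these would be nanoseconds) and that the time range is cut into `n_folds` windows of equal duration; equal sample counts per fold would be an equally reasonable choice. The function name `heterogeneous_time_series_splits` is hypothetical. Integer arithmetic is used so large nanosecond values do not lose precision.

```python
def heterogeneous_time_series_splits(timestamps, n_folds):
    """Yield forward-chaining (train, test) index lists from timestamps.

    Hypothetical helper: samples may be in any order; each one is assigned
    to one of n_folds equal-duration windows over [min, max] of timestamps.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    ts = [int(t) for t in timestamps]   # cast to integers, as in the issue
    t_min, t_max = min(ts), max(ts)
    span = t_max - t_min + 1            # +1 keeps the latest sample in range
    # Window index in [0, n_folds - 1], computed with exact integer math.
    fold_of = [(t - t_min) * n_folds // span for t in ts]
    for k in range(1, n_folds):
        train = [i for i, f in enumerate(fold_of) if f < k]   # earlier windows
        test = [i for i, f in enumerate(fold_of) if f == k]   # window k
        if train and test:              # skip windows that contain no samples
            yield train, test


timestamps = [0, 5, 1, 9, 3, 7]  # integer time stamps, not sorted
for train, test in heterogeneous_time_series_splits(timestamps, 2):
    print(train, test)
# [0, 2, 4] [1, 3, 5]
```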

I remember arguing against this addition, but I changed my mind ;)

Contributor guide

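Research direction: The homogeneous case of TimeSeriesCV was implemented in PR #6586 and may already be merged. The remaining work is the heterogeneous case, which requires a `labels` array and splits based on timestamps. Investigate the current codebase to confirm the status of the homogeneous implementation, and design a heterogeneous TimeSeriesCV class that follows the existing cross-validation patterns. Review the discussion in the comments for any additional requirements or constraints.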
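A rough sketch of how a heterogeneous splitter class could mirror the method names scikit-learn splitters expose (`split(X, y=None, groups=None)` and `get_n_splits(X=None, y=None, groups=None)`). It does not inherit from any scikit-learn base class, so it stays self-contained. The class name `HeterogeneousTimeSeriesSplit` is hypothetical, and passing the timestamps through `groups` (the argument modern scikit-learn splitters use for per-sample labels) is an assumption. As a design alternative to equal-duration windows, this version orders samples by timestamp and cuts them into chunks of nearly equal size.

```python
class HeterogeneousTimeSeriesSplit:
    """Hypothetical splitter with a split/get_n_splits interface.

    Per-sample timestamps are passed as ``groups``. Samples are ordered by
    timestamp and cut into n_splits + 1 chunks of (nearly) equal size;
    split k trains on the first k chunks and tests on chunk k + 1. Samples
    sharing a timestamp may straddle a chunk boundary in this sketch.
    """

    def __init__(self, n_splits=3):
        if n_splits < 1:
            raise ValueError("n_splits must be at least 1")
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        if groups is None:
            raise ValueError("groups (per-sample timestamps) is required")
        n_samples = len(X)
        if len(groups) != n_samples:
            raise ValueError("groups must have one entry per sample")
        n_chunks = self.n_splits + 1
        if n_chunks > n_samples:
            raise ValueError("too many splits for the number of samples")
        # Sample indices in time order; sorted() is stable, so ties keep
        # their original relative order.
        order = sorted(range(n_samples), key=lambda i: int(groups[i]))
        # Chunk boundaries; every chunk is non-empty since n_chunks <= n_samples.
        bounds = [k * n_samples // n_chunks for k in range(n_chunks + 1)]
        for k in range(1, n_chunks):
            train = sorted(order[: bounds[k]])
            test = sorted(order[bounds[k]: bounds[k + 1]])
            yield train, test


cv = HeterogeneousTimeSeriesSplit(n_splits=2)
X = ["a", "b", "c", "d", "e", "f"]
times = [30, 10, 50, 20, 60, 40]
for train, test in cv.split(X, groups=times):
    print(train, test)
# [1, 3] [0, 5]
# [0, 1, 3, 5] [2, 4]
```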
Tech stack: python
Domain: machine learning
Issue type: Feature
Difficulty: 2
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Python, Git
Beginner-friendliness: 70

