microsoft/SynapseML

Featurizer should provide option to pass through missing values as Double.NaN instead of removing rows (currently the default)

Open

#304 opened on May 18, 2018

5 comments · 0 reactions · 1 assignee · Scala · 5,228 stars · 861 forks
Labels: enhancement, good first issue, help wanted

Description

Hi! While using LightGBM I ran into another problem, and I'm not sure whether it is a bug or a feature :) Our data has a lot of empty values, so we previously stored features in sparse vectors, which worked fine with our previous library. But when I tried the featurizer you provide, I noticed that it skips every row in which any feature is null; you can see this in the attached example. Is it possible to have a sparse feature vector for LightGBM training?

https://gist.github.com/ekaterina-sereda-rf/929183b9bcbbf5baf15eec3e81329992
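
To make the difference concrete, here is a minimal plain-Scala sketch of the two behaviours: dropping any row that contains a missing value (what I observe now) versus passing missing values through as `Double.NaN` (what I am asking for). It does not use Spark or the featurizer itself; `MissingValueSketch`, `Row`, `dropIncomplete`, and `passThroughNaN` are hypothetical names made up for illustration, and the data is invented.

```scala
// Hypothetical sketch: none of these names come from the library.
object MissingValueSketch {
  // A row of optional numeric features; None stands for a null/empty value.
  type Row = Seq[Option[Double]]

  // Behaviour described in this issue: any row containing a missing value is dropped.
  def dropIncomplete(rows: Seq[Row]): Seq[Array[Double]] =
    rows.filter(_.forall(_.isDefined)).map(_.flatten.toArray)

  // Requested behaviour: keep every row and encode missing values as Double.NaN.
  def passThroughNaN(rows: Seq[Row]): Seq[Array[Double]] =
    rows.map(_.map(_.getOrElse(Double.NaN)).toArray)

  def main(args: Array[String]): Unit = {
    // Invented example data: only the first row has no missing values.
    val rows: Seq[Row] = Seq(
      Seq(Some(1.0), Some(2.0), Some(3.0)),
      Seq(Some(4.0), None, Some(6.0)),
      Seq(None, None, Some(9.0))
    )
    println(s"rows kept when dropping: ${dropIncomplete(rows).size}")
    println(s"rows kept with NaN pass-through: ${passThroughNaN(rows).size}")
  }
}
```

With data as sparse as ours, the first approach throws away most of the training set, which is why an option for the second would help.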
