machine learning - features' range in logistic regression
I use logistic regression. I know it is a supervised method and needs calculated feature values in both the training and the test data. There are 6 features. Although the functions that produce these features' values are different, the maximum value can be 1. However, 4 of the features (in both training and test data) only take low values: they range between 0 and 0.1 and never reach 1, or even exceed 0.1! These features' values are very close to each other. The other features are better distributed (they range between 0 and 0.9). The difference between these two kinds of features is large, and I think it causes trouble in the learning process for logistic regression. Am I right? Do I need to transform/normalize these features? Any help is highly appreciated.
In short: yes, you should normalize your features prior to training. Typically each feature is either scaled to a fixed range (like [0,1]) or whitened (mean 0 and std 1).
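As a minimal sketch of both options with plain NumPy (the simulated data mimics the question's setup: 4 narrow features in [0, 0.1] and 2 spread over [0, 0.9]; the array names are illustrative, and in practice `sklearn.preprocessing.StandardScaler` / `MinMaxScaler` do the same thing):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated design matrices: 4 "small" features in [0, 0.1], 2 in [0, 0.9]
X_train = np.hstack([rng.uniform(0, 0.1, (100, 4)), rng.uniform(0, 0.9, (100, 2))])
X_test  = np.hstack([rng.uniform(0, 0.1, (20, 4)),  rng.uniform(0, 0.9, (20, 2))])

# Option 1: whitening (standardization) - each feature to mean 0, std 1.
# Crucially, the statistics come from the TRAINING data only and are
# re-used on the test data, so both end up on the same scale.
mu  = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_std = (X_train - mu) / std
X_test_std  = (X_test - mu) / std

# Option 2: min-max scaling of each feature to [0, 1]
lo, hi = X_train.min(axis=0), X_train.max(axis=0)
X_train_01 = (X_train - lo) / (hi - lo)
X_test_01  = (X_test - lo) / (hi - lo)
```

The key point is fitting the scaling parameters on the training set and applying them unchanged to the test set; fitting them separately on each split would put the two sets on different scales.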
Why is this important? In order to make the "small" features matter, LR needs high weights in those dimensions. However, if you use regularized LR (typically L2-regularized), it is hard to assign high values to these weights, because the regularization penalty pushes the model to choose roughly equally distributed weights instead; so use normalization. However, if you fit LR without regularization, there is no point in scaling (up to numerical errors): LR does not depend on the choice of scaling, and the solution should be the same.