python - How to deal with missing data in Stan?
I'm a newbie to Stan, and I'm implementing a probabilistic matrix factorization model.
Given a user-item rating matrix:
item user 1 3 NA 4 5 NA 2 0 3 NA 1 5 1 1 NA NA NA 0 ....
how should I represent the observed data in the data block, and the predictions for the missing data in the parameters block?
Thanks in advance!
Edit:
I am now implementing the model below:
pmf_code = """
data {
  int<lower=0> k;                       // number of factors
  int<lower=0> n;                       // number of users
  int<lower=0> m;                       // number of items
  int<lower=0> d;                       // number of observations
  int<lower=0> d_new;                   // number of predictions
  int<lower=0, upper=n> ii[d];          // item
  int<lower=0, upper=m> jj[d];          // user
  int<lower=0, upper=n> ii_new[d_new];  // item
  int<lower=0, upper=n> jj_new[d_new];  // user
  real<lower=0, upper=5> r[d];          // rating
  real<lower=0, upper=5> r_new[d_new];  // predicted rating
}
parameters {
  row_vector[k] i[m];  // item profile
  row_vector[k] u[n];  // user profile
  real<lower=0> alpha;
  real<lower=0> alpha_i;
  real<lower=0> alpha_u;
}
transformed parameters {
  matrix[n,m] i;  // indicator variable
  i <- rep_matrix(0, n, m);
  for (d in 1:d){
    i[ii[d]][jj[d]] <- 1;
  }
}
model {
  for (d in 1:d){
    r[d] ~ normal(u[jj[d]]' * i[ii[d]], 1/alpha);
  }
  for (n in 1: n){
    u[n] ~ normal(0,(1/alpha_u) * i);
  }
  for (m in 1:m){
    i[m] ~ normal(0,(1/alpha_i) * i);
  }
}
generated_quantities{
  for (d in 1:d_new){
    r_new[d] <- normal(u[jj_new[d]]' * i[ii_new[d]], 1/alpha);
  }
}
"""
But I got a "no matches for: real ~ normal(matrix, real)" error on this line of code:
for (d in 1:d){
  r[d] ~ normal(u[jj[d]]' * i[ii[d]], 1/alpha);
}
But jj[d] should be an integer denoting the ID of a user, so u[jj[d]] should be a row_vector with k factors, and so should i[ii[d]]. The product of the two should be a single real value, so why does Stan say it is a matrix?
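The shape rule behind the error can be reproduced with NumPy 2-D arrays, used here only as an analogue of Stan's row_vector and vector types (an assumption for illustration; Stan itself is not run). Transposing the row vector u[jj[d]] gives a column vector, and a column times a row is an outer product, so the mean passed to normal is a k x k matrix. Moving the transpose to the second factor gives a scalar. The profile values below are made up.

```python
import numpy as np

# Hypothetical k = 3 profiles standing in for u[jj[d]] and i[ii[d]].
u_row = np.array([[0.5, -1.0, 2.0]])  # shape (1, k): a row_vector
v_row = np.array([[1.0, 0.0, 0.25]])  # shape (1, k): a row_vector

# u' * v: (k x 1) times (1 x k) is an outer product, a k x k matrix.
outer = u_row.T @ v_row

# u * v': (1 x k) times (k x 1) is a 1 x 1 result, i.e. the dot product.
inner = u_row @ v_row.T

print(outer.shape)   # (3, 3)
print(inner.item())  # 1.0
```

In Stan terms, writing u[jj[d]] * i[ii[d]]' (or dot_product(u[jj[d]], i[ii[d]])) should yield a real rather than a matrix.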
There's a chapter in the Stan manual on how to deal with missing or sparse data. In your case, it's missing data. You want to put it in long form (what R's reshape package calls "melted" form):
int<lower=0> i;                // number of items
int<lower=0> j;                // number of users
int n;                         // number of observations
int<lower=1, upper=i> ii[n];   // item
int<lower=1, upper=j> jj[n];   // user
int<lower=0, upper=5> y[n];    // rating
Then, for each observation n, you have user jj[n] assigning rating y[n] to item ii[n].
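A minimal sketch of building that long form in Python, assuming the ratings arrive as a NumPy matrix with rows as users, columns as items, and np.nan marking missing cells (the matrix and the helper name to_long_form are hypothetical). Only observed cells are kept, and indices are shifted to 1-based because Stan arrays start at 1; the dictionary keys follow the data declarations above.

```python
import numpy as np

# Hypothetical 3-user x 4-item rating matrix; np.nan marks a missing rating.
ratings = np.array([
    [1.0, 3.0, np.nan, 4.0],
    [2.0, np.nan, 3.0, np.nan],
    [np.nan, 1.0, 5.0, 0.0],
])

def to_long_form(mat):
    """Return (ii, jj, y) for observed cells only, with 1-based indices.
    ii = item (column), jj = user (row), y = rating."""
    user_idx, item_idx = np.nonzero(~np.isnan(mat))  # row-major order
    ii = item_idx + 1
    jj = user_idx + 1
    y = mat[user_idx, item_idx].astype(int)
    return ii, jj, y

ii, jj, y = to_long_form(ratings)

stan_data = {
    "i": ratings.shape[1],  # number of items
    "j": ratings.shape[0],  # number of users
    "n": len(y),            # number of observations
    "ii": ii.tolist(),
    "jj": jj.tolist(),
    "y": y.tolist(),
}
```

The missing cells simply never appear in the data; they are what you predict later.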
There's an example of this in the IRT models in the regression section of the manual. You have an ordinal outcome, which is a bit trickier. You could either use a direct ordinal logistic regression of some kind, possibly hierarchical, or try a factor model (like the partial SVD used for Netflix). There are examples of factor models in the manual; you'd use one to generate the linear predictor for an ordinal regression.
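A minimal Python sketch of combining the two ideas: a factor-model dot product used as the linear predictor eta of an ordered-logistic outcome. The factor values and cutpoints are made up, and the probability formula is written to follow the ordered-logistic definition in Stan's functions reference, which uses outcomes 1..K (so a 0 to 5 rating would enter Stan as rating + 1, giving K = 6 categories and 5 cutpoints).

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordered_logistic_probs(eta, cutpoints):
    """Probabilities of outcomes 1..K given linear predictor eta and
    increasing cutpoints c_1 < ... < c_{K-1}."""
    c = np.asarray(cutpoints, dtype=float)
    # P(y > k) = inv_logit(eta - c_k), padded with P(y > 0) = 1 and P(y > K) = 0.
    above = np.concatenate(([1.0], inv_logit(eta - c), [0.0]))
    return above[:-1] - above[1:]

# Hypothetical factor-model linear predictor: eta = u_j . v_i
u_j = np.array([0.3, -0.2, 1.1])  # user j's factors (made up)
v_i = np.array([0.8, 0.5, 0.4])   # item i's factors (made up)
eta = float(u_j @ v_i)

# Ratings 0..5 -> K = 6 categories -> 5 cutpoints (made-up values).
cutpoints = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = ordered_logistic_probs(eta, cutpoints)
```

In a full model, u_j, v_i, and the cutpoints would all be parameters, with eta computed per observation.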
Then, if you want to predict y[m] for a new combination of item i and user j, you can compute it in the generated quantities block as a posterior predictive quantity. You can do it either via sampling or via expectation; there's an example of this in the change-point model in the latent discrete parameters chapter, and in the regression chapter on prediction.
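To make the sampling-versus-expectation distinction concrete, here is a minimal Python sketch that works on posterior draws after the fact, assuming the question's normal likelihood. The draws are randomly generated stand-ins for what a fitted model would return; nothing here is Stan output.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical posterior draws (S draws, k factors) for one user j and one
# item i, plus the noise scale; in practice these come from the fitted model.
S, k = 1000, 3
u_draws = rng.normal(size=(S, k))
v_draws = rng.normal(size=(S, k))
sigma_draws = np.abs(rng.normal(1.0, 0.1, size=S))

# Linear predictor for the new (item i, user j) pair, one value per draw.
mu = np.einsum("sk,sk->s", u_draws, v_draws)

# Via sampling: one simulated rating per draw (what normal_rng would do in
# generated quantities); spread reflects parameter and noise uncertainty.
y_rep = rng.normal(mu, sigma_draws)

# Via expectation: average the conditional mean over draws without
# simulating noise; this estimates E[y_new | data] with lower variance.
y_expect = mu.mean()
```

Sampling gives a full predictive distribution (useful for intervals), while the expectation gives a lower-variance point prediction.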