python - Pass Pandas DataFrame to Scipy.optimize.curve_fit


I'd like to know the best way to use scipy to fit pandas DataFrame columns. If I have a data table (a pandas DataFrame) with columns (a, b, c, d, z_real), where z depends on a, b, c and d, I want to fit a function of each DataFrame row (Series) that makes a prediction of z (z_pred).

The signature of each function to fit is

func(series, param_1, param_2...) 

where series is the pandas Series corresponding to each row of the DataFrame. I'm using a pandas Series so that different functions can use different combinations of columns.

I've tried passing the DataFrame to scipy.optimize.curve_fit using

curve_fit(func, table, table.loc[:, 'z_real']) 

but for some reason each func call is passed the whole data table as its first argument rather than the Series for each row. I've also tried converting the DataFrame to a list of Series objects, but that results in my function being passed a NumPy array (I think because scipy converts the list of Series to a NumPy array, which doesn't preserve the pandas Series objects).
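The behaviour described above can be reproduced with a small sketch. It uses made-up data and assumes a SciPy version whose curve_fit passes any xdata that is not a list, tuple or ndarray straight to the model unchanged (which matches what the question reports). The model records the shape of its first argument on every call, showing that curve_fit evaluates the model on all rows at once rather than row by row:

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Made-up data for illustration only
rng = np.random.default_rng(0)
table = pd.DataFrame(rng.standard_normal((50, 4)), columns=['a', 'b', 'c', 'd'])
table['z_real'] = 2.0 * table['a'] - 1.0 * table['b']

seen_shapes = []  # shape of the first argument on every model call

def func(x, p1, p2):
    seen_shapes.append(np.shape(x))
    # x is the whole table here, so x['a'] is a full column, not a scalar
    return (p1 * x['a'] + p2 * x['b']).to_numpy()

popt, pcov = curve_fit(func, table, table.loc[:, 'z_real'])
print(seen_shapes[0])  # (50, 5): every row at once, not a single row
```

In other words, the model is expected to be vectorized: it receives all observations in one call and must return one prediction per observation.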

Your call to curve_fit is incorrect. From the documentation:

xdata : An M-length sequence or a (k,M)-shaped array for functions with k predictors.

The independent variable where the data is measured.

ydata : M-length sequence

The dependent data - nominally f(xdata, ...)

In your case the independent variables (xdata) are the columns a-d, i.e. table[['a', 'b', 'c', 'd']], and the dependent variable (ydata) is table['z_real'].

Also note that xdata should be a (k, M) array, where k is the number of predictor variables (i.e. columns) and M is the number of observations (i.e. rows). You should therefore transpose your input DataFrame so that it is (4, M) rather than (M, 4), i.e. table[['a', 'b', 'c', 'd']].T.
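The (k, M) convention can be illustrated with a short sketch. The data, the "true" parameters and the model form below are made up purely for illustration. Here the transposed frame is additionally converted with .to_numpy(), an assumption of this sketch rather than part of the answer, because unpacking an ndarray along its first axis yields one length-M array per predictor, whereas iterating over a DataFrame yields its column labels instead of its rows:

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Made-up data; true_params is a hypothetical ground truth for illustration
rng = np.random.default_rng(42)
table = pd.DataFrame(rng.standard_normal((200, 4)), columns=['a', 'b', 'c', 'd'])
true_params = np.array([0.5, -2.0, 1.0, 3.0])
table['z_real'] = (true_params[0] * table['a']
                   + true_params[1] * table['b'] ** 2
                   + true_params[2] * np.sin(table['c'])
                   + true_params[3] * table['d'])

xdata = table[['a', 'b', 'c', 'd']].T.to_numpy()  # shape (k, M) = (4, 200)
ydata = table['z_real'].to_numpy()                # shape (M,) = (200,)

def func(x, p1, p2, p3, p4):
    # Unpacking along the first axis gives one length-M array per predictor
    a, b, c, d = x
    return p1 * a + p2 * b ** 2 + p3 * np.sin(c) + p4 * d

popt, pcov = curve_fit(func, xdata, ydata)
print(xdata.shape, ydata.shape)  # (4, 200) (200,)
```

Naming the rows inside func this way lets each model pick its own combination of predictors, which is close to what the question wanted from a per-row Series.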

The whole call to curve_fit might look something like this:

curve_fit(func, table[['a', 'b', 'c', 'd']].T, table['z_real'])

Here's a complete example showing multiple linear regression:

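import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

x = np.random.randn(100, 4)     # independent variables
m = np.random.randn(4)          # known coefficients
y = x.dot(m)                    # dependent variable

df = pd.DataFrame(np.hstack((x, y[:, None])),
                  columns=['a', 'b', 'c', 'd', 'z_real'])

def func(x, *params):
    # stack the 4 scalar parameters into a vector and take its dot product
    # with the (4, M) predictor block, giving one prediction per observation
    return np.hstack(params).dot(x)

# p0 is required here because *params hides the parameter count from curve_fit
popt, pcov = curve_fit(func, df[['a', 'b', 'c', 'd']].T, df['z_real'],
                       p0=np.random.randn(4))

print(np.allclose(popt, m))  # True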
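If you would rather keep the row-wise signature func(series, param_1, param_2, ...) from the question, one possible workaround is a thin adapter that curve_fit calls with the whole DataFrame and that hands each row to your function via DataFrame.apply(..., axis=1). This is a sketch under stated assumptions: row_func and rowwise are hypothetical names, the data is made up, and it relies on curve_fit passing a DataFrame through unchanged. It is also much slower than a vectorized model, since the row function runs once per row on every model evaluation:

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def row_func(series, p1, p2):
    """Hypothetical row-wise model: sees one row (a pandas Series) at a time."""
    return p1 * series['a'] + p2 * series['c'] ** 2

def rowwise(f):
    """Hypothetical adapter: lets curve_fit call a row-wise model on a whole DataFrame."""
    def wrapped(table, *params):
        # apply with axis=1 passes each row to f as a Series; collect one value per row
        return table.apply(lambda row: f(row, *params), axis=1).to_numpy()
    return wrapped

# Made-up data for illustration only
rng = np.random.default_rng(1)
table = pd.DataFrame(rng.standard_normal((30, 4)), columns=['a', 'b', 'c', 'd'])
table['z_real'] = 1.5 * table['a'] + 0.5 * table['c'] ** 2

# p0 is required: the adapter's *params hides the parameter count from curve_fit
popt, pcov = curve_fit(rowwise(row_func), table, table['z_real'], p0=[1.0, 1.0])
print(popt)
```

For anything beyond small tables, the vectorized form above (column access or a transposed (k, M) array) is the better choice.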
