python - Pass Pandas DataFrame to Scipy.optimize.curve_fit
I'd like to know the best way to use SciPy to fit Pandas DataFrame columns. If I have a data table (a Pandas DataFrame) with columns (a, b, c, d, z_real), where z depends on a, b, c and d, I want to fit a function of each DataFrame row (a Series) that makes a prediction for z (z_pred).
The signature of each function to fit is
func(series, param_1, param_2...)
where series is the Pandas Series corresponding to each row of the DataFrame. I use a Pandas Series so that different functions can use different combinations of columns.
I've tried passing the DataFrame to scipy.optimize.curve_fit using
curve_fit(func, table, table.loc[:, 'z_real'])
but for some reason each func instance is passed the whole data table as its first argument rather than the Series for each row. I've also tried converting the DataFrame to a list of Series objects, but this results in my function being passed a NumPy array (I think because SciPy converts the list of Series to a NumPy array, which doesn't preserve the Pandas Series object).
Your call to curve_fit is incorrect. From the documentation:
xdata : An M-length sequence or a (k, M)-shaped array for functions with k predictors.
The independent variable where the data is measured.
ydata : M-length sequence
The dependent data, nominally f(xdata, ...)
In this case your independent variables xdata are the columns a to d, i.e. table[['a', 'b', 'c', 'd']], and your dependent variable ydata is table['z_real'].
Also note that xdata should be a (k, M) array, where k is the number of predictor variables (i.e. columns) and M is the number of observations (i.e. rows). You should therefore transpose your input DataFrame so that it is (4, M) rather than (M, 4), i.e. table[['a', 'b', 'c', 'd']].T.
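To make the shape requirement concrete, here is a minimal sketch. The small `table` below is hypothetical data built only for illustration; it is not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical table with M = 6 observations of the four predictors.
table = pd.DataFrame(np.arange(24.0).reshape(6, 4), columns=['a', 'b', 'c', 'd'])

xdata = table[['a', 'b', 'c', 'd']]
print(xdata.shape)    # (6, 4): one row per observation
print(xdata.T.shape)  # (4, 6): one row per predictor, the (k, M) layout

# Once transposed, each predictor can be unpacked by position inside a model.
a, b, c, d = xdata.T.to_numpy()
```

Unpacking by position like this is one way a model function can pick out only the columns it needs.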
The whole call to curve_fit might look something like this:
curve_fit(func, table[['a', 'b', 'c', 'd']].T, table['z_real'])
Here's a complete example showing multiple linear regression:
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

X = np.random.randn(100, 4)  # independent variables, shape (M, k)
m = np.random.randn(4)       # known coefficients
y = X.dot(m)                 # dependent variable

df = pd.DataFrame(np.hstack((X, y[:, None])),
                  columns=['a', 'b', 'c', 'd', 'z_real'])

def func(X, *params):
    # X arrives as (k, M); the result has one prediction per observation
    return np.hstack(params).dot(X)

popt, pcov = curve_fit(func, df[['a', 'b', 'c', 'd']].T, df['z_real'],
                       p0=np.random.randn(4))

print(np.allclose(popt, m))  # True
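If you want to keep the row-wise signature func(series, param_1, param_2...) from the question, one possible approach is to wrap it in an adapter that rebuilds a DataFrame from the (k, M) array and applies the function to each row. This is only a sketch under assumptions: `row_func`, `as_curve_fit_model` and the synthetic data are hypothetical names and values chosen for illustration, and p0 is passed explicitly because curve_fit cannot infer the number of parameters from a `*params` signature.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

COLUMNS = ['a', 'b', 'c', 'd']

def row_func(series, p1, p2):
    # Hypothetical row-wise model that only uses columns 'a' and 'c'.
    return p1 * series['a'] + p2 * series['c'] ** 2

def as_curve_fit_model(func, columns):
    """Wrap a row-wise function so it accepts a (k, M) array of predictors."""
    def model(xdata, *params):
        # Rebuild an (M, k) DataFrame so each row is a Series with named columns.
        frame = pd.DataFrame(np.asarray(xdata).T, columns=columns)
        predictions = frame.apply(lambda row: func(row, *params), axis=1)
        return predictions.to_numpy(dtype=float)
    return model

# Synthetic data generated from known parameters (2.0, -0.5).
rng = np.random.default_rng(0)
table = pd.DataFrame(rng.normal(size=(50, 4)), columns=COLUMNS)
table['z_real'] = 2.0 * table['a'] - 0.5 * table['c'] ** 2

popt, pcov = curve_fit(as_curve_fit_model(row_func, COLUMNS),
                       table[COLUMNS].to_numpy().T,
                       table['z_real'].to_numpy(),
                       p0=[1.0, 1.0])
print(popt)
```

Applying a function row by row is slow for large tables. Since column access by name works the same way on a DataFrame as on a Series, a vectorized model that indexes the rebuilt frame directly would probably be faster.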