algorithm - Calculate "moving" Covariance -


i've been trying figure out how efficiently calculate covariance in moving window, i.e. moving set of values (x[0], y[0])..(x[n-1], y[n-1]) new set of values (x[1], y[1])..(x[n], y[n]). in other words, value (x[0], y[0]) gets replaces value (x[n], y[n]). performance reasons need calculate covariance incrementally in sense i'd express new covariance cov(x[1]..x[n], y[1]..y[n]) in terms of previous covariance cov(x[0]..x[n-1], y[0]..y[n-1]).

starting off naive formula covariance described here:

[https://en.wikipedia.org/wiki/algorithms_for_calculating_variance#covariance][1]

all can come is:

cov(x[1]..x[n], y[1]..y[n]) = cov(x[0]..x[n-1], y[0]..y[n-1]) + (x[n]*y[n] - x[0]*y[0]) / n - avg(x[1]..x[n]) * avg(y[1]..y[n]) + avg(x[0]..x[n-1]) * avg(y[0]..y[n-1]) 

i'm sorry notation, hope it's more or less clear i'm trying express.

however, i'm not sure if sufficiently numerically stable. dealing large values might run arithmetic overflows or other (for example cancellation) issues.

is there better way this?

thanks help.

it looks trying form of "add new value , subtract old one". correct worry: method not numerically stable. keeping sums way subject drift, real killer fact @ each step subtracting large number large number small number.

one improvement maintain sums (of x_i, y_i, , x_i*y_i) independently, , recompute naive formula them @ each step. running sums still drift, , naive formula still numerically unstable, @ least have one step of numerical instability.

a stable way solve problem implement formula (stably) merging statistical sets, , evaluate overall covariance using merge tree. moving window update 1 of leaves, requiring update of each node leaf root. window of size n, method take o(log n) time per update instead of o(1) naive computation, result stable , accurate. also, if don't need statistics each incremental step, can update tree once per each output sample instead of once per input sample. if have k input samples per output sample, reduces cost per input sample o(1 + (log n)/k).

from comments: wikipedia page reference includes section on knuth's online algorithm, is relatively stable, though still prone drift. should able comparable covariance; , resetting computation every k*n samples should limit drift @ minimal cost.


Comments

Popular posts from this blog

sublimetext3 - what keyboard shortcut is to comment/uncomment for this script tag in sublime -

java - No use of nillable="0" in SOAP Webservice -

ubuntu - Laravel 5.2 quickstart guide gives Not Found Error -