I was challenged offline last night on the speed-up of pthread over fork(). That was just the backdrop, but it drew my attention to the average processing time (tav) in pixelserv. I had never looked seriously at how it is calculated. I do recall that during my recent long run tav barely moved most of the time!
Here is the original code. It's trying to calculate a moving average. It basically says this:
new average process time = old average process time + (current process time - old average process time) / tct.
where tct accumulates the number of data points used over the entire history of the calculation, and pipedata.run_time holds the current process time. The 0.5 in the code is for rounding.
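For readers who prefer code to pseudo code, here is a minimal C sketch of the update as described above. It is a re-creation from the description, not the actual pixelserv source: the function name cumulative_update and its parameters are hypothetical, and it assumes non-negative times in milliseconds.

```c
/* Hypothetical re-creation of the described update; not the pixelserv code.
 * old_avg : previously stored integer average (ms)
 * sample  : current process time (ms), i.e. what pipedata.run_time holds
 * count   : number of data points so far, i.e. tct (must be >= 1)
 */
static int cumulative_update(int old_avg, int sample, int count)
{
    /* Floating-point arithmetic on the right, +0.5 for rounding,
     * then truncation when the result is stored as an integer. */
    return (int)(old_avg + (double)(sample - old_avg) / count + 0.5);
}
```

With count = 1 the new sample replaces the average outright; with count = 100 a sample 10 ms above the average leaves the stored value unchanged.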
The code implements something like an exponential moving average, but not exactly. The factor 1/tct looks like the alpha coefficient; however, tct is monotonically increasing, so the factor keeps shrinking instead of staying constant. And that's the problem!
If we take the limit of tct to infinity, the second term in the pseudo code above tends towards zero, and the new average process time equals the old average process time. That is, when you run pixelserv long enough, the average process time gets stuck and hardly moves at all.
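The limit argument can be written out as follows. The notation is assumed here and does not come from the code: t_n is the n-th process time, \bar{t}_n is the average after n samples, and tct = n counts samples starting from 1. Ignoring the integer rounding, the update unrolls to the plain arithmetic mean of every sample ever seen, which is why each new sample carries less and less weight.

```latex
\[
\begin{aligned}
\bar{t}_n &= \bar{t}_{n-1} + \frac{t_n - \bar{t}_{n-1}}{n}
           = \frac{1}{n}\sum_{i=1}^{n} t_i, \\
\lim_{n\to\infty} \frac{t_n - \bar{t}_{n-1}}{n} &= 0
\quad \text{for bounded } t_n .
\end{aligned}
\]
```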
Imagine tct accumulates to 1 million. The current process time would need to exceed the old average by about 1 million milliseconds for the second term to get close to 1 and move the new average process time by 1 ms. That's insane and will never happen. The longer you run, the harder it is to move the needle even a tiny bit.
But how long is long enough to see this "stuck" effect? Not that long, actually, because the right-hand side in the code is calculated in floating point but truncated into an integer (tav) when stored on the left-hand side. By trial and error, I estimate that about 100 data points will pretty much freeze the average process time at its last value. New data points will hardly have any effect. So after the first 100 requests to pixelserv, the average process time is pretty much set in stone.
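The freeze can be checked with a small C sketch. It assumes the update has the form described earlier (floating-point right-hand side, +0.5, truncation to int), which is my reading of the code rather than a copy of it. The helper name smallest_sample_that_moves is hypothetical. It searches for the smallest process time that changes the stored integer average upwards.

```c
/* Hypothetical helper, assuming the update rule
 *   new = (int)(old + (sample - old) / tct + 0.5)
 * Returns the smallest sample (ms) at or above old_avg that makes the
 * stored integer average change. Requires count >= 1, old_avg >= 0. */
static int smallest_sample_that_moves(int old_avg, int count)
{
    int sample = old_avg;
    for (;;) {
        int new_avg = (int)(old_avg + (double)(sample - old_avg) / count + 0.5);
        if (new_avg != old_avg)
            return sample;   /* first sample large enough to move the average */
        sample++;
    }
}
```

Under these assumptions, with the average sitting at 20 ms, a request has to take 70 ms at tct = 100 and 520 ms at tct = 1000 before the stored value moves at all. The required deviation grows as roughly tct/2.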
The average bytes per request (avg) has the same issue.
I'm going to change both calculations to an exponential moving average with a coefficient of 1/10 for tav and 1/100 for avg. For the EMA to work well, we need floating point for the persistent store and convert to integer only for presentation. I'd be glad to hear better ideas.
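A minimal sketch of what that could look like in C, under stated assumptions: the struct and function names (struct ema, ema_update, ema_display) are illustrative and not from pixelserv, and seeding the average with the first sample is my own choice rather than something decided above.

```c
/* Hypothetical EMA state; names are illustrative, not from pixelserv. */
struct ema {
    double value;   /* persistent floating-point average */
    double alpha;   /* constant coefficient, e.g. 0.1 for tav, 0.01 for avg */
    int    primed;  /* 0 until the first sample has been seen */
};

/* Fold one new sample into the average. */
static void ema_update(struct ema *e, double sample)
{
    if (!e->primed) {
        e->value = sample;          /* assumption: seed with the first sample */
        e->primed = 1;
    } else {
        e->value += e->alpha * (sample - e->value);
    }
}

/* Round to an integer for presentation only; assumes a non-negative value. */
static int ema_display(const struct ema *e)
{
    return (int)(e->value + 0.5);
}
```

Because alpha stays constant, a new sample always shifts the stored value by the same fraction of its distance from the average, no matter how many requests came before. Keeping the state in floating point means small shifts accumulate instead of being truncated away on every update.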