Z factor refactored
I recently reread the original Z factor paper (Zhang et al). The Z factor is a measure of assay reliability and comes in two flavors: the Z’ factor, based entirely on controls (those with and without the desired effect); and the Z factor, based on experimental data compared with the controls that should have the desired effect.
Rereading a paper months later often makes you wonder whether you read the paper at all the first time. This reading really clarified for me what the Z factor is, that it is not just for high-throughput screening, and raised a number of questions (especially after discussion with colleagues) not addressed in the paper.
The Z factor is the ratio of the “separation band” of the data to the assay dynamic range. A picture helps:

where μ+ is the mean of the positive controls (in this case the controls with desired effect), μs is the mean of the data, σ+ is the standard deviation of the positive controls, and σs is the standard deviation of the data. The assay dynamic range in this diagram is μ+ - μs. The separation band (the screening window) is then (μ+ - μs) - (3σ+ + 3σs), and the ratio of this to the dynamic range is the Z factor = 1 - (3σ+ + 3σs)/(μ+ - μs).
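The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `z_factor` and the readings are made up, and I take the absolute value of the difference of the means (an assumption on my part) so that it does not matter which group has the higher signal.

```python
from statistics import mean, stdev

def z_factor(samples, controls):
    """Z = 1 - (3*sd_controls + 3*sd_samples) / |mean_controls - mean_samples|."""
    dynamic_range = abs(mean(controls) - mean(samples))
    if dynamic_range == 0:
        # The two distributions sit on top of each other.
        return float("-inf")
    spread = 3 * stdev(controls) + 3 * stdev(samples)
    return 1 - spread / dynamic_range

# Made-up illustrative readings, not real assay data.
samples = [9.0, 10.0, 11.0]
controls = [99.0, 100.0, 101.0]
z = z_factor(samples, controls)
print(f"Z = {z:.3f}")
```

Passing the negative controls in place of the samples gives the Z’ factor with the same function.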
Zhang et al go on to describe desirable values of Z: Z = 1 is an “ideal assay” (the standard deviations are negligible compared to the difference between the means), 1 > Z ≥ 0.5 is an “excellent assay”, 0.5 > Z > 0 is a “double assay”, Z = 0 is a “yes/no assay” (no separation band, the two 3σ regions touch), and for Z < 0 “screening is essentially impossible”. Note that when the two distributions completely overlap (μs = μ+), then Z is -∞.
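As a quick reference, the categories quoted above can be written as a small lookup. The function name `classify_z` is hypothetical; the labels and cutoffs are the ones listed in the previous paragraph.

```python
def classify_z(z):
    """Map a Z (or Z') value to the category labels quoted from Zhang et al."""
    if z > 1:
        raise ValueError("Z cannot exceed 1")
    if z == 1:
        return "ideal assay"
    if z >= 0.5:
        return "excellent assay"
    if z > 0:
        return "double assay"
    if z == 0:
        return "yes/no assay"
    return "screening essentially impossible"

for value in (1.0, 0.7, 0.3, 0.0, -2.0):
    print(value, "->", classify_z(value))
```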
As I mentioned above, the Z factor’s usefulness is not restricted to high-throughput screening assays. Indeed, it can be applied to any assay that measures a number of experimental subjects in an identical way and measures control values. However, when discussing application to assay optimization, the paper does point out that use of the Z factor requires “relatively large data sets”.
Three major questions arise:
- What do these significant values of Z mean, especially 0.5? What is a “double” or “yes/no” assay?
- How large is a “relatively large data set”, i.e. how large does your data set need to be to use the Z factor?
- Why is the range of Z -∞ to 1? Is there another parameter with a more intuitive range?
A nice companion reading for deeper understanding is the paper published in March 2007 by Sui and Wu. They take on question #1, by examining the statistical power of an assay at different Z factor values. The statistical power, in the case of a drug screening assay, is the probability that an active compound is scored as a hit (i.e. the probability of “true positives”). Z factor is calculated without referring to the hit threshold, the value beyond which a compound is scored as a hit. Often people score all outliers three standard deviations outside the mean as “hits” (in our diagram above, this would be any measurements falling above μs + 3σs, i.e. within the separation band or above). Sui and Wu show that if the standard deviations of sample data and controls are equal, then a Z factor of 0.5 corresponds to a statistical power of 0.999, i.e. there is only 0.1% chance that an active compound is not scored as a hit.
However, they also show that if the standard deviations are not equal, interpreting the Z factor becomes considerably trickier. Moreover, although the Z factor calculation does not strictly require normally distributed errors, the Z factor does a poor job of describing assay reliability when the sample or control data are not well described by a normal distribution such as N(μs, σs²). They demonstrate this by showing that the Z factor changes when non-normally distributed data are transformed to be closer to normal.
Sui and Wu suggest caution when interpreting Z (and Z’) factor values, and recommend that analysts confirm normality of the data (transforming if necessary) and calculate the statistical power corresponding to the distributions of the sample/control data and the hit threshold to get a more reliable measure of assay reliability.
Humorous side note: Sui and Wu interpret “double assay” to mean “doable assay”. Who knows.
As for question #2 (“how large does a data set need to be to get a reliable estimate of Z factor?”), this can be calculated from the standard error of the estimators used to calculate the means and standard deviations. That is, when calculating Z factor, one can’t actually use the real means and standard deviations of the underlying distributions of the samples and controls, one can only estimate these quantities by making a number of measurements of the samples and controls. I plan to calculate these and publish them here at some point, but any statistician can do the same (probably more efficiently and correctly than I).
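In the meantime, here is a rough way to get a feel for the answer by simulation rather than by the analytic standard errors mentioned above. It is a sketch under loud assumptions: both groups are normally distributed with equal standard deviation, the true Z is 0.5, and the function names and parameter values are all made up for illustration.

```python
import random
from statistics import mean, stdev

def z_factor(samples, controls):
    """Z = 1 - (3*sd_controls + 3*sd_samples) / |mean_controls - mean_samples|."""
    dynamic_range = abs(mean(controls) - mean(samples))
    if dynamic_range == 0:
        return float("-inf")
    return 1 - (3 * stdev(controls) + 3 * stdev(samples)) / dynamic_range

def z_estimate_spread(n, trials=500, seed=0, mu_s=0.0, mu_c=12.0, sigma=1.0):
    """Standard deviation of the estimated Z over repeated simulated assays.

    Each simulated assay has n sample and n control measurements drawn from
    normal distributions (an assumption). With the defaults the true Z is
    1 - 6/12 = 0.5.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        samples = [rng.gauss(mu_s, sigma) for _ in range(n)]
        controls = [rng.gauss(mu_c, sigma) for _ in range(n)]
        estimates.append(z_factor(samples, controls))
    return stdev(estimates)

for n in (5, 20, 100):
    print(n, round(z_estimate_spread(n), 3))
```

The spread of the estimate shrinks as n grows, so one can pick the smallest n whose spread is tolerable for the decision at hand.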
Finally, question #3: Zhang et al’s objective was to develop a dimensionless constant that took three distributional parameters into account: the difference of the means of the samples and controls, the variability of the controls and the variability of the sample data. There are other ways to combine these parameters into dimensionless constants that are different from the Z factor. For instance, one could calculate the ratio of the separation band to the sum 3σ+ + 3σs, and call this parameter C. C varies between -1 and +∞, and if σ is the same for controls and samples, then C = -1 when Z = -∞, C = 0 when Z = 0, C = 1 when Z = 0.5, and C = +∞ when Z = 1.
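To make the correspondence concrete, here is a small sketch that computes both parameters from the dynamic range and the sum 3σ+ + 3σs. The function name `z_and_c` is made up; the arithmetic simply follows the two definitions, from which C = Z/(1 - Z) whenever Z is finite and less than 1.

```python
def z_and_c(dynamic_range, spread):
    """Return (Z, C) given the dynamic range and spread = 3*sd_controls + 3*sd_samples.

    Z = separation band / dynamic range
    C = separation band / spread
    Assumes spread > 0 and dynamic_range >= 0.
    """
    band = dynamic_range - spread
    z = band / dynamic_range if dynamic_range > 0 else float("-inf")
    c = band / spread
    return z, c

# Spread fixed at 6 (i.e. sigma = 1 for both groups), dynamic range varied.
for dynamic_range in (0.0, 6.0, 12.0, 60.0):
    z, c = z_and_c(dynamic_range, 6.0)
    print(f"range={dynamic_range:5.1f}  Z={z:6.2f}  C={c:6.2f}")
```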
This gets rid of the weird “0.5” and the upper limit of 1, but I think the Z factor is actually a more accurate reflection of reliability. The reason is that the difference in reliability between an assay with Z of 0.9 and one with Z of 1.0 is really quite small, even if the dynamic range in these two cases is very different, because 0.9 is already good enough. By contrast, the difference in reliability between an assay with Z = 0 and one with Z = -∞ is huge, because we go from having the data variability ranges ([μ - 3σ, μ + 3σ] with appropriate subscripts) touch to having the two distributions completely overlap. Yet C would only vary from 0 to -1 between these two cases.
I may retouch the explanation above for clarity at some point, especially if people ask questions.
November 13th, 2007 at 9:08 am
It looks like ROC curves would be useful for this.
November 26th, 2007 at 11:26 am
Probably so, thanks for your comment. I’ll play around with the idea. ROC curves are definitely an intuitive way to present sensitivity/specificity information.
April 29th, 2008 at 3:51 pm
We agree and really appreciate your discussion of the Z-factor. It seems that using a time dependent cell count to determine a K value may be a better descriptor. But I guess this contradicts the HTS philosophy.
April 29th, 2008 at 4:20 pm
Thanks for your comment, Neil. Could you expand more on your statement about time dependent cell counts? I’m interested.