Always log spot intensities and ratios
Why? Because it
- Makes variation of intensities and ratios of intensities more independent of absolute magnitude.
- Makes normalization additive.
- Evens out highly skew distributions.
- Gives a more realistic sense of variation.
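The first point can be illustrated with a small simulation. This is only a sketch under an assumed error model (true signal multiplied by log-normal noise); the model, the noise level `NOISE_SD`, and the helper names are illustrative assumptions, not something taken from real array data.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

NOISE_SD = 0.2  # assumed SD of the noise on the natural-log scale


def simulate_spot(true_intensity, n=2000):
    """Simulate n replicate readings with multiplicative log-normal noise."""
    return [true_intensity * math.exp(random.gauss(0.0, NOISE_SD)) for _ in range(n)]


def sd_raw_and_log2(true_intensity):
    """Return (SD of raw readings, SD of log2 readings) for one spot."""
    readings = simulate_spot(true_intensity)
    logged = [math.log2(x) for x in readings]
    return statistics.stdev(readings), statistics.stdev(logged)


for level in (100, 1000, 10000):
    sd_raw, sd_log = sd_raw_and_log2(level)
    print(f"true={level:>6}  SD raw={sd_raw:9.1f}  SD log2={sd_log:.3f}")
```

Under this assumed model the raw SD grows roughly in proportion to the intensity, while the SD of the logged readings stays about the same at every level.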
Comments:
Logs to base 2 are the easiest for people to interpret (intensities are typically numbers between 0 and 65535, so their base-2 logs do not exceed 16).
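As a quick illustration of why base 2 reads easily, the sketch below prints a few ratios next to their base-2 logs. The (R, G) pairs are made up for the example, and `log2_ratio` is a hypothetical helper name.

```python
import math


def log2_ratio(r, g):
    """Base-2 log of the ratio R/G; both intensities must be positive."""
    if r <= 0 or g <= 0:
        raise ValueError("intensities must be positive to take logs")
    return math.log2(r / g)


# Hypothetical intensity pairs chosen to give simple fold changes.
pairs = [(4000, 1000), (2000, 1000), (1000, 1000), (1000, 2000), (1000, 4000)]
for r, g in pairs:
    print(f"R={r:5d} G={g:5d}  R/G={r / g:5.2f}  log2(R/G)={log2_ratio(r, g):+.1f}")
```

A two-fold increase and a two-fold decrease come out as +1 and -1, symmetric about 0, whereas the unlogged ratios 2 and 0.5 are not symmetric about 1.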
In a context like this, a single standard deviation (SD) can be given the usual interpretation only if
(a) the distribution is approximately normal, and
(b) variation is independent of magnitude.
Neither of these is true for unlogged intensities or ratios.
Compare the histograms for unlogged and logged intensities and ratios (Figure 1).
Figure 1
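Where the histograms are not to hand, the same comparison can be sketched numerically with a sample skewness, which is near 0 for a roughly symmetric distribution. The intensities here are simulated from an assumed log-normal distribution clipped to the 16-bit range; they are not real array data, and `skewness` is a helper written for this example.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the SD cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


random.seed(0)
# Hypothetical intensities: log-normal, clipped to the 16-bit scanner range.
intensities = [min(65535.0, math.exp(random.gauss(7.0, 1.0))) for _ in range(5000)]
logged = [math.log2(x) for x in intensities]

print(f"skewness of raw intensities:  {skewness(intensities):.2f}")
print(f"skewness of log2 intensities: {skewness(logged):.2f}")
```

With data of this kind the raw intensities are strongly right-skewed, while the logged values are close to symmetric, so an SD is much easier to interpret on the log scale.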
Variation of log intensities is still not constant: in most microarray data it decreases noticeably as magnitude increases (Figure 2).
When we have two sets of numbers such as R and G varying over a large range, it is useful to compare log R with log G by plotting their difference, log(R/G), against their average, (1/2)log(R*G). Doing this can reveal features we did not expect. By contrast, plotting R against G is typically much less revealing and can give a quite unrealistic sense of concordance (Figure 2).
Figure 2
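The two plotted quantities are simple to compute. The sketch below assumes base-2 logs, in line with the comment above about base 2, and uses made-up (R, G) values; `difference_and_average` is a hypothetical helper name.

```python
import math


def difference_and_average(r, g):
    """Return (log2(R/G), (1/2) * log2(R*G)) for one spot."""
    log_r, log_g = math.log2(r), math.log2(g)
    return log_r - log_g, 0.5 * (log_r + log_g)


# Hypothetical (R, G) pairs, made up for illustration only.
spots = [(120.0, 100.0), (1500.0, 1400.0), (30000.0, 15000.0)]
for r, g in spots:
    diff, avg = difference_and_average(r, g)
    print(f"R={r:8.1f} G={g:8.1f}  log2(R/G)={diff:+.3f}  (1/2)log2(RG)={avg:.3f}")
```

Passing the averages as x values and the differences as y values to any scatter-plot routine gives the kind of plot described above.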
Further comments. As noted above, log ratios still have a spread which tends to decrease as average log intensity increases. This suggests that a different transformation of the ratio might achieve a more constant scatter (at least as a function of average log intensity). This is indeed the case: a ratio of 4th or 5th roots does seem to have a more constant scatter (about 1). However, we feel this is a small gain compared to the simplicity lost if logs are not used (adding and subtracting logs corresponds to multiplying and dividing). For example, normalization is no longer straightforward. We will stick with logs until a more compelling reason comes along.
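The point about normalization can be made concrete with a small sketch. It assumes the simplest kind of bias, one channel read too high by a constant factor `c`; that bias model, the numbers, and the helper names are assumptions made for illustration.

```python
import math


def log_ratio(r, g):
    """Base-2 log ratio, log2(R/G)."""
    return math.log2(r / g)


def root_ratio(r, g, k=4):
    """Ratio of k-th roots, R**(1/k) / G**(1/k)."""
    return r ** (1.0 / k) / g ** (1.0 / k)


R, G = 1200.0, 1000.0  # hypothetical spot
c = 1.5                # hypothetical channel bias: R is read as c * R

# Logs: the bias shifts every log ratio by the same constant, log2(c),
# so it can be removed by subtraction.
shift = log_ratio(c * R, G) - log_ratio(R, G)

# Roots: the bias multiplies every root ratio by c**(1/4),
# so it has to be removed by division.
factor = root_ratio(c * R, G) / root_ratio(R, G)

print(f"log scale:  additive shift        = {shift:.4f}")
print(f"root scale: multiplicative factor = {factor:.4f}")
```

On the log scale the correction is a single subtraction; with roots it stays multiplicative, which is the loss of simplicity referred to above.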
last updated March 07, 2000
zarray@stat.berkeley.edu