[Tux3] Challenge: Make Tux3 work well with flash disks
Daniel Phillips
phillips at phunq.net
Sun Feb 15 14:44:37 PST 2009
Hi all,
Please see this well-written analysis of performance loss as a
new-generation Intel flash disk "ages":
http://www.pcper.com/article.php?aid=669
"Long-term performance analysis of Intel Mainstream SSDs"
Though I have not really analyzed the issues completely at this time, I
have the feeling Intel made a slight mistake in the way they combine
writes. I think that what they do is this: they have a "current" flash
block, which starts fully erased, then each write transfer is appended
until it is full. So writes are combined in write order, which is a
lot like the deduplication plan the Pune Institute students are
pursuing. The bucket idea is likely to have advantages and drawbacks
similar to Intel's SSD write strategy.
The problem in both cases is the effect of rewrites, which cause data to
be relocated away from its original position, leaving holes at the
original position. This may not be as big a problem with deduplication
if the target application is mainly archive, but it is a serious and
visible problem with a flash device that intends to act like a disk
drive.
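As a rough illustration of the append-in-write-order model guessed at
above, here is a minimal sketch. It is not Intel's firmware: the class
name AppendLog, the pages_per_block parameter and the page-level mapping
are all assumptions made up for the example. It only shows how a rewrite
leaves a hole at the old position while the new copy lands in the
"current" block.

```python
class AppendLog:
    """Toy model of a flash device that combines writes in arrival order.

    Hypothetical sketch: the name, the page granularity and the mapping
    table are illustrative assumptions, not a description of any real SSD.
    """

    def __init__(self, pages_per_block):
        self.pages_per_block = pages_per_block
        self.blocks = [[]]   # the last entry is the "current" block
        self.where = {}      # logical page -> (block index, slot index)

    def write(self, logical_page):
        # A rewrite invalidates the old copy, leaving a hole (None) behind.
        old = self.where.get(logical_page)
        if old is not None:
            block_index, slot = old
            self.blocks[block_index][slot] = None
        # Start a fresh block once the current one is full.
        if len(self.blocks[-1]) == self.pages_per_block:
            self.blocks.append([])
        current = self.blocks[-1]
        current.append(logical_page)
        self.where[logical_page] = (len(self.blocks) - 1, len(current) - 1)


log = AppendLog(pages_per_block=2)
for page in (7, 3, 7):   # page 7 is written, then rewritten later
    log.write(page)
```

After the three writes, the first block holds one hole and one valid
page, so it cannot be erased without first copying page 3 elsewhere.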
What happens is, when Intel's disk fills and ages, the best candidate
block for erasing will have a high percentage of valid data on it,
which has to be copied to a new location. The performance of the disk
under a steady write load will thus drop to a fraction of the erase
speed, because a portion of the space recovered by erasing has to be
used to store valid data relocated from candidate erase blocks.
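The arithmetic behind that claim can be sketched as follows, under the
simplifying assumption that every erase candidate carries the same
fraction of valid data and that relocation costs nothing beyond the
space it consumes. The function names and the example figures are
hypothetical, chosen only to make the relationship concrete.

```python
def steady_state_write_rate(erase_rate_mb_s, valid_fraction):
    """Rate at which new host data can be stored under a steady write load.

    Assumes each erased block must first take back the valid data that was
    relocated out of it, so only (1 - valid_fraction) of it is free space.
    """
    if not 0.0 <= valid_fraction < 1.0:
        raise ValueError("valid_fraction must be in [0, 1)")
    return erase_rate_mb_s * (1.0 - valid_fraction)


def write_amplification(valid_fraction):
    """Flash bytes written per byte of new host data, under the same model."""
    if not 0.0 <= valid_fraction < 1.0:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1.0 / (1.0 - valid_fraction)


# Hypothetical figures: 100 MB/s of erase bandwidth, victims 75% valid.
rate = steady_state_write_rate(100.0, 0.75)
amplification = write_amplification(0.75)
```

With victims three quarters full of valid data, only a quarter of the
erase bandwidth is left for new writes. As the valid fraction approaches
90%, the same model gives roughly a tenfold slowdown.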
If my understanding of the issue is correct, then the big problem is
that Intel relies only on order written to decide how data should be
grouped together on flash blocks. The grouping really needs to
incorporate spatial adjacency as well, to maximize the chance that an
entire flash block or at least a large portion of it will be rewritten
in future, thus lowering the portion of data that has to be relocated.
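A toy comparison of the two grouping policies, as a sketch only: the
block size of four pages, the page numbers and the helper names pack and
relocation_cost are invented for the example, and real devices are far
more complicated. Two files are written concurrently so their pages
arrive interleaved, then one file is rewritten in full.

```python
PAGES_PER_BLOCK = 4  # hypothetical, kept tiny for illustration


def pack(pages, pages_per_block=PAGES_PER_BLOCK):
    """Split a list of logical page numbers into flash blocks, in list order."""
    return [pages[i:i + pages_per_block]
            for i in range(0, len(pages), pages_per_block)]


def relocation_cost(blocks, rewritten):
    """Count valid pages left in blocks that the rewrite punched holes in.

    These are the pages that would have to be copied elsewhere before the
    affected blocks could be erased.
    """
    rewritten = set(rewritten)
    cost = 0
    for block in blocks:
        stale = [p for p in block if p in rewritten]
        if stale:
            cost += len(block) - len(stale)
    return cost


# File A uses logical pages 0..3, file B uses 100..103, written concurrently.
arrival = [0, 100, 1, 101, 2, 102, 3, 103]
by_write_order = pack(arrival)           # grouped purely by arrival order
by_adjacency = pack(sorted(arrival))     # grouped by logical adjacency

rewrite = range(0, 4)                    # file A is later rewritten in full
cost_write_order = relocation_cost(by_write_order, rewrite)
cost_adjacency = relocation_cost(by_adjacency, rewrite)
```

Grouped by write order, both blocks end up half valid and four pages
must be relocated. Grouped by adjacency, the rewrite invalidates one
whole block, which can then be erased with nothing to copy.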
One piece of this story I have not figured out yet is why combining
writes is a big performance win for the Intel flash disk. I suspect
that it actually is not a big advantage, and that this technique was
just the easiest thing to implement. On an initially empty drive, it
benchmarks well, just as our current next-available allocation policy
will perform well initially, and steadily worsen as the filesystem
ages.
I hope somebody will eventually enlighten me about whether there is some
other advantage to write combining that I have not yet perceived.
Until that happens, I am proceeding on the assumption that Intel's
strategy is suboptimal and will soon need to be improved to avoid
further criticism of long term performance characteristics.
Anyway, my tentative conclusion is that flash disks will not in fact
completely liberate filesystem designers from issues of spatial
organization: Intel will ultimately be forced to redesign their flash
write algorithms and filesystem designers will need to keep thinking
about layout issues. In other words, as the world moves to solid state
storage, the importance of spatial optimization will not be reduced;
only the parameters of the problem will change.
For flash, even though seeking is not a problem, we still need to try to
maximize the likelihood that physically adjacent data is rewritten at
the same time. This assumes that Intel will modify their write
algorithm to rely on that, which looks like a pretty safe bet right
now. I think we are talking about performance differences approaching
an order of magnitude between the best and worst algorithms, making
this an important issue that will only get more important.
To be sure, we have more pressing issues than flash performance just
now. However, I would like to see our thinking on this subject progress
in the background as we work on other things. Anybody who wants to
jump in at this point, please do.
Regards,
Daniel
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3