Supporting multi-actuator drives
In a combined filesystem and storage session at the 2018 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM), Tim Walker asked for help in designing the interface to some new storage hardware. He wanted some feedback on how a multi-actuator drive should present itself to the system. These drives have two (or, eventually, more) sets of read/write heads and other hardware that can all operate in parallel.
He noted that his employer, Seagate, had invested in a few different technologies, including host-aware shingled magnetic recording (SMR) devices, that did not pan out. Instead of repeating those missteps, Seagate wants to get early feedback before the interfaces are set in stone. He was not necessarily looking for immediate feedback in the session (though he got plenty), but wanted to introduce the topic before discussing it on the mailing lists. Basically, Seagate would like to ensure that what it does with these devices works well for its customers, who mostly use Linux.
![Tim Walker](https://web.archive.org/web/20240330172510im_/https://static.lwn.net/images/2018/lsf-walker-sm.jpg)
The device is a single-port serial-attached SCSI (SAS) drive, with the I/O going to two separate actuators that share a cache, he said. Both actuators can operate at full speed and seek independently; each is usable on a subset of the platters in the device. This is all based on technology that has already been mastered; it is meant to bring parallelism to a single drive. The device would present itself as two separate logical unit numbers (LUNs) and each of the two actuator channels would map to its own LUN. Potential customers have discouraged Seagate from making the device be a single LUN and opaquely splitting the data between the actuators behind the scenes, Walker said.
One problem Walker foresees is with management commands that affect a LUN as a whole, such as start and stop commands: they could be addressed to either LUN but would affect the entire drive, and thus the other LUN as well. Hannes Reinecke said that it would be better to have a separate LUN reserved for management commands rather than accepting them on the data LUNs. Failing that, making the stop commands do what is expected (park the heads if the command is for just one LUN, or spin down the drive if it is for both) would be an alternative.
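As a rough illustration of the stop/start semantics discussed here (not any actual firmware or driver interface), the following Python sketch models a dual-actuator drive exposed as two LUNs: a stop addressed to one LUN parks that actuator's heads, and the shared spindle spins down only once both LUNs have been stopped. All names are hypothetical.

```python
class DualActuatorDrive:
    """Toy model of the per-LUN stop/start behavior described above."""

    def __init__(self):
        self.stopped = {0: False, 1: False}   # per-LUN (per-actuator) state
        self.spindle_running = True

    def stop_unit(self, lun):
        self.stopped[lun] = True              # park this actuator's heads
        if all(self.stopped.values()):        # both LUNs stopped:
            self.spindle_running = False      # safe to spin the motor down

    def start_unit(self, lun):
        if not self.spindle_running:          # spin up the shared motor first
            self.spindle_running = True
        self.stopped[lun] = False             # unpark this actuator's heads


drive = DualActuatorDrive()
drive.stop_unit(0)          # LUN 0: heads parked, spindle keeps spinning
drive.stop_unit(1)          # LUN 1 too: now the whole drive spins down
assert not drive.spindle_running
```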
Fred Knight said that storage arrays have been handling this situation for years. They have hundreds of LUNs and have just figured it out and made it all work. He noted that, even though it may not be what customers expect, most storage arrays will simply ignore stop commands. The kernel does not really distinguish between drives and arrays, Martin Petersen said; there really is no condition where the kernel would want to stop one LUN and not the other. Knight said that other operating systems will spin down a LUN for power-management reasons, but that the standards provide ways to identify LUNs that are tied together, so there should not be a real problem here.
Ted Ts'o said that a gathering like LSFMM (or the mailing lists) will not provide the full picture. Customers may have their own ideas about how to use this technology; the enterprise kernel developers may be able to guess what their customers might want to do, but that is only a guess. For the cloud, there is an advisory group that will give some input, he said, but it may be harder to get that for enterprises. Ric Wheeler said that he works for an enterprise vendor (Red Hat), which has internal users of disk drives (Ceph and others) that have opinions and thoughts that the company would be willing to share.
From the perspective of a filesystem developer, all of what is being discussed is immaterial; the filesystem developers "don't care about any of this", Dave Chinner said. The storage folks will figure out how and when drives spin up and down (and other things of that nature), but the filesystems will just treat the device as if it were two entirely separate devices. Knight pointed out that there are some different failure modes that could impact filesystems; if the spindle motor goes, both drives are lost, while a head loss will lead to inaccessible data, but that may just be handled with RAID-5, for example.
Ts'o noted that previously there had been "dumb drives and smart arrays", but that now we are seeing things that are between the two. Multi-actuator drives as envisioned by Seagate are just the first entrant; others will undoubtedly come along. It would be nice to standardize some way to discover the topology (spindles, heads, etc.) for these. Wheeler added that information about the cache would also be useful.
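No such standard exists yet; as a purely hypothetical sketch of what discovery might look like, the snippet below reads per-device attributes alongside the queue limits Linux already exports in sysfs. The `queue/rotational` attribute is real; the `spindle_id` and `actuator_id` names are invented solely to show the kind of topology information being asked for.

```python
from pathlib import Path

def read_attr(dev, name, default=None):
    """Read a sysfs attribute for a block device, if it exists."""
    p = Path("/sys/block") / dev / name
    return p.read_text().strip() if p.exists() else default

for dev in ("sda", "sdb"):                   # hypothetical: the two actuator LUNs
    rotational = read_attr(dev, "queue/rotational")   # exists in today's kernels
    # Invented attribute names, illustrating the sort of shared-spindle
    # and per-actuator identification a future standard might expose:
    spindle = read_attr(dev, "device/spindle_id", "?")
    actuator = read_attr(dev, "device/actuator_id", "?")
    print(dev, "rotational:", rotational, "spindle:", spindle, "actuator:", actuator)
```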
This device has a shared cache, but devices with split caches might be good, Reinecke said. Kent Overstreet worried that there could be starvation problems if there are different I/O schedulers interacting in the same cache. As time wound down, Walker said that the session provided him with exactly the kind of feedback he was looking for.
| Index entries for this article | |
| --- | --- |
| Kernel | Block layer |
| Conference | Storage Filesystem & Memory Management/2018 |
Supporting multi-actuator drives
Posted May 15, 2018 23:25 UTC (Tue) by willy (subscriber, #9762) [Link]
They may want to investigate some cache management techniques used by CPUs. I can't be the only one who sees the parallels between two hyperthreads sharing a CPU cache and two heads sharing a cache.
In particular, starvation of one head's cache by the other head is something they should be wary of. You probably don't want a strict partitioning, but reserving two of eight ways for each head and letting the other four ways float between the two based on demand might be a reasonable solution (for a hypothetical eight-way cache).
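A minimal sketch of the allocation policy described above, for the hypothetical eight-way cache: each actuator always keeps two reserved ways, and the remaining four float to whichever actuator currently has more outstanding demand. The demand metric and function name are made up for illustration.

```python
def allocate_ways(demand_a, demand_b, total_ways=8, reserved=2):
    """Split cache ways between two actuators: 'reserved' ways are
    guaranteed to each, and the rest float in proportion to demand."""
    floating = total_ways - 2 * reserved
    total_demand = demand_a + demand_b
    if total_demand == 0:
        extra_a = floating // 2                        # no demand: split evenly
    else:
        extra_a = round(floating * demand_a / total_demand)
    extra_a = max(0, min(floating, extra_a))
    return reserved + extra_a, reserved + (floating - extra_a)

print(allocate_ways(demand_a=10, demand_b=2))   # (5, 3): A is busier, gets more float
print(allocate_ways(demand_a=0, demand_b=9))    # (2, 6): A still keeps its reserved ways
```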
Supporting multi-actuator drives
Posted May 16, 2018 2:32 UTC (Wed) by dgc (subscriber, #6611) [Link]
Yes, RAID setups would still have to be taught not to assign both actuators in a single failure domain (e.g. spindle or power loss w/ RAID 1 or 5 means double disk failure and unrecoverable data), but otherwise they could be considered completely independent LUNs.
-Dave.
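One way to picture the constraint Dave describes: before assembling an array, check that no two members share a spindle. A minimal sketch, assuming some way exists to map each LUN back to its physical spindle (the `spindle_of` mapping below is made up):

```python
def shares_failure_domain(members, spindle_of):
    """Return True if any two array members sit on the same spindle,
    i.e. a single motor or power failure would take out both."""
    seen = set()
    for dev in members:
        spindle = spindle_of[dev]
        if spindle in seen:
            return True
        seen.add(spindle)
    return False

# Hypothetical mapping: sda/sdb are the two LUNs of one dual-actuator
# drive, sdc is a separate physical disk.
spindle_of = {"sda": "drive0", "sdb": "drive0", "sdc": "drive1"}

print(shares_failure_domain(["sda", "sdb"], spindle_of))   # True:  bad RAID-1 pair
print(shares_failure_domain(["sda", "sdc"], spindle_of))   # False: independent spindles
```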
Supporting multi-actuator drives
Posted May 16, 2018 3:14 UTC (Wed) by JdGordy (subscriber, #70103) [Link]
Supporting multi-actuator drives
Posted May 16, 2018 5:01 UTC (Wed) by EdwardConnolly (guest, #123865) [Link]
Supporting multi-actuator drives
Posted May 16, 2018 13:14 UTC (Wed) by sjfriedl (✭ supporter ✭, #10111) [Link]
Supporting multi-actuator drives
Posted May 16, 2018 13:25 UTC (Wed) by Sesse (subscriber, #53779) [Link]
Another case: Either of the spindles die, but you could still be using the other one just fine. (This can also be done in other ways, but it's easier to manage “this LUN is dead” than “this byte range is dead”, at least with current tools.)
Supporting multi-actuator drives
Posted May 16, 2018 17:01 UTC (Wed) by flussence (subscriber, #85566) [Link]
Supporting multi-actuator drives
Posted May 16, 2018 7:50 UTC (Wed) by epa (subscriber, #39769) [Link]
The cache on the disk itself seems like a pretty small player when compared to the RAM in the system, so I don't see how the effects caused by sharing it between two disks could be significant.
Supporting multi-actuator drives
Posted May 16, 2018 11:18 UTC (Wed) by bokr (subscriber, #58369) [Link]
to minimize rotational latency, which in turn will depend on how long it takes for an actuator to settle on a given track/cylinder.

So if OS drivers want to play in optimizing access patterns for different applications, ISTM they will need access to the same sensor data about spindle and actuator states as the disk's firmware has, and will need a protocol to override the firmware's model of optimum data streaming to and from different heads (which may for starters just be to appear as a competitively fast black box compatible with ordinary drivers).

So the next problem becomes social engineering, to help the manufacturer see bigger disk sales and profit in allowing free/libre driver writers access to the necessary data to do wild things that a single closed team of firmware writers can never think of.

Presumably some big commercial users of disk data will see opportunities for optimizing their software, given firmware SDK info for the disk, whether their game is SQL or movie editing, or streaming live, or gaming, etc., and will try to have it for themselves exclusively, for temporary market advantage, so history will presumably replay itself unless the disk mfgr mgmt is unusually enlightened (and has the independence to resist NDA ploys and economic bullying).

IWG that optimization would have to consider one actuator's activity's effect on another -- a kind of multi-body physics problem?

It is interesting to speculate what effect a fixed rotational delay for read-modify-replace (if separate actuators are positioning heads in the same cylinders) will do for the inventive.
Supporting multi-actuator drives
Posted May 17, 2018 4:54 UTC (Thu) by gwolf (subscriber, #14632) [Link]
> to minimize rotational latency, which in turn will depend on how
> long it takes for an actuator to settle on a given track/cylinder.
You said this and I pictured the original (and failed) LISA disks...
Supporting multi-actuator drives
Posted May 16, 2018 16:15 UTC (Wed) by yootis (subscriber, #4762) [Link]
This is more like having two drives in one enclosure, with a shared motor and cache. Wouldn't it be more useful to have two completely separate heads which can each read from all the platters? That would double bandwidth, reduce latency, and both sets of heads would be able to read the same data.
Supporting multi-actuator drives
Posted May 16, 2018 16:32 UTC (Wed) by antiphase (subscriber, #111993) [Link]
Supporting multi-actuator drives
Posted May 16, 2018 21:19 UTC (Wed) by nix (subscriber, #2304) [Link]
Supporting multi-actuator drives
Posted May 24, 2018 11:52 UTC (Thu) by Wol (subscriber, #4433) [Link]
Raid-5 is a *LOT* safer than a single disk, if managed properly. There are situations where it is appropriate, and situations where it is not. What would YOU use if your priorities were "available disk space" and you only had three spare SATA ports?
Cheers,
Wol
Supporting multi-actuator drives
Posted May 17, 2018 17:10 UTC (Thu) by excors (subscriber, #95769) [Link]
It sounds like these MADs are for people who care about IOPS (not bandwidth) and capacity and cost (so they can't afford SSDs). I guess having two heads at different locations on the platter would improve latency/IOPS since it takes less time for the requested data to reach the nearest head, vs Seagate's version where you can do more head movements per second than a single-actuator drive but might still have to wait for a full rotation of the platter; but it sounds more difficult and expensive to manufacture, so maybe not worth the tradeoff.
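Some back-of-the-envelope numbers for the rotational-latency point, assuming a 7200 RPM drive and, for the alternative design, two heads sitting opposite each other over the same track (both assumptions are just for illustration):

```python
rpm = 7200
full_rotation_ms = 60_000 / rpm                  # ~8.33 ms per revolution

# One head per surface: on average the data is half a rotation away.
avg_wait_one_head = full_rotation_ms / 2         # ~4.17 ms

# Two heads 180 degrees apart over the same track: the nearest head is
# at most half a rotation away, so the average wait drops to a quarter turn.
avg_wait_two_heads = full_rotation_ms / 4        # ~2.08 ms

print(f"{full_rotation_ms:.2f} ms/rev, "
      f"{avg_wait_one_head:.2f} ms avg wait vs {avg_wait_two_heads:.2f} ms avg wait")
```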
Supporting multi-actuator drives
Posted May 21, 2018 15:45 UTC (Mon) by HIGHGuY (subscriber, #62277) [Link]
There appear to be all sorts of difficult problems attached to having 2 separate sets of heads working on 1 set of platters. You can be sure that if it could be done easily, we would have had drives with 4 sets of heads already... (Unfortunately, I can't recall the exact reasons anymore)
Supporting multi-actuator drives
Posted May 21, 2018 17:36 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]
Supporting multi-actuator drives
Posted May 31, 2018 17:15 UTC (Thu) by dfsmith (guest, #20302) [Link]
See the 1994 Conner Chinook [1] drive, or the IBM Hammerhead DASD [2]. Also see various split actuator designs over the years.
The basic problem is the heads and flex are the expensive parts; and they get doubled. Bring manufacturing economies of scale into the picture, and it nearly always ends up that buying two commodity drives is less expensive than a single higher performance drive.
[1] https://commons.wikimedia.org/wiki/File:Conner_Peripheral...
[2] So unpopular, apparently, that I was unable to find a picture.
Supporting multi-actuator drives
Posted May 16, 2018 19:43 UTC (Wed) by jthill (subscriber, #56558) [Link]
> It would be nice to standardize some way to discover the topology

Yes.
Seems to me allowing the OS to make one of the actuators cover only a relatively narrow range and dedicate that to uncached log-writing might be worth exploring. Hardwiring the actuator range boundaries looks like premature optimization, there's lots of other ways the OS knows better what needs doing first too, e.g. why not let the OS auto-stripe long sequential reads?
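A toy sketch of the "auto-stripe long sequential reads" idea: split a big read into chunks and issue them round-robin across two devices in parallel. This assumes the data is laid out (or mirrored) so that either LUN can serve any chunk, which is an assumption and not how the proposed drive splits its platters; chunk size and device names are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20                                  # 1 MiB chunks, arbitrary

def striped_read(paths, offset, length):
    """Read [offset, offset+length) by farming chunks out round-robin
    to the given devices/files and reassembling the results in order."""
    def read_chunk(i, off, ln):
        with open(paths[i % len(paths)], "rb") as f:
            f.seek(off)
            return f.read(ln)

    jobs = []
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        for i, off in enumerate(range(offset, offset + length, CHUNK)):
            ln = min(CHUNK, offset + length - off)
            jobs.append(pool.submit(read_chunk, i, off, ln))
    return b"".join(j.result() for j in jobs)

# data = striped_read(["/dev/sda", "/dev/sdb"], 0, 8 * CHUNK)  # hypothetical devices
```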
SMR is going away?
Posted May 17, 2018 19:03 UTC (Thu) by pr1268 (subscriber, #24648) [Link]
> ...Seagate, had invested in a few different technologies, including host-aware shingled magnetic recording (SMR) devices, that did not pan out.
Does that mean that SMR drives are a passing (and perhaps gone forever) fad? (I mean, besides the idea that storage devices with spinning metal are headed for extinction.)
SMR is going away?
Posted May 17, 2018 20:47 UTC (Thu) by jake (editor, #205) [Link]
No, I don't think so, or at least that's not what he meant here, aiui. It is the "host-aware" part that he was referring to, I believe.
jake
SMR is going away?
Posted May 21, 2018 15:58 UTC (Mon) by HIGHGuY (subscriber, #62277) [Link]
It's difficult to predict where SMR is going. There are definitely actual users for the technology and I'm sure that HDD firmware will reach a point where SMR will perform similar to PMR drives for most workloads (if they don't already). But it seems that whenever SMR drives come out with a higher density, PMR drives with the same increase are right around the corner.
Also with the advent of HAMR/MAMR it's anyone's guess what it will do to SMR versus PMR.
Apart from that, you might want to replace 'spinning metal' with 'spinning glass' since newer disks use glass substrates, IIRC ;)
Still, I don't think spinners are going anywhere yet, they're just moving into high-density low-performance low-cost storage. If you look at the quarterly reports from the HDD companies, the number of petabytes shipped every year is only going up.