StorageTek SL8500 tape library robotics retrofit

Why?




STK was the best with its ACS 44xx (Cimarron, Powder Horn, ...). Founded by engineers who came from IBM, the company initially wanted to make tape drives, almost went bankrupt, then recovered (at that point, the number of library orders from customers saved it).


https://www.storagenewsletter.com/2018/06/22/history-1987-stk-4400-automated-tape-cartridge-system/


This dodecagon-shaped machine is absolutely legendary.


It had real video cameras (which disappeared on the SL8500), a double gripper, and halogen lighting!

https://www.youtube.com/watch?v=FBEwcN_gSAk


This construction time-lapse is so cool (if you have news of Andrew Austin, Wayne Brummel, Douglas Luebke or Rory Burdine, please tell me).

https://www.youtube.com/watch?v=GG-wYzVx1gA



It even appeared in several films, notably Eraser with Schwarzenegger (although the machine was modified; if anyone has more information, please e-mail me, the address is at the end).

It’s one of the machines with the longest life cycle in computing, about 20 years from ~1985 to ~2005 (if not simply the longest life cycle ever, all hardware types combined!).

Originally sold with a total capacity of 1.2 TB, it ended up with that capacity multiplied by 1,000, reaching 1.2 PB!




The machine was system-agnostic; it could serve at the same time one or more mainframes, Unix servers, and other infrastructures. Each participant had their own drives, their media partition, and made their load requests…


In our retrofit, we are currently at just over 90 exchanges/hour, simply because we haven’t started tuning yet (PID, voltage still at 12 V for now—we’re just being conservative! 😉).


STK was bought by Sun, then Sun by Oracle; nothing remains…


The SL8500 is the successor to the ACS, ~2005 → ~2025. Even though it’s still in Oracle’s catalog, there’s nobody there to help. Starting with a capacity similar to that of the ACS at the end of its life, it will end at around 10,000 × 18 TB, a very respectable factor (but it could still end much later, don't you think?).


However, the promise of a successor, of endless upgradability after the SL8500, has been broken:


https://www.hpcwire.com/2020/10/31/at-oak-ridge-end-of-life-sometimes-isnt/


IBM was really lagging in the eighties. They tried this fairly awful library with a 6-axis arm (28 meters long):


https://www.youtube.com/watch?v=GwMn7YpF8r8


https://aussiestorageblog.wordpress.com/2021/04/30/remembering-the-time-a-giant-yellow-robot-saved-ibms-tape-business/


  • no redundancy
  • 300 kg robot to move a 300 g tape?
  • the robot on a car, even though it’s a robot meant for painting cars (normally)?
  • a huge joke in the end


    author’s note: I spent years making the same mistake all by myself, huh 😉 Mine was blue, not yellow... But I wasn’t an army of IBM engineers, I did not know about the giant yellow fail, and neither did my friends, so surely I had more right to make this mistake? huh 😉


    But with 3.5'' HDDs.
    At least the robot could reach its nominal payload (20 kg): moving an entire shelf was prototyped. I had no idea of the Spectra Logic TeraPack back then, but is it the same idea? Not for density, but for faster relocation; a palletization program could have been developed. Note that the same tool can move a single drive, and that the gripper can go through the shelf and open, while the linear pneumatic actuator, which normally serves for anti-collision, is used to lock the shelf. So I cannot be laughed at quite so much (300 kg for a 300 g tape ;)).


    IBM eventually moved toward more modular stuff. But still not as good as STK:

    - IBM libraries can’t be placed flush against a wall or between two walls, with only 10 cm of clearance, and installed side by side (from the inside) like the SL8500, with up to 10 units for a total of 100,000 media. They do not have 4 floors with 2 elevators, or a robot that can be taken out by unscrewing 2 screws and shipped easily, ... (neither do Spectra's or Quantum's):

    https://www.youtube.com/watch?v=X5UCfU9Q-iA&t=730s


    The rest of the video is interesting because it explains:

  • robot redundancy: one robot can push its twin off its floor if it breaks down
  • dual format (LTO and T10K) which enables this adventure of a cartridge containing a 2.5″ HDD
  • pass-through tape between units

    One of the major problems with the SL8500 is its robot power rails: since communication was done via power line carrier, the robots could literally disappear. Our retrofit option was to add a battery. For communication, Wi-Fi is currently used, but that is not final: security isn’t up to standard in my opinion. The plan is to build an IR LED strip/phototransistor system to ensure communication with the robots (with multiplexed transmitters/receivers).


    Today

    Only American companies remain in the market; Grau, a German company that in my opinion had great hardware, abandoned this branch:

  • IBM
  • Spectra Logic and their TeraPack (fun, but I find it very complex, and their ad is misleading about space savings: they compared with an SL without any expansion chassis, so it mostly contained drives and its access space in the front: https://www.youtube.com/watch?v=j3_nu3yw0qc&t=155s). We can believe their TeraPack increases density, but maybe by x2 at most, at the cost of lower access efficiency and plenty of other problems.
  • Quantum (but smaller than the 2 above)


    And tape?

    Many people today think tape is dead, but:

  • almost no one knows that >80% of the largest companies back up to this medium
  • Microsoft said tape was dead about 15 years ago, and a few years ago they made a 180° turn
  • the problem with tape is the initial investment: in the drives (~$5,000 each and at least two for redundancy) and also investment in a robot.


    Hard drives have reached their maximum data density; the latest disks use a laser for HAMR.

    Today, an LTO-9 tape (18 TB native / 45 TB compressed) costs ~$80.


    A 50 TB native cartridge has been announced:

    https://newsroom.ibm.com/2023-08-29-Fujifilm-and-IBM-Develop-50TB-Native-Tape-Storage-System,-Featuring-Worlds-Highest-Data-Storage-Tape-Capacity-1


    And in the lab, they’re having fun:

    https://www.fujifilm.com/au/en/business/data-management/datastorage

    580 TB on a tape? The thing is, a cartridge is ~1,000 m × 12.4 mm, so about 12 m². So we don’t have the same density problems as disks at all.
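
As a rough check of that surface figure, here is a minimal Python sketch. It uses only the numbers quoted above (~1,000 m × 12.4 mm, 18 TB native for LTO-9, 580 TB for the lab demo) and assumes, naively, that the whole surface stores data and that 1 TB = 10^12 bytes. The function names are mine, for illustration only.

```python
# Rough figures quoted in the text; real cartridges differ slightly.
TAPE_LENGTH_M = 1_000     # ~1,000 m of tape in a cartridge
TAPE_WIDTH_M = 12.4e-3    # 12.4 mm wide


def tape_area_m2(length_m: float = TAPE_LENGTH_M, width_m: float = TAPE_WIDTH_M) -> float:
    """Recording surface in square metres (one side, whole width)."""
    return length_m * width_m


def tb_per_m2(capacity_tb: float) -> float:
    """Naive density: capacity spread evenly over the whole surface."""
    return capacity_tb / tape_area_m2()


if __name__ == "__main__":
    print(f"surface: {tape_area_m2():.1f} m^2")
    for label, capacity_tb in (("LTO-9 native", 18), ("lab demo", 580)):
        print(f"{label}: {tb_per_m2(capacity_tb):.2f} TB/m^2")
```

Under these assumptions, even the 580 TB lab figure works out to under 50 TB per square metre of tape.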


    Nearline®

    Funny term 😉 It was STK’s branded term at the time to say it wasn’t offline but fast enough that it could approach online!

    Hybrid offline HDD/LTO storage/(T10K not sure ;)

    The great thing about this library is its ability to accept T10K cartridges, which leaves enough space for a 2.5-inch HDD with the connecting hardware at the back. The weight is the same. But we had to design, and are still designing, the dock and the enclosure. (I was mistaken at first in thinking the SL could also use the IBM Jaguar 3592, but those cartridges do seem to fit correctly, and their dimensions were used for the HDD box.)

    The box contains an active electronic USB3 MUX, so the dock can connect through pogo pins, or the user at home can connect the hard drive normally through the micro USB3 connector installed on the box.
    The dock was specified to fit in a 5 1/4'' slot and uses a roller-bearing arm locking system that alternates left/right and up/down, so the docks take very little space when installed side by side.

    We are not reinventing RDX®. RDX is a robust, transportable box; ours is not robust! But ours can be moved by the robot like a T10K or a Jaguar. Pogo pins have very good wear resistance, much better than normal connectors.



    Open source project

    We would like to keep this machine working for years without being tied to Oracle. That is why we began this project. The mechanics are great: changing a roller bearing, or even asking a local supplier to remake a specific hardware part, can cost far less than ordering the OEM part, if it's even in stock.

    The current electronics and RP2040 code running the bots are released under GPL3, mainly for the curious. It's a prototype, but after 2,000 moves (loads and unloads) without any move errors, I had to stop the tests: wearing out a bot for nothing is pointless. The mean exchanges/swaps between failures (MEBF/MSBF) of a bot is 2M according to the specs, so using up 0.1% of a bot's life for nothing is really pointless!
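
The 0.1% figure is simple arithmetic; a minimal Python sketch follows, using only the numbers quoted above. It assumes each load or unload counts as one exchange against the rated figure, and the example rate assumes a single bot running non-stop at the ~90 exchanges/hour mentioned earlier. Both are my assumptions, not something the specs state.

```python
MEBF_SPEC = 2_000_000   # mean exchanges between failures, per the specs quoted above
TEST_MOVES = 2_000      # loads and unloads done during the prototype tests


def wear_fraction(moves: int, mebf: int = MEBF_SPEC) -> float:
    """Share of the rated exchange budget consumed by `moves`."""
    return moves / mebf


def hours_to_reach(moves: int, rate_per_hour: float) -> float:
    """Hours of continuous operation needed to perform `moves` at a given rate."""
    return moves / rate_per_hour


if __name__ == "__main__":
    print(f"wear from tests: {wear_fraction(TEST_MOVES):.1%}")
    # Assumption: one bot, running non-stop at ~90 exchanges/hour.
    print(f"hours to reach the rated figure: {hours_to_reach(MEBF_SPEC, 90):,.0f}")
```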

    The machine has been reassembled; we plan to start real movements for real work at the end of summer 2025.

    There are so many exciting things to do:

  • implement the audit/scanning: with the powerful LED lighting, we currently have such a short shutter time that the audit seems doable at a much faster rate than the original one, and we'll have a tiled high-res map at the end.
  • implement the CAP access
  • implement the elevators
  • implement the zone locking/exclusion when running with dual bot per floor
  • implement one bot pushing the other if it's out of order
  • implement and make the hardware for LDI (currently we have reconfigured the tape units to standalone, but it's not optimal). Thanks so much to AC7RNsphnHVbyT4
  • design and implement this idea of IR strips for the communication
  • Recommended Access Order (RAO)
  • everything we forgot ... (security? ;)
    It's a kind of model train set, but easier to rent out to someone ;)

    The Linux side consists of prototype scripts running on the Raspberry Pi: Perl, Python and shell scripts that move, calibrate, read barcodes, and give remote access through a web browser (including the video feedback from the head). We are holding those back for the moment. We would like, first, to know whether there are still people out there who have not thrown this magnificent machine away.

    We have time, but it's not infinite. We'll be able to run our own setup, but a retrofit kit and code base of acceptable quality can't be made without some funding or code contributions.

    Github

    If it's not too late: don't landfill it

    I still don't understand how anyone can dump one of these things. Some even dumped several at once. Consider the price of the machine (or of getting the same capacity from IBM, Spectra or Quantum, which is about the same): the work of designing new electronics for a bot and some code to run it has, at the current state of the project, cost far less than a tenth of one machine. Imagine getting together to retrofit perhaps 10 units?

    Please like the video and subscribe

    Forgot to ask, but we have lots of historical material and we'll publish content in the future.
    Subscribing is also how you'll stay up to date on this project.

    Support even more?

    You can even donate to the project if you think it is worth it. We'll then be able to commit usable open-source code and real documentation.

    Side projects

    The electronics retrofit forced me, at the very beginning, to design a small brushless DC driver. Its power rating is overkill for our usage. (The driver is available in the GitHub repo.)
    The Hall sensor connection order is not the same for all 6 motors, so we implemented a way to configure all the possibilities.
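
To make "all the possibilities" concrete: three Hall wires can be connected to three driver inputs in 3! = 6 ways. The Python sketch below just enumerates them. It is not the code from the repo, and the labels `H1`..`H3` (motor side) and `A`..`C` (driver side) are hypothetical names chosen for illustration.

```python
from itertools import permutations

HALL_WIRES = ("H1", "H2", "H3")    # hypothetical labels, motor side
DRIVER_INPUTS = ("A", "B", "C")    # hypothetical labels, driver side


def hall_configurations():
    """Return every way to assign the three Hall wires to the three driver inputs."""
    return [dict(zip(DRIVER_INPUTS, order)) for order in permutations(HALL_WIRES)]


def remap(readings, config):
    """Reorder raw Hall readings (wire -> bit) into driver input order (A, B, C)."""
    return tuple(readings[config[inp]] for inp in DRIVER_INPUTS)


if __name__ == "__main__":
    for number, config in enumerate(hall_configurations(), start=1):
        print(number, config)
```

Trying each of the six configurations in turn, in software or with jumpers, is then a matter of stepping through that list until the motor commutates correctly.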


    When I discussed this problem with my friend JCZD, and after some searching, we could not find an easy way to test all the combinations. I was thinking of using an off-the-shelf rotary switch but could not find an affordable 3-pole one with 6 positions.
    The second idea was to use a microcontroller, but my friend went the mechanical way, like a multimeter rotary switch, and came up with this magnificent key which, once the right configuration is found, shows the corresponding jumper configuration:


    Hall Key Project Github Repo

    #######                                    #
    #        #    #    ##       #    #        # #
    #        ##  ##   #  #      #    #         #
    #####    # ## #  #    #     #    #
    #        #    #  ######     #    #         #
    #        #    #  #    #     #    #        # #
    #######  #    #  #    #     #    ######    #
    
                     #####  #######   ###     ###
      ####   #      #     # #        #   #   #   #
     #       #      #     # #       # #   # # #   #
      ####   #       #####   #####  #  #  # #  #  #
          #  #      #     #       # #   # # #   # #
     #    #  #      #     # #     #  #   #   #   #
      ####   ######  #####   #####    ###     ###
    
     #####
    #     #
    # ### #
    # # # #
    # ####
    #     #
     #####
    
    
      ####     ##       #     #####     #     ####           #    #  ######   #####
     #        #  #      #       #       #    #               ##   #  #          #
      ####   #    #     #       #       #     ####           # #  #  #####      #
          #  ######     #       #       #         #   ###    #  # #  #          #
     #    #  #    #     #       #       #    #    #   ###    #   ##  #          #
      ####   #    #     #       #       #     ####    ###    #    #  ######     #