I want to transfer 80 TB of data to another location. I already have the drives for it. The idea is to copy everything onto them, fly them to the target, and either use the data directly there or copy it onto the server.
What filesystem would you use, and would you use a RAID configuration? Currently I lean towards 8 single-disk ext4 filesystems on the 10 TB drives, because it is simple. I considered ZFS because of the possibility to scrub at the target destination and/or pool all the drives, but ZFS may not be available at the target.
There is btrfs, which should be available everywhere because it is in mainline Linux and ZFS is not. But as far as I know, btrfs would require LVM to pool disks together the way ZFS can do natively.
Pooling the drives would also be a problem if one disk gets lost during transit. If I have everything on 8 single disks at least the remaining data can be used at the target and they only have to wait for the missing data.
I'd like to read your opinions or practical experience with similar challenges.
No raid. Instead ship 2 or 3 copies of data spread across different storage devices.
Honestly, is tape still a thing? Because this is exactly what it was good at.
Tape is still a thing: Ultrium tapes store up to 40 TB. But the devices to read and write them are not priced for mortals.
For some reason, if I were doing the physical media route, I’d want to ship the drives via FedEx or something similar. Presumably this isn’t the only copy of the data. Even if you still need to go, just dragging these drives around seems risky.
Two LTO-10 tapes (and presumably an LTO-10 drive to copy them over, because I don't think the destination would have one)
Rsync with checksumming and the respective mount options. What was it, 1 bit flip per 1 TB transferred?
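Roughly like this (paths and host are placeholders; --checksum compares file contents instead of size/mtime):

```bash
# Initial copy, preserving attributes.
rsync -av --progress /data/ user@target:/data/

# Verification pass: with --checksum and --dry-run, anything that would be
# re-transferred indicates a mismatch between source and target.
rsync -avc --dry-run --itemize-changes /data/ user@target:/data/
```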
That sounds scary, and like I need at least btrfs if I need to ship the data instead of using rsync.
If you’re flying with drives full of data, better encrypt the data first. I’d just use the drives as a backup target for borg backup. Then at the other end, restore everything. You might need a spare, empty drive to get that process going. Alternatively, use your favorite encrypted file system if you want to keep the data encrypted after arrival, maybe a good idea too.
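A minimal sketch of that, with the mount point and archive name as placeholders:

```bash
REPO=/mnt/transport1/borg-repo   # one of the transport drives (hypothetical path)

# Create an encrypted repository; keep the passphrase/key separate from the drives.
borg init --encryption=repokey-blake2 "$REPO"

# Back up the source data; borg chunks, compresses and checksums everything.
borg create --stats --progress "$REPO::initial" /data

# Verify repository consistency before the flight (and again after arrival).
borg check "$REPO"
```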
Better plan some logistics for one or more drives failing during this process too. I assume you have an intact copy of the data at home. So you can get a new drive written and shipped to you if something goes wrong.
Why do you have to do all this in person anyway, though? Can't you ship drives and have someone at the other end install them in a box for you? For that matter, is 80 TB really too much data to transfer by network? With a mere 1 Gbit/s connection it's about a week of transfer.
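For reference, a back-of-the-envelope check of that estimate (assuming a fully saturated 1 Gbit/s link with no protocol overhead):

```bash
# 80 TB = 80 * 10^12 bytes = 6.4 * 10^14 bits
# 6.4e14 bits / 1e9 bits/s = 640,000 s ≈ 7.4 days
echo $(( 80 * 10**12 * 8 / 10**9 / 86400 ))   # prints 7 (full days, rounded down)
```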
I wasn’t involved in the decision process to buy those drives and enclosures. Now they act as a backup, too.
I still don’t understand the bit about flying them somewhere. Where are they going? Bigger drives would mean fewer, too.
7 hard drives at 12TB each in your luggage?
More like 8x 10 TB drives.
I'd use XFS, as it's excellent at handling big files (7z, img/iso/qcow2, 4K videos).
For large amounts of smaller files (like photos, odt, and PDFs), I'd use ext4.
I second XFS for large files.
Will the disks be permanently in place there, or are they just a means of transport? Either way, traveling with that much spinning rust, there is always a good chance of bit flips or damage.
ZFS is up to the task if you can connect all the disks at the same time at the target location. You don’t really have to keep track of the order of the disks - ZFS will figure it out when mounting the pool. The act of copying the data from the disks will effectively perform a scrub at the same time.
If you will only attach one disk at a time, it is a bit more of a coin toss - although ZFS single-disk volumes do support scrubbing as well.
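A rough sketch of what that could look like, with pool and device names as placeholders:

```bash
# At the source: pool the transport drives (plain stripe here, no redundancy).
zpool create transport /dev/sdb /dev/sdc /dev/sdd /dev/sde \
                       /dev/sdf /dev/sdg /dev/sdh /dev/sdi

# At the target: ZFS identifies the member disks from their labels, so the
# order/ports they are attached to do not matter.
zpool import transport

# Copying the data off already verifies every block's checksum; an explicit
# scrub does the same for the whole pool up front.
zpool scrub transport
zpool status transport
```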
Disk corruption in transit would be one of my worries - X-ray scans, vibration and plain handling can do stuff to the bits. Tgz, zip or rar files with low or no compression provide error detection, though little ability to recover. Checksum files can also help with detection. Any failed files can perhaps be transferred over the network for recovery.
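For the checksum-file route, something like this would work (mount point is a placeholder):

```bash
# On the source drive: build a manifest with relative paths.
cd /mnt/transport1
find . -type f ! -name MANIFEST.sha256 -print0 | xargs -0 sha256sum > MANIFEST.sha256

# At the target: verify; any file that fails can be re-sent over the network.
cd /mnt/transport1
sha256sum --check --quiet MANIFEST.sha256
```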
Thx.
The disks are only meant for transport at this time.
The more I think about it, the more I lean towards btrfs, because even if they don’t use btrfs on the target server the copying process will do the error correction based on the checksums in btrfs itself. I hope btrfs does it the same way as ZFS in this scenario.
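Roughly what I have in mind (device and mount paths are placeholders):

```bash
# Format and fill one transport drive; mkfs.btrfs completes almost instantly.
mkfs.btrfs -L transport1 /dev/sdX
mount /dev/sdX /mnt/transport1
cp -a /data/batch1/. /mnt/transport1/

# At the target, every read is verified against the btrfs checksums, so a plain
# copy off the drive already flags corruption during the transfer. (On a single
# disk with the default data profile it is detection only; repair needs a
# redundant copy such as the DUP profile.) An explicit scrub (-B = foreground,
# print a report) checks everything up front.
mount /dev/sdX /mnt/transport1
btrfs scrub start -B /mnt/transport1
```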
It’s a good idea to use what you know. I don’t have much experience with btrfs but if it does what it says on the tin then it should be safe to use.
Copying the contents at the target is a good strategy. If the drives are to be put into 24/7 use later, I would probably consider wiping them and running an integrity test before putting them to use, as once they start being used it will be too late (and it would stay as a doubt in the back of my mind).
Either way, traveling with that much spinning rust, there is always a good chance of bit flips or damage.
What? Lol no. They’ll travel fine.
Multiple disks with many moving parts, containing 80TB of data on magnetic platters flying at high altitude where they’ll be subjected to far more physical impacts, radiation, and cosmic rays than at sea level.
Yeah, it’s a risk.
You kids think HDDs just fail daily or something. I flew all over the place with a laptop with an HDD for years, as did many others. It'll be fine. Especially since it's unlikely they would be using the drives while traveling.
From a position of handling corporate data on a daily basis, I am pretty confident that data integrity is top of mind.
I agree with both of you. Somehow I don’t worry about the drive in my laptop but 80 TB of scientific data is another thing, and I want to make sure it is the same data when it arrives.
Really, then why is there an explicit SMART conveyance test?
It’s to test for damage that may have occurred during shipping.
And how often does it happen?
How do you ensure that it doesn't happen? If this is corporate data, that can be key.
This is scientific data.
Fun fact: I recently did a scrub on the offline backup drive of my work PC. It corrected around 250 errors. I wouldn't have noticed any problems if I had used ext4 instead of btrfs.
Often enough that there’s a test designed to detect it specifically. If you want hard data you’ll have to find it on your own, I don’t have any handy.
I don't have the knowledge to help you, but I know enough to be intrigued by your use case. Can you share what you are trying to do? Is it a corporate job, or a personal collection or something?
It is scientific data that needs to be available on another server.
Aah, interesting, hadn't considered this. Thanks!
btrfs can pool disks just fine. Create a RAID nice and quick. There's also btrfs send and receive, which may be what you need for shipping the data? You can use SSH for a secure write…
If this is a one-time copy, I'd strongly consider just syncing the data vs. shipping drives (which, as people have pointed out, may have serious reliability concerns).
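For the send/receive route, a sketch (host, paths and subvolume names are placeholders; both ends need to be btrfs):

```bash
# Take a read-only snapshot of the subvolume holding the data.
btrfs subvolume snapshot -r /data /data/@transfer

# Stream it to the target over SSH; the receiving path must be on a btrfs filesystem.
btrfs send /data/@transfer | ssh user@target "btrfs receive /tank/incoming"
```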
Otherwise, if you must ship, I’d say the best move is two copies of each piece of data, so any single drive failing in shipping isn’t a big deal. But not a RAID. Just two literal copies on two separate drives. Simplest way to ensure some redundancy.
Yes, using rsync between the two servers would be the best option, I guess, even though I already have the drives. On my end I could provide access and arrange proper security with a VPN, but at the target there are still too many question marks, and I cannot currently count on basic Linux knowledge there.
For a previous transfer of much less data I had to write a PS script that handled the transfer. It was very slow.
So I am actually dealing with another problem: can I get enough information from the non-technical people there to provide the best and easiest solution for them?
Thanks so far for all the ideas from all of you.
Not quite clear there…
You’re copying data from the source, to harddrives… and then to a server with different drives?
Assuming it's just lots of smallish data files / media and not OS files (i.e. you don't need symlinks, attributes, ownership, etc.), then any backup software which generates hashes to be able to repair the archive during a restore would do (one such tool is sketched below).
Btrfs doesn’t need LVM, but I wouldn’t use that on mobile drives.
Or… is this one huge 80TB file?
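One possible tool for that is par2; a sketch, with the batch layout and the 10% redundancy level as assumptions:

```bash
# Bundle a batch into an uncompressed tar archive, then add ~10% parity data.
tar -cf /mnt/transport1/batch1.tar /data/batch1
cd /mnt/transport1
par2 create -r10 batch1.par2 batch1.tar

# At the target: verify, and repair the archive from the parity blocks if needed.
par2 verify batch1.par2
par2 repair batch1.par2
```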
Your assumption is correct. These are many files of medium size: satellite raster images.
The more I think about it, the more I lean towards btrfs, because even if they don’t use btrfs on the target server the copying process will do the error correction based on the checksums in btrfs itself.
My take: your plan is sound, it's the fastest way to transfer the data, and you don't have to worry about data corruption. Just checksum to ensure your copies are pristine. I wouldn't bother with extra compression or encryption.
About filesystems: assuming the drives are literally only a means of transport, the filesystem doesn't matter much. I have a slight preference for btrfs in this scenario, because mkfs.btrfs on a 10 TB disk is instantaneous, whereas ext4 will take forever. zfs might be fast too; I've never used it. If you have an enclosure and extra disks, it might be worth grouping drives into RAID5/6 sets, as that's a lot of data plus a flight, so should a failure occur it's going to be expensive to correct.
Do not use btrfs for RAID5 or 6, though. After decade(s) the project still carries a warning. IIRC, the risk is in power failure, so it should be OK if you have a UPS, but still. I wouldn't.
LVM isn't hard to use and works well. Any reason not to use it, other than it's not the new hotness?







