I want to transfer 80 TB of data to another location. I already have the drives for it. The idea is to copy everything onto them, fly them to the target, and either use the data directly there or copy it onto the server.
What filesystem would you use, and would you use a RAID configuration? Currently I lean towards 8 single-disk ext4 filesystems on the 10 TB drives, because it is simple. I considered ZFS because of the possibility to scrub at the target destination and/or pool all the drives, but ZFS may not be available at the target.
There is btrfs, which should be available everywhere because it is in mainline Linux and ZFS is not. But as far as I know, btrfs would require LVM to pool disks together the way ZFS can do natively.
Pooling the drives would also be a problem if one disk gets lost during transit. If I have everything on 8 single disks at least the remaining data can be used at the target and they only have to wait for the missing data.
I'd like to read your opinions or practical experience with similar challenges.
No raid. Instead ship 2 or 3 copies of data spread across different storage devices.
Honestly, is tape still a thing? Because this is exactly what it was good at.
Tape is still a thing: Ultrium tapes store up to 40 TB. But the devices to read and write them are not priced for mortals.
For some reason, if I were doing the physical media route, I’d want to ship the drives via FedEx or something similar. Presumably this isn’t the only copy of the data. Even if you still need to go, just dragging these drives around seems risky.
Two LTO-10 tapes (and presumably an LTO-10 drive to copy them over, because I don't think the destination would have one)
Rsync with checksumming and the respective mount options. What was it, 1 bit flip per 1 TB transferred?
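Roughly like this (paths and host are placeholders; --checksum compares file contents instead of size/mtime):

```bash
# Initial copy, preserving attributes.
rsync -av --progress /data/ user@target:/data/

# Verification pass: with --checksum and --dry-run, anything that would be
# re-transferred indicates a mismatch between source and target.
rsync -avc --dry-run --itemize-changes /data/ user@target:/data/
```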
That sounds scary, and like I need at least btrfs if I need to ship the data instead of using rsync.
If you’re flying with drives full of data, better encrypt the data first. I’d just use the drives as a backup target for borg backup. Then at the other end, restore everything. You might need a spare, empty drive to get that process going. Alternatively, use your favorite encrypted file system if you want to keep the data encrypted after arrival, maybe a good idea too.
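A minimal sketch of that, with the mount point and archive name as placeholders:

```bash
REPO=/mnt/transport1/borg-repo   # one of the transport drives (hypothetical path)

# Create an encrypted repository; keep the passphrase/key separate from the drives.
borg init --encryption=repokey-blake2 "$REPO"

# Back up the source data; borg chunks, compresses and checksums everything.
borg create --stats --progress "$REPO::initial" /data

# Verify repository consistency before the flight (and again after arrival).
borg check "$REPO"
```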
Better plan some logistics for one or more drives failing during this process too. I assume you have an intact copy of the data at home. So you can get a new drive written and shipped to you if something goes wrong.
Why do you have to do all this in person anyway, though? Can't you ship drives and have someone at the other end install them in a box for you? For that matter, is 80 TB really too much data to transfer by network? With a mere 1 Gbit/s connection it's about a week of transfer.
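For reference, a back-of-the-envelope check of that estimate (assuming a fully saturated 1 Gbit/s link with no protocol overhead):

```bash
# 80 TB = 80 * 10^12 bytes = 6.4 * 10^14 bits
# 6.4e14 bits / 1e9 bits/s = 640,000 s ≈ 7.4 days
echo $(( 80 * 10**12 * 8 / 10**9 / 86400 ))   # prints 7 (full days, rounded down)
```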
I wasn’t involved in the decision process to buy those drives and enclosures. Now they act as a backup, too.
I still don’t understand the bit about flying them somewhere. Where are they going? Bigger drives would mean fewer, too.
7 hard drives at 12TB each in your luggage?
More like 8x 10 TB drives.
I'd use XFS, as it's excellent at handling big files (7z, img/iso/qcow2, 4K videos).
For large amounts of smaller files (like photos, odt, and PDFs), I'd use ext4.
I second XFS for large files.
Will the disks be permanently in place there, or are they just a means of transport? Either way, traveling with that much spinning rust, there is always a good chance of bit flips or damage.
ZFS is up to the task if you can connect all the disks at the same time at the target location. You don’t really have to keep track of the order of the disks - ZFS will figure it out when mounting the pool. The act of copying the data from the disks will effectively perform a scrub at the same time.
If you will only attach one disk at a time, it is a bit more of a coin toss - although ZFS single-disk volumes do support scrubbing as well.
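A rough sketch of what that could look like, with pool and device names as placeholders:

```bash
# At the source: pool the transport drives (plain stripe here, no redundancy).
zpool create transport /dev/sdb /dev/sdc /dev/sdd /dev/sde \
                       /dev/sdf /dev/sdg /dev/sdh /dev/sdi

# At the target: ZFS identifies the member disks from their labels, so the
# order/ports they are attached to do not matter.
zpool import transport

# Copying the data off already verifies every block's checksum; an explicit
# scrub does the same for the whole pool up front.
zpool scrub transport
zpool status transport
```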
Disk corruption in transit would be one of my worries - X-ray scans, vibration and plain handling can do stuff to the bits. Tgz, zip or rar files with low or no compression provide error detection, though little ability to recover. Checksum files can also help with detection. Any failed files can perhaps be transferred over the network for recovery.
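For the checksum-file route, something like this would work (mount point is a placeholder):

```bash
# On the source drive: build a manifest with relative paths.
cd /mnt/transport1
find . -type f ! -name MANIFEST.sha256 -print0 | xargs -0 sha256sum > MANIFEST.sha256

# At the target: verify; any file that fails can be re-sent over the network.
cd /mnt/transport1
sha256sum --check --quiet MANIFEST.sha256
```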
Thx.
The disks are only meant for transport at this time.
The more I think about it, the more I lean towards btrfs, because even if they don’t use btrfs on the target server the copying process will do the error correction based on the checksums in btrfs itself. I hope btrfs does it the same way as ZFS in this scenario.
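Roughly what I have in mind (device and mount paths are placeholders):

```bash
# Format and fill one transport drive; mkfs.btrfs completes almost instantly.
mkfs.btrfs -L transport1 /dev/sdX
mount /dev/sdX /mnt/transport1
cp -a /data/batch1/. /mnt/transport1/

# At the target, every read is verified against the btrfs checksums, so a plain
# copy off the drive already flags corruption during the transfer. (On a single
# disk with the default data profile it is detection only; repair needs a
# redundant copy such as the DUP profile.) An explicit scrub (-B = foreground,
# print a report) checks everything up front.
mount /dev/sdX /mnt/transport1
btrfs scrub start -B /mnt/transport1
```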
It’s a good idea to use what you know. I don’t have much experience with btrfs but if it does what it says on the tin then it should be safe to use.
Copying the contents at the target is a good strategy. If the drives are to be put into 24/7 use later, I would probably consider wiping them and running an integrity test before putting them to use, as once they start being used it will be too late (and it would stay as a doubt in the back of my mind).
Either way, traveling with that much spinning rust, there is always a good chance of bit flips or damage.
What? Lol no. They’ll travel fine.
Multiple disks with many moving parts, containing 80TB of data on magnetic platters flying at high altitude where they’ll be subjected to far more physical impacts, radiation, and cosmic rays than at sea level.
Yeah, it’s a risk.
You kids think HDDs just fail daily or something. I flew all over the place with a laptop with an HDD for years, as did many others. It'll be fine. Especially since it's unlikely they would be using the drives while traveling.
From a position of handling corporate data on a daily basis, I am pretty confident that data integrity is top of mind.
I agree with both of you. Somehow I don’t worry about the drive in my laptop but 80 TB of scientific data is another thing, and I want to make sure it is the same data when it arrives.
Really, then why is there an explicit SMART conveyance test?
It’s to test for damage that may have occurred during shipping.
And how often does it happen?
How do you ensure that it doesn't happen? If this is corporate data, that can be key.
This is scientific data.
Fun fact: I recently did a scrub on the offline backup drive of my work PC. It corrected around 250 errors. I wouldn't have noticed any problems if I had used ext4 instead of btrfs.
Often enough that there’s a test designed to detect it specifically. If you want hard data you’ll have to find it on your own, I don’t have any handy.
I don't have the knowledge to help you, but I know enough to be intrigued by your use case. Can you share what you are trying to do? Is it a corporate job, or a personal collection or something?
It is scientific data that needs to be available on another server.
Aah, interesting, hadn't considered this. Thanks!
btrfs can pool disks just fine. Create a RAID nice and quick. There's also btrfs send and receive, which may be what you need for shipping the data? You can use SSH for a secure write…
If this is a one-time copy, I'd strongly consider just syncing the data vs. shipping drives (which, as people have pointed out, may have serious reliability concerns).
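For the send/receive route, a sketch (host, paths and subvolume names are placeholders; both ends need to be btrfs):

```bash
# Take a read-only snapshot of the subvolume holding the data.
btrfs subvolume snapshot -r /data /data/@transfer

# Stream it to the target over SSH; the receiving path must be on a btrfs filesystem.
btrfs send /data/@transfer | ssh user@target "btrfs receive /tank/incoming"
```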
Otherwise, if you must ship, I’d say the best move is two copies of each piece of data, so any single drive failing in shipping isn’t a big deal. But not a RAID. Just two literal copies on two separate drives. Simplest way to ensure some redundancy.
Yes, using rsync between the two servers would be the best option, I guess, even though I already have the drives. On my end I could provide access and arrange proper security with a VPN, but at the target there are still too many question marks, and I cannot currently count on basic Linux knowledge there.
For a previous transfer of much less data I had to write a PS script that handled the transfer. It was very slow.
So I am actually dealing with another problem: can I get enough information from the non-technical people there to provide the best and easiest solution for them?
Thanks so far for all the ideas from all of you.
Not quite clear there…
You’re copying data from the source, to harddrives… and then to a server with different drives?
Assuming it's just lots of smallish data files / media and not OS files (i.e. you don't need symlinks, attributes, ownership, etc.), then any backup software which generates hashes to be able to repair the archive during a restore would do (one such tool is sketched below).
Btrfs doesn’t need LVM, but I wouldn’t use that on mobile drives.
Or… is this one huge 80TB file?
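One possible tool for that is par2; a sketch, with the batch layout and the 10% redundancy level as assumptions:

```bash
# Bundle a batch into an uncompressed tar archive, then add ~10% parity data.
tar -cf /mnt/transport1/batch1.tar /data/batch1
cd /mnt/transport1
par2 create -r10 batch1.par2 batch1.tar

# At the target: verify, and repair the archive from the parity blocks if needed.
par2 verify batch1.par2
par2 repair batch1.par2
```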
Your assumption is correct. These are many files of medium size: satellite raster images.
The more I think about it, the more I lean towards btrfs, because even if they don’t use btrfs on the target server the copying process will do the error correction based on the checksums in btrfs itself.
My take: your plan is sound, it's the fastest way to transfer the data, and you don't have to worry about data corruption. Just checksum to ensure your copies are pristine. I wouldn't bother with extra compression or encryption.
About filesystems: assuming the drives are literally only a means of transport, the filesystem doesn't matter much. I have a slight preference for btrfs in this scenario, because mkfs.btrfs on a 10 TB disk is instantaneous, whereas ext4 will take forever. zfs might be fast too; I've never used it. If you have an enclosure and extra disks, it might be worth grouping drives into RAID5/6 sets, as that's a lot of data plus a flight, so should a failure occur it's going to be expensive to correct.
Do not use btrfs for RAID5 or 6, though. After decade(s) the project still carries a warning. IIRC, the risk is in power failure, so it should be OK if you have a UPS, but still. I wouldn't.
LVM isn't hard to use and works well. Any reason not to use it, other than it's not the new hotness?







