Copying files to flash memory: bit-by-bit duplication
What does copying a file mean? Not in the sense of ‘what’s the dictionary definition of copy’, but in the sense of ‘when a computer makes a copy of a file, what is it actually doing, at a data level’? To that we can add a supplementary question – ‘Why should I care?’ The answer to the second question matters a lot to anyone involved in copying files to flash memory, whether that’s for long-term storage or regular use. And if you’re interested in the second question, then logic suggests you’re also interested in the first.
When is a copy not a copy?
Let’s start with the fact that files are made up of binary data, i.e. just 1s and 0s. A computer neither knows nor cares what’s in the file. A file is a file, and its constituent parts are 1s and 0s.
When you copy a file manually, from, say, your computer’s hard drive to a USB flash drive (which is, of course, flash storage), you will typically select it in a file browser, choose to copy it, open the location you want it copied to, and choose to make the copy. You’ll see a file with the same name as your original appear in its new location. You’ve made a file copy. Except that it’s not really a copy.
When a file gets copied in this way, some information typically gets changed. The computer doesn’t change your equivalent of “Last night I dreamt I went to Manderley again” into something like “Yesterday afternoon I popped to the post office again.” It’s not that clever. But it does change information that the file system keeps about the file, like the last accessed date stamp or the modified date.
What’s the problem?
When you go back and read your file, you’ll see that all the same words from the original are in your copy, in the same order. But the copy is not strictly identical, and even the very smallest of changes, like a different last accessed date stamp, means the original and the copy no longer match exactly. The two files are not the same.
This matters a lot. Let’s move away from your magnum opus and consider your movie-making debut. You’ve made a video to enter into a local competition, and you want to copy it: 10 minutes of 1080p HD video shot at 30 frames per second. The file is 1.5GB in size. Taking 1GB as 1,024 × 1,024 × 1,024 bytes, that means it contains 12,884,901,888 bits (each bit being either a 1 or a 0).
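The arithmetic behind that bit count can be checked with a few lines of Python. This is a quick sketch that assumes, as the figure above does, that 1GB means 1,024³ bytes (a binary gigabyte, sometimes written GiB).

```python
# Bits in a 1.5GB file, assuming 1GB = 1024**3 bytes (a binary gigabyte, GiB).
BYTES_PER_GIB = 1024 ** 3
BITS_PER_BYTE = 8

def bits_in_file(size_gib: float) -> int:
    """Return the number of bits in a file of the given size in binary gigabytes."""
    return int(size_gib * BYTES_PER_GIB) * BITS_PER_BYTE

print(f"{bits_in_file(1.5):,}")  # 12,884,901,888
```

If the size is quoted in decimal gigabytes (10⁹ bytes), the same 1.5GB works out to 12,000,000,000 bits. Either way, there are billions of bits that all have to arrive intact.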
One of those bits doesn’t copy correctly. A 1 becomes a 0, or maybe a 0 becomes a 1. Either way, it’s wrong, and there’s been some data corruption. In the grand scheme of things, it’s a tiny error, maybe one pixel in one frame of the video. If you – or the competition judges – watched it back, you’d never notice. But what if it’s not a pixel in a frame that gets changed? What if it’s a bit in the file header – where the video’s settings and structure are stored? If the corruption is there, maybe the colours will morph out of recognition, or maybe the video won’t run at all.
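Here is a small Python sketch that makes a flipped bit concrete. The four-byte `original` buffer is a hypothetical stand-in for a tiny slice of a file. The code flips one bit to simulate corruption, then compares the two buffers bit by bit to find where they differ. The function names are illustrative only.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 = most significant bit of byte 0)."""
    buf = bytearray(data)
    byte_pos, bit_pos = divmod(bit_index, 8)
    buf[byte_pos] ^= 0x80 >> bit_pos  # XOR with a one-bit mask turns a 1 into 0 or a 0 into 1
    return bytes(buf)

def differing_bits(a: bytes, b: bytes) -> list[int]:
    """List the index of every bit that differs between two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must be the same length")
    diffs = []
    for byte_pos, (x, y) in enumerate(zip(a, b)):
        delta = x ^ y  # set bits in delta mark positions where the two bytes disagree
        for bit_pos in range(8):
            if delta & (0x80 >> bit_pos):
                diffs.append(byte_pos * 8 + bit_pos)
    return diffs

original = bytes([0x12, 0x34, 0x56, 0x78])  # hypothetical sample data
corrupted = flip_bit(original, 13)          # simulate a single bit error

print(original.hex(), corrupted.hex())      # 12345678 12305678
print(differing_bits(original, corrupted))  # [13]
```

The two buffers look almost the same when printed. Only a bit-level comparison shows exactly which bit went wrong.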
How is a copied file stored?
Let’s look at this file copy thing from a different perspective and consider how a computer copies files between devices. It would be logical to assume that if you copied the same file from a PC to two different flash devices, the file would be put in the same place on each device. But this is not the case.
A storage device has a file system, which is a bit like the contents, index and page numbers of a book. When new data gets added to the flash storage, the file system is updated so that it can be found again. The file system identifies gaps in the flash memory’s storage, and the new data gets put there. If you make six copies, onto six flash devices, your file may well go in six different places.
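This toy Python model gives a rough idea of why placement varies. Each device is a hypothetical list of blocks, where `True` means the block is in use. A new file goes into the first run of free blocks large enough to hold it, a simple "first fit" rule. Real file systems use far more sophisticated allocation schemes, so treat this only as an illustration.

```python
from typing import Optional

def first_fit(used: list[bool], blocks_needed: int) -> Optional[int]:
    """Return the start index of the first free run long enough, or None if there is none."""
    run_start, run_length = 0, 0
    for i, in_use in enumerate(used):
        if in_use:
            run_length = 0      # a used block breaks the current free run
            run_start = i + 1
        else:
            run_length += 1
            if run_length == blocks_needed:
                return run_start
    return None

# Two hypothetical devices with different existing contents.
device_a = [True, True, False, False, False, True, False, False]
device_b = [False, True, True, True, False, False, False, False]

print(first_fit(device_a, 3))  # 2
print(first_fit(device_b, 3))  # 4
```

The same three-block file lands at block 2 on one device and block 4 on the other, because each device's existing contents leave different gaps.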
To overcome the problem of changed bits and ‘random’ file storage, you have to make your copy on the basis of the binary data, the 1s and 0s that make up the whole file – both the parts we can see when we look at it (like the words of a document and the frames of a video) and the parts we can’t see (like the instructions on what font to display and the frame rate of a video). A binary copy faithfully reproduces every single one of those 1s and 0s, regardless of what they signify.
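Here is a minimal Python sketch of a binary copy. It reads the source in fixed-size chunks and writes the bytes out unchanged, without interpreting them. The 1 MiB chunk size and the file names are assumptions for illustration, and a temporary file stands in for a card image. Reading a whole card, such as a raw device on Linux, usually needs administrator rights, and dedicated imaging tools add safeguards that this sketch does not.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch

def binary_copy(src_path: str, dst_path: str) -> int:
    """Copy every byte of src_path to dst_path unchanged and return the number of bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:          # an empty bytes object means the end of the source
                break
            dst.write(chunk)       # the bytes go out exactly as they came in
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())     # ask the OS to push buffered data to the device
    return copied

# Usage: copy a temporary file standing in for a card image (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "card.img")
    target = os.path.join(tmp, "card-copy.img")
    original_data = os.urandom(3 * CHUNK_SIZE + 123)  # random bytes, not a whole number of chunks
    with open(source, "wb") as f:
        f.write(original_data)
    n = binary_copy(source, target)
    with open(target, "rb") as f:
        copied_data = f.read()

print(n == len(original_data), copied_data == original_data)  # True True
```

The copy loop never looks at what the bytes mean. That is the point: headers, frames and fonts all travel as the same 1s and 0s.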
A binary copy will be 100% identical to the source. This means it is possible to check the copy against the source and verify that it is a 100% successful copy. In turn, bit-by-bit copies have the advantage that a data checksum of the copy will match the checksum of the source. A checksum is a number (usually written in hexadecimal) calculated from the data, which computers can use to quickly validate that a copy is an exact copy.
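One common way to do this check is to compute a cryptographic hash, such as SHA-256, over both the source and the copy and then compare the two hexadecimal digests. This Python sketch uses the standard-library `hashlib` module. Choosing SHA-256 over alternatives such as CRC32 or MD5 is an assumption, and the function names are illustrative.

```python
import hashlib
import io
from typing import BinaryIO

def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in chunks and return the hexadecimal digest."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

def verify_copy(src: BinaryIO, dst: BinaryIO) -> bool:
    """Return True if both streams produce the same SHA-256 digest."""
    return sha256_of_stream(src) == sha256_of_stream(dst)

data = b"Last night I dreamt I went to Manderley again."  # sample content
source = io.BytesIO(data)
duplicate = io.BytesIO(data)  # an exact copy
print(verify_copy(source, duplicate))  # True
```

For real files, pass streams opened with `open(path, "rb")`. Hashing in chunks keeps memory use small even for multi-gigabyte files.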
Even the smallest change in a copy will be detected by the checksum, because a good checksum algorithm produces a very different number even if only one bit of data has been corrupted.
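This Python sketch shows that effect, again assuming SHA-256 via `hashlib`. It flips a single bit in a hypothetical 1 KiB buffer and compares the digests before and after. It also counts how many of the 256 digest bits changed, which is typically around half.

```python
import hashlib

data = bytes(range(256)) * 4   # 1 KiB of sample data (hypothetical)
corrupted = bytearray(data)
corrupted[100] ^= 0x01         # flip the lowest bit of one byte

digest_a = hashlib.sha256(data).digest()
digest_b = hashlib.sha256(bytes(corrupted)).digest()

# Count how many of the 256 digest bits differ between the two hashes.
differing = sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))

print(digest_a.hex())
print(digest_b.hex())
print(f"{differing} of 256 digest bits differ")
```

One changed input bit out of 8,192 changes the digest completely, so comparing digests is enough to catch it.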
And finally, a conundrum solved
Bit-by-bit copying can sometimes leave users a little confused, because when they try it, and it seems it should work, it doesn’t. But there will be a reason. Here’s an example.
You’ve got a 32GB microSD card containing five videos, each of 2GB in size. So that’s 10GB of data. You want to make an identical copy to a 16GB microSD card using a binary copy technique. There is more than enough space on the 16GB card to hold the data.
But the copy doesn’t work. Why?
The computer does not care what files are on the 32GB microSD card. It looks at the whole card and wants to copy every single bit of the 32GB regardless of what it stores – including if it’s storing no data at all. The 16GB card isn’t big enough. You can’t squeeze a pint into a half-pint glass using this method.
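The check a whole-card copy has to pass is simple to express in code. What has to fit on the target is the raw size of the source card, not the amount of data stored on it. This minimal Python sketch uses hypothetical round figures in decimal gigabytes, the unit card capacities are usually quoted in. Real cards sold as 32GB or 16GB report slightly different exact sizes.

```python
GB = 1000 ** 3  # card capacities are usually quoted in decimal gigabytes

def can_image(source_device_bytes: int, target_device_bytes: int) -> bool:
    """A whole-device copy needs room for every byte of the source device."""
    return target_device_bytes >= source_device_bytes

source_card = 32 * GB     # hypothetical 32GB card
stored_data = 5 * 2 * GB  # five 2GB videos
target_card = 16 * GB     # hypothetical 16GB card

print(stored_data <= target_card)           # True: the files alone would fit
print(can_image(source_card, target_card))  # False: the whole card does not
print(can_image(target_card, source_card))  # True: 16GB onto 32GB fits
```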
And, because you’re probably already asking the question, here’s what happens if you try to binary copy a 16GB microSD card onto a 32GB card. It will work, but the partition table and file system structure are copied along with everything else, and they describe a 16GB volume. Your computer will now treat the 32GB card as if it were a 16GB card, and the extra space will go unused.
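After a 16GB image is copied onto a 32GB card, the card holds the image plus leftover space. Verifying that copy therefore means hashing only the first N bytes of the target, where N is the size of the source. This Python sketch again assumes SHA-256. The function name and its `limit` parameter are illustrative, and small byte strings stand in for the two cards.

```python
import hashlib
import io
from typing import BinaryIO, Optional

def sha256_prefix(stream: BinaryIO, limit: Optional[int] = None,
                  chunk_size: int = 1024 * 1024) -> str:
    """Hash at most `limit` bytes from the start of the stream (every byte if limit is None)."""
    h = hashlib.sha256()
    remaining = limit
    while remaining is None or remaining > 0:
        size = chunk_size if remaining is None else min(chunk_size, remaining)
        chunk = stream.read(size)
        if not chunk:
            break
        h.update(chunk)
        if remaining is not None:
            remaining -= len(chunk)
    return h.hexdigest()

# Toy stand-ins: a 16-byte "small card" image written onto a 32-byte "large card".
small_card = bytes(range(16))
large_card = small_card + bytes(16)  # the extra space after the image (zeros here)

full_match = sha256_prefix(io.BytesIO(small_card)) == sha256_prefix(io.BytesIO(large_card))
prefix_match = sha256_prefix(io.BytesIO(small_card)) == sha256_prefix(
    io.BytesIO(large_card), limit=len(small_card))
print(full_match, prefix_match)  # False True
```

Hashing the whole larger card would report a mismatch even though the copy is perfect. Limiting the hash to the source's length compares like with like.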
In the end, then, copying files onto flash storage, whether for backup or for everyday use, is a bit more complex than it might first appear. Anyone who wants to ensure their data is 100% replicated onto the new flash storage should make a bit-by-bit copy, and not just rely on the copy and paste function of their computer.
If you would like support with your flash memory duplication, get in touch with one of our experts today.
Author: Dan Hockham
Date: Friday 28th August 2020