Original Article by Juan Valencia
Although some file archivers offer the option of splitting files, this can be easily accomplished with two commands, split and cat:
Splitting a file with split
split just needs the size of the parts that we want to create, and the file that we want to split, e.g.:
split -b 1024 file_to_split.bin
If this file is 6 kibibytes long, it will create 6 files of 1 kibibyte each, named xaa, xab, xac, xad, xae and xaf.
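To try this without touching real data, here is a minimal sketch. It assumes GNU coreutils, and the scratch directory and the zero-filled sample file are only there for illustration:

```shell
# Create a scratch directory and a 6 KiB (6144-byte) sample file of zero bytes.
workdir=$(mktemp -d)
head -c 6144 /dev/zero > "$workdir/file_to_split.bin"

# Run split inside the scratch directory so the parts get the default names.
(cd "$workdir" && split -b 1024 file_to_split.bin)

# Collect the names of the parts (the glob x?? matches the default naming).
part_names=$(cd "$workdir" && echo x??)
echo "$part_names"

# Remove the scratch directory when done.
rm -rf "$workdir"
```

The subshell around cd keeps our current directory unchanged after the commands finish.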
-b is what defines the size of the resulting parts. You can use either the SI suffixes KB (kilobyte: 1000 bytes), MB (megabyte: 1000×1000 bytes), GB (gigabyte: 1000×1000×1000 bytes), TB, PB, EB, ZB, YB, or the IEC suffixes K (kibibyte: 1024 bytes), M (mebibyte: 1024×1024 bytes), G (gibibyte: 1024×1024×1024 bytes), T, P, E, Z, Y. E.g.:
split -b 1K file_to_split.bin
split -b 10M file_to_split.bin
split -b 1KB file_to_split.bin
split -b 10MB file_to_split.bin
The previous examples would create parts of 1,024 bytes, 10,485,760 bytes, 1,000 bytes and 10,000,000 bytes respectively.
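The difference between the two kinds of suffix is easy to check on a small file. The following is a sketch that assumes GNU split; the 3000-byte sample file and the iec and si directory names are made up for the example:

```shell
# Create a scratch directory with a 3000-byte sample file.
workdir=$(mktemp -d)
head -c 3000 /dev/zero > "$workdir/sample.bin"
mkdir "$workdir/iec" "$workdir/si"

# IEC suffix: 1K asks for parts of 1024 bytes.
(cd "$workdir/iec" && split -b 1K ../sample.bin)
# SI suffix: 1KB asks for parts of 1000 bytes.
(cd "$workdir/si" && split -b 1KB ../sample.bin)

# Measure the first part produced by each command.
iec_size=$(wc -c < "$workdir/iec/xaa")
si_size=$(wc -c < "$workdir/si/xaa")
echo "first part with -b 1K:  $iec_size bytes"
echo "first part with -b 1KB: $si_size bytes"

rm -rf "$workdir"
```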
If we don’t want the default prefix x, we can change it by adding the new prefix after the name of the file that we want to split, e.g.:
split -b 1024 file_to_split.bin a_part_
If we are splitting a file of 6 kibibytes as in the first example, this would generate the files a_part_aa, a_part_ab, a_part_ac, a_part_ad, a_part_ae and a_part_af.
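A quick way to see the effect of a custom prefix is sketched below, again assuming GNU coreutils and a throwaway 6 KiB sample file:

```shell
# Create a scratch directory and a 6 KiB sample file.
workdir=$(mktemp -d)
head -c 6144 /dev/zero > "$workdir/file_to_split.bin"

# Split with the prefix a_part_ and collect the resulting names.
part_names=$(cd "$workdir" && split -b 1024 file_to_split.bin a_part_ && echo a_part_??)
echo "$part_names"

rm -rf "$workdir"
```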
We can change the length of the suffix in the resulting files, and we can choose between an alphabetic suffix (the default) and a numeric suffix. How many parts we can create depends on these two features of the suffix. If we keep the default length of 2 and don’t use a numeric suffix, we can split a file into up to 26×26 = 676 parts. If split runs out of suffixes, it will fail, leaving us with the files created up to the moment it failed. (Recent versions of GNU split lengthen the suffix automatically when no length is given explicitly, so this limit mainly matters when the length is set by hand.)
To change the length of the suffix, use the parameter -a followed by a number. E.g.:
split -b 1024 -a 4 file_to_split.bin
Following the first example again, we would end up with the files xaaaa, xaaab, xaaac, xaaad, xaaae and xaaaf.
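The sketch below runs the same command on a throwaway 6 KiB file and also works out how many parts a four-letter alphabetic suffix can address. It assumes GNU coreutils; the sample file is illustrative only:

```shell
# Create a scratch directory and a 6 KiB sample file.
workdir=$(mktemp -d)
head -c 6144 /dev/zero > "$workdir/file_to_split.bin"

# Split with a suffix length of 4 and collect the resulting names.
part_names=$(cd "$workdir" && split -b 1024 -a 4 file_to_split.bin && echo x????)
echo "$part_names"

# Four letters of a 26-letter alphabet give 26^4 possible suffixes.
max_parts=$((26 * 26 * 26 * 26))
echo "a suffix of length 4 allows up to $max_parts parts"

rm -rf "$workdir"
```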
To use a numeric suffix, use the parameter -d. Of course, with a numeric suffix and a length of 2 we can split a file into up to 100 parts. E.g.:
split -b 1024 -d file_to_split.bin
And using the first example one last time, we would end up with the files x00, x01, x02, x03, x04 and x05.
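The numeric naming can be checked in the same way. This sketch assumes GNU split, since -d is not available in every implementation:

```shell
# Create a scratch directory and a 6 KiB sample file.
workdir=$(mktemp -d)
head -c 6144 /dev/zero > "$workdir/file_to_split.bin"

# Split with numeric suffixes and collect the resulting names.
part_names=$(cd "$workdir" && split -b 1024 -d file_to_split.bin && echo x??)
echo "$part_names"

rm -rf "$workdir"
```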
Merging the parts that were created with split
Since the files created with the split command are sequential, we can simply use cat to merge these files into a new file, e.g.:
cat x?? > reconstructed_file.bin
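The question mark acts as a wildcard that matches exactly one character. How many question marks we need depends on the length of the suffixes used when creating the parts. The shell expands the pattern in sorted order, which is why the parts are concatenated in the right sequence.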
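After merging, it is worth checking that the result is identical to the original. The following sketch does a full round trip on random data and compares the two files with cmp; the scratch directory and the sample file are assumptions made for the example:

```shell
# Create a scratch directory and a 6 KiB sample file of random bytes.
workdir=$(mktemp -d)
head -c 6144 /dev/urandom > "$workdir/file_to_split.bin"

# Split into 1 KiB parts named x?? inside the scratch directory.
split -b 1024 "$workdir/file_to_split.bin" "$workdir/x"

# Merge the parts back together in glob (sorted) order.
cat "$workdir"/x?? > "$workdir/reconstructed_file.bin"

# cmp -s is silent and only reports through its exit status.
if cmp -s "$workdir/file_to_split.bin" "$workdir/reconstructed_file.bin"; then
    merge_result="identical"
else
    merge_result="different"
fi
echo "original and reconstructed file are $merge_result"

rm -rf "$workdir"
```

For files transferred between machines, comparing checksums of the original and the reconstructed file serves the same purpose.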
Splitting a file by lines
split also allows us to split a text file by lines rather than by the size of the resulting parts. When I first wrote this, I could not think of a reason to split a text file by lines other than for experimental or didactic purposes, since text processors are quite capable of dealing with very large text files. (Just days later I found a good application: splitting long lists of URLs, and splitting those long MySQL files full of commands so we can upload them to web-based systems that can’t handle big files. I left the original comment for humorous purposes.) In any case, the parameter to split a text file by lines is -l followed by the number of lines. E.g.:
split -l 20 file_to_split.txt
This will create one file for every chunk of 20 lines in the original file, so if the file had 54 lines, it would create the files xaa (lines 1-20), xab (lines 21-40) and xac (lines 41-54).
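The 54-line case can be reproduced with a numbered sample file. This is a sketch assuming GNU coreutils (seq is used only to generate the sample lines):

```shell
# Create a scratch directory and a text file with the numbers 1 to 54, one per line.
workdir=$(mktemp -d)
seq 1 54 > "$workdir/file_to_split.txt"

# Split into parts of at most 20 lines each.
split -l 20 "$workdir/file_to_split.txt" "$workdir/x"

# Count the lines in each part and read the first line of the last part.
lines_xaa=$(wc -l < "$workdir/xaa")
lines_xab=$(wc -l < "$workdir/xab")
lines_xac=$(wc -l < "$workdir/xac")
first_of_xac=$(head -n 1 "$workdir/xac")
echo "xaa: $lines_xaa lines, xab: $lines_xab lines, xac: $lines_xac lines"
echo "xac starts at line $first_of_xac"

rm -rf "$workdir"
```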
There is another option in split, a mix between splitting the file by size in bytes and by number of lines. In this mode we specify a maximum size in bytes for the parts, and split will fit as many complete lines as possible in each part without exceeding the specified size. For this we use the parameter -C followed by a number representing the size in bytes; you can use any of the suffixes that are valid for the option -b. E.g., assume that we have a file that contains 20 lines of 100 bytes each (counting the newline), totalling 2000 bytes, and we use the following command:
split -C 512 file_to_split.txt
This would give us the files xaa (lines 1-5, since the sixth line, having 100 characters, would not fit in the 512 bytes that we set as the limit), xab (lines 6-10), xac (lines 11-15) and xad (lines 16-20). Each file would have a size of 500 bytes.
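To check these numbers, the sketch below builds such a file and splits it with -C. It assumes GNU split; the zero-padded line content is just a convenient way to get lines of exactly 100 bytes (99 digits plus the newline):

```shell
# Create a scratch directory and a file with 20 lines of exactly 100 bytes each.
workdir=$(mktemp -d)
for i in $(seq 1 20); do
    printf '%099d\n' "$i"
done > "$workdir/file_to_split.txt"

# Split into parts of at most 512 bytes without breaking lines.
split -C 512 "$workdir/file_to_split.txt" "$workdir/x"

# Count the parts and measure the first and the last one.
part_count=$(ls "$workdir"/x?? | wc -l)
size_xaa=$(wc -c < "$workdir/xaa")
size_xad=$(wc -c < "$workdir/xad")
echo "$part_count parts, xaa: $size_xaa bytes, xad: $size_xad bytes"

rm -rf "$workdir"
```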