WRITE
The WRITE command writes a stream into one or more files.
By specifying a “fork” column with the -f option, the stream can be forked, i.e. split across multiple files, one per distinct value in that column. In this case, #{fork} has to appear in the filename template. The -f option must be followed by the name of the column used to fork the output files.
The -d option writes the fork files to directories instead of splitting filenames with #{fork}.
The -r option eliminates the fork column from the output, since it is already represented in the filenames.
The -c option tells the command to use column store compression for the output.
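The forking behaviour described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not how GOR implements WRITE: the function `fork_rows`, its parameters, and the column names `Chrom` and `Pos` are hypothetical, and the sketch returns the would-be file contents instead of writing anything to disk.

```python
from collections import defaultdict

FORK_PLACEHOLDER = "#{fork}"


def fork_rows(header, rows, template, fork_col, drop_fork_col=False):
    """Group rows by fork_col and map each group to its expanded filename.

    Illustrative only: returns {filename: (header, rows)} instead of writing files.
    """
    if FORK_PLACEHOLDER not in template:
        raise ValueError("filename template must contain #{fork}")
    idx = header.index(fork_col)

    def strip(row):
        # Mimics -r: drop the fork column, since the filename already carries it.
        return row[:idx] + row[idx + 1:] if drop_fork_col else row

    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(strip(row))

    out_header = strip(header)
    return {
        template.replace(FORK_PLACEHOLDER, str(value)): (out_header, group)
        for value, group in groups.items()
    }


# Hypothetical input: three rows spread over two PIDs.
header = ["Chrom", "Pos", "PID"]
rows = [
    ["chr1", 10020, 101],
    ["chr1", 10031, 102],
    ["chr1", 10044, 101],
]
files = fork_rows(header, rows, "data_#{fork}.gor", "PID", drop_fork_col=True)
for name in sorted(files):
    print(name, files[name])
```

Each distinct value in the fork column yields one filename, and with `drop_fork_col=True` the column is removed from both the header and the rows, which mirrors the combination of -f and -r.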
Usage
gor ... | write filename [-f forkCol] [-d] [-r] [-c] [-m] [-i type]
Options
-f forkCol |
The “fork column” used to split the output into multiple files. |
-d |
Use subdirectories instead of #{fork} in filename for forkwrite. |
-r |
Eliminate the fork column from the output. |
-c |
Use column store compression for the output. |
-m |
Create MD5 sum file along with the output file. |
|
Write maxseg to the gor meta file. |
|
Write schema to the gor meta file. |
-i type |
Write an index file (.gori with a .gorz file, .tbi with a .vcf.gz file). The type must be stated and can be FULL, CHROM or TABIX. |
|
Compression level (0-9). Default 1. |
-t tags |
List of tags for which write ensures a file will be created. Only valid with the -f option. |
|
List of tags/alias to use in the resulting dictionary when
writing the files to directories. Usually used with partgor
as |
|
Takes a text source containing a prefix to be prepended to the written file. Also supports a string in single quotes. |
|
Don’t write header lines. Not valid with gor/gorz/nor/norz. |
|
Calculate the cardinality of the columns in ‘cols’ and add it to the output’s metadata. |
Examples
gor -p chr1:10020-10051 fileA.gor | write fileB.gorz
The query above will read the first four rows of the example file shown above and write them to a compressed GOR file in the same directory.
gor multiPIDfile.gor | write data_#{fork}.gor -f PID -r
The query above will write the contents of multiPIDfile.gor into as many files as there are distinct PIDs in the file. In this example, the output files from the WRITE command will be named data_101.gor and data_102.gor, and the -r flag removes the PID column (i.e. the column used to fork the data).
gor multiPIDfile.gor | write data_#{fork}.gor -f PID -r -t 'PN001,PN002,PN003'
The query above will write the contents of multiPIDfile.gor into as many files as there are distinct PIDs in the file. In this example a list of tags is also supplied with the -t option, and the write command ensures that a file is created for each of the tags listed, so a tag with no rows in the input still gets an empty file.
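As a rough illustration of the tag list, the sketch below adds an empty entry for every listed tag that has no forked data. It is an assumption-laden model, not GOR code: `files_for_tags` and the sample rows are hypothetical, and the sketch does not address what an empty file contains (for example, whether it carries a header).

```python
def files_for_tags(template, forked, tags):
    """Return {filename: rows}, adding an empty entry for each tag with no data.

    `forked` maps a fork value to its rows; `tags` is a comma-separated string,
    in the same shape as the value passed to -t in the example above.
    """
    result = {
        template.replace("#{fork}", str(value)): rows
        for value, rows in forked.items()
    }
    for tag in tags.split(","):
        # setdefault leaves existing data untouched and only fills the gaps.
        result.setdefault(template.replace("#{fork}", tag.strip()), [])
    return result


# Hypothetical forked data: only PN001 has rows.
forked = {"PN001": [["chr1", 10020]]}
out = files_for_tags("data_#{fork}.gor", forked, "PN001,PN002,PN003")
for name in sorted(out):
    print(name, out[name])
```

A downstream step that expects one file per tag can then rely on every listed filename being present, whether or not the stream contained rows for it.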
gor fileA.gor | write s3data://project/user_data/fileB.gorz
The query above will write the contents of fileA.gor into the S3 project folder. In addition, it will create a link to the S3 file in the project folder under user_data/fileB.gorz.link.
There are 4 different S3 project and shared folders that can be written to:
Project |
s3data://project/<path to data> |
Current project S3 folder. |
Shared |
s3data://shared/<path to data> |
S3 folder shared between all projects. |
Region |
s3region://shared/<path to data> |
S3 region shared folder. |
Global |
s3global://shared/<path to data> |
S3 global shared folder. |
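When write targets are assembled in a script, the four prefixes listed above can be kept in a small lookup. The following is a minimal sketch: the scope keys (`project`, `shared`, `region`, `global`) and the helper `s3_write_target` are hypothetical names chosen for illustration, and only the URL prefixes themselves come from the table.

```python
# URL prefixes taken from the table above; the dictionary keys are hypothetical labels.
S3_TARGETS = {
    "project": "s3data://project/",
    "shared": "s3data://shared/",
    "region": "s3region://shared/",
    "global": "s3global://shared/",
}


def s3_write_target(scope, path):
    """Build the target URL for a WRITE command from a scope label and a data path."""
    try:
        prefix = S3_TARGETS[scope]
    except KeyError:
        raise ValueError(
            f"unknown scope {scope!r}; expected one of {sorted(S3_TARGETS)}"
        ) from None
    # Avoid a double slash if the caller passes a path with a leading "/".
    return prefix + path.lstrip("/")


target = s3_write_target("project", "user_data/fileB.gorz")
print(f"gor fileA.gor | write {target}")
```

Rejecting unknown scopes up front keeps a typo from turning into a write to an unintended location.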