Used in: gor/nor

WRITE

The WRITE command can be used to write a stream into one or more files simultaneously.

By specifying a “fork” column with the -f option, the stream can be forked, i.e. split into multiple files, one for each distinct value in that column. In this case, #{fork} has to appear in the filename template. The -f option always takes the name of the column used to fork the output files.

The -d option can be used to write the fork files to directories instead of splitting filenames with #{fork}.

The -r option can be used to eliminate the fork column from the output, since it is already represented in the filenames.
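
As a rough mental model of forking, the Python sketch below partitions rows by the value of a fork column, expands #{fork} in a filename template, and optionally drops the fork column in the way -r is described above. It is an illustration only and not how GOR is implemented; the function name fork_rows, its parameters and the sample rows are hypothetical.

```python
from collections import defaultdict


def fork_rows(header, rows, template, fork_col, remove_fork_col=False):
    """Group rows by fork_col and map each group to an expanded filename.

    Hypothetical illustration of the documented behaviour, not GOR's code.
    """
    if "#{fork}" not in template:
        raise ValueError("filename template must contain #{fork}")
    idx = header.index(fork_col)

    def strip(values):
        # Drop the fork column when requested (the effect described for -r).
        return values[:idx] + values[idx + 1:] if remove_fork_col else values

    files = defaultdict(list)
    for row in rows:
        # One output name per distinct value in the fork column.
        name = template.replace("#{fork}", str(row[idx]))
        files[name].append(strip(row))
    return strip(header), dict(files)


# Made-up sample data with two distinct PIDs.
header = ["Chrom", "Pos", "PID"]
rows = [
    ["chr1", 10020, 101],
    ["chr1", 10031, 102],
    ["chr1", 10042, 101],
]
out_header, files = fork_rows(
    header, rows, "data_#{fork}.gor", "PID", remove_fork_col=True
)
for name in sorted(files):
    print(name, files[name])
```

The sketch keeps rows in their input order within each output, which is an assumption of the model rather than a statement about the WRITE command.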

The -c option tells the command to use column store compression for the output.

Usage

gor ... | write filename [-f forkCol] [-d] [-r] [-c] [-m] [-i type]

Options

-f column

The “fork column” used to split the output into multiple files.

-d

Use subdirectories instead of #{fork} in filename for forkwrite.

-r

Eliminate the fork column from the output.

-c

Use column store compression for the output.

-m

Create MD5 sum file along with the output file.

-maxseg

Write maxseg to the gor meta file.

-inferschema

Write schema to the gor meta file.

-i type

Write an index file along with the output: .gori with a .gorz file, .tbi with a .vcf.gz file. The type must be stated and can be FULL, CHROM or TABIX.

-l level

Compression level (0-9). Default 1.

-t 'tags'

List of tags for which write ensures that a file is created. Only valid with the -f option.

-tags 'tags'

List of tags/aliases to use in the resulting dictionary when writing the files to directories. Usually used with partgor as -tags #{tags}.

-prefix hf

Takes a text source containing a prefix to be prepended to the written file. Also supports a string in single quotes.

-noheader

Don’t write header lines. Not valid with gor/gorz/nor/norz.

-card 'cols'

Calculate the cardinality of the columns in ‘cols’ and add it to the output’s metadata.

Examples

gor -p chr1:10020-10051 fileA.gor | write fileB.gorz

The query above will read the first four rows of the example file shown above and write them to a compressed GOR file in the same directory.

gor multiPIDfile.gor | write data_#{fork}.gor -f PID -r

The query above will write the contents of multiPIDfile.gor into as many files as there are distinct PIDs in the file. In this example, the output files from the WRITE command will be named data_101.gor and data_102.gor and the -r flag removes the PID column (i.e. the column used to fork the data).

gor multiPIDfile.gor | write data_#{fork}.gor -f PID -r -t 'PN001,PN002,PN003'

The query above will write the contents of multiPIDfile.gor into as many files as there are distinct PIDs in the file. In this example, a list of tags is also supplied with the -t option, and the write command ensures that a file is created for each of the tags listed, so a tag without any matching rows results in an empty file.
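
The effect of the -t option can be pictured with the short Python sketch below: every tag in the comma-separated list gets an output entry, and tags without rows get an empty one. This is a simplified model under stated assumptions, not GOR's implementation; the function name ensure_tag_files and the sample data are hypothetical, and the sketch does not say anything about what an empty file contains on disk.

```python
def ensure_tag_files(rows_by_tag, tags, template):
    """Return filename -> rows, with an empty entry for each tag lacking data.

    Hypothetical illustration only; rows_by_tag stands in for forked output.
    """
    result = {
        template.replace("#{fork}", tag): list(rows)
        for tag, rows in rows_by_tag.items()
    }
    for tag in tags.split(","):
        tag = tag.strip()
        if tag:
            # Keep existing rows; add an empty list only when the tag is absent.
            result.setdefault(template.replace("#{fork}", tag), [])
    return result


# Made-up forked data: only PN001 has rows.
forked = {"PN001": [["chr1", 10020]]}
outputs = ensure_tag_files(forked, "PN001,PN002,PN003", "data_#{fork}.gor")
for name in sorted(outputs):
    print(name, len(outputs[name]))
```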

gor fileA.gor | write s3data://project/user_data/fileB.gorz

The query above will write the contents of fileA.gor into the S3 project folder. In addition, it will create a link to the S3 file in the project folder under user_data/fileB.gorz.link.

There are 4 different S3 project and shared folders that can be written to:

Project

s3data://project/<path to data>

Current project S3 folder.

Shared

s3data://shared/<path to data>

S3 folder shared between all projects.

Region

s3region://shared/<path to data>

S3 region shared folder.

Global

s3global://shared/<path to data>

S3 global shared folder.
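
When write targets are generated from a script, the four locations above can be kept in a small lookup table. The Python sketch below is one hedged way to do that; the scope labels ("project", "shared", "region", "global") and the helper s3_destination are hypothetical names chosen for this example, while the URL prefixes are the ones listed above.

```python
# URL prefixes as listed in this section; the dictionary keys are
# illustrative labels, not GOR keywords.
S3_PREFIXES = {
    "project": "s3data://project/",
    "shared": "s3data://shared/",
    "region": "s3region://shared/",
    "global": "s3global://shared/",
}


def s3_destination(scope, path):
    """Build a write target for the given scope label and data path."""
    try:
        prefix = S3_PREFIXES[scope]
    except KeyError:
        raise ValueError(f"unknown scope: {scope!r}") from None
    # Avoid a double slash if the path was given with a leading slash.
    return prefix + path.lstrip("/")


print(s3_destination("project", "user_data/fileB.gorz"))
```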