Main Input Data File¶

FAM file¶

This is the same FAM file format used by PLINK.

Each line of the FAM file describes one individual. It is a white-space (space or tab) delimited record with fields for the identifiers, the two parents, sex, and an optional trait:

Family ID

Individual ID

Father ID

Mother ID

Sex (1=male; 2=female; other=unknown)

Phenotype (optional)

Note

Unlike in PLINK, the last (phenotype) column does not have to be filled with dummy data: it can be omitted from the FAM file when the main phenotype is supplied in the PHENO file.

Note

The FAM file is a required input file, and for certain types of analyses in ONETOOL it can be used on its own.

$ onetool --fam test_miss00.fam
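The column layout described above can be illustrated with a small parser. This is only a sketch, not ONETOOL's own reader: the names `FamRecord`, `parse_fam_line` and `sex_label` are hypothetical, and the sample IDs are made up. It accepts either five or six fields, reflecting the note that the phenotype column may be omitted.

```python
from typing import NamedTuple, Optional


class FamRecord(NamedTuple):
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str                  # "1" = male, "2" = female, anything else = unknown
    phenotype: Optional[str]  # None when the sixth column is omitted


def parse_fam_line(line: str) -> FamRecord:
    """Split one white-space delimited FAM line into its fields."""
    fields = line.split()  # handles both spaces and tabs
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}")
    phenotype = fields[5] if len(fields) == 6 else None
    return FamRecord(*fields[:5], phenotype)


def sex_label(code: str) -> str:
    """Translate the sex code into a readable label."""
    return {"1": "male", "2": "female"}.get(code, "unknown")


# Made-up example lines: one with a phenotype column, one without.
with_pheno = parse_fam_line("FAM1 IND3 IND1 IND2 2 1")
without_pheno = parse_fam_line("FAM1\tIND1\t0\t0\t1")
print(with_pheno.individual_id, sex_label(with_pheno.sex), with_pheno.phenotype)
print(without_pheno.individual_id, sex_label(without_pheno.sex), without_pheno.phenotype)
```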

PHENO file¶

This is PLINK's alternate phenotype file, which specifies a phenotype for analysis other than the one in the PED (or FAM) file.

Each line of the PHENO file contains the phenotype data for one individual. The first two columns contain the identifiers and the remaining columns hold the phenotype values.

Family ID

Individual ID

Phenotype1

Phenotype2

Phenotype3

Phenotype4

...

A detailed description of the PHENO file can be found in the PLINK documentation.

$ onetool --fam test_miss00.fam --vcf test_miss00.vcf --pheno test_miss00.pheno --pname sbp
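Selecting one phenotype by name, as `--pname sbp` does in the command above, can be sketched as follows. The sketch assumes the PHENO file starts with a header line whose first two entries are `FID` and `IID` (PLINK's convention for named phenotypes); the function name `read_pheno_column`, the second column name `dbp`, and the values are all hypothetical.

```python
import io


def read_pheno_column(handle, name):
    """Return {(family_id, individual_id): value} for one named phenotype."""
    header = handle.readline().split()
    if name not in header[2:]:
        raise KeyError(f"phenotype {name!r} not found in header {header}")
    column = header.index(name, 2)  # search only among the phenotype columns
    values = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values[(fields[0], fields[1])] = fields[column]
    return values


# Made-up file content with two phenotype columns.
example = io.StringIO(
    "FID IID sbp dbp\n"
    "FAM1 IND1 120 80\n"
    "FAM1 IND2 135 85\n"
)
sbp = read_pheno_column(example, "sbp")
print(sbp)
```

Values are kept as strings here so that any missing-value code used in the file passes through unchanged.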

VCF file¶

This is the Variant Call Format (VCF) file used in the 1000 Genomes Project. Both plain-text (.vcf) and gzipped (.bcf) files are supported. The meta-information lines (starting with ##) are ignored.

The first nine columns of the header line and of each data line are:

CHROM

POS

ID

REF

ALT

QUAL

FILTER

INFO

FORMAT

Note

Each sample ID in the header line (the line starting with #CHROM) must match exactly one individual ID in the FAM file.

$ onetool --fam test_miss00.fam --vcf test_miss00.vcf
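The matching requirement between VCF sample IDs and FAM individual IDs can be checked before running an analysis. The following is a minimal sketch, not part of ONETOOL: `vcf_sample_ids` and `check_samples` are hypothetical helpers, and the IDs are invented. It relies on the VCF convention that the header line is tab-delimited and that sample columns follow FORMAT.

```python
import io
from collections import Counter


def vcf_sample_ids(handle):
    """Return the sample IDs from the #CHROM header line of a VCF stream."""
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines are skipped
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]  # columns after FORMAT
        break  # a data line before the header means the header is missing
    raise ValueError("no #CHROM header line found")


def check_samples(sample_ids, fam_individual_ids):
    """Return (duplicated sample IDs, sample IDs without exactly one FAM match)."""
    duplicated = sorted(s for s, n in Counter(sample_ids).items() if n > 1)
    fam_counts = Counter(fam_individual_ids)
    unmatched = sorted(s for s in set(sample_ids) if fam_counts[s] != 1)
    return duplicated, unmatched


# Made-up header: IND9 has no counterpart in the FAM individual IDs below.
vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tIND1\tIND2\tIND9\n"
)
samples = vcf_sample_ids(vcf)
duplicated, unmatched = check_samples(samples, ["IND1", "IND2", "IND3"])
print(samples, duplicated, unmatched)
```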

BED/BIM file¶

The BED file is the PLINK binary PED file. It is used together with a BIM file, which is an extended MAP file with two extra columns holding the allele names.

A detailed description of the BED/BIM files can be found in the PLINK documentation.

$ onetool --fam test_miss00.fam --bed test_miss00.bed --bim test_miss00.bim
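Because the BED file carries no IDs of its own, it only makes sense together with the FAM and BIM files. One quick sanity check is to compare its size with the number of FAM lines (individuals) and BIM lines (variants). The sketch below relies on the PLINK 1 binary format (three magic bytes, then two bits per genotype in variant-major order), which comes from the PLINK specification rather than from this page; the function names are hypothetical.

```python
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 binary, variant-major layout


def expected_bed_size(n_samples: int, n_variants: int) -> int:
    """Bytes in a variant-major BED file: 3 magic bytes plus 2 bits per genotype."""
    bytes_per_variant = (n_samples + 3) // 4  # four genotypes per byte, rounded up
    return len(BED_MAGIC) + n_variants * bytes_per_variant


def bed_is_consistent(bed_bytes: bytes, n_samples: int, n_variants: int) -> bool:
    """Check the magic bytes and the total length against the FAM/BIM counts."""
    return (bed_bytes[:3] == BED_MAGIC
            and len(bed_bytes) == expected_bed_size(n_samples, n_variants))


# Toy data: 5 individuals and 2 variants need 3 + 2 * 2 = 7 bytes.
toy_bed = BED_MAGIC + bytes(4)
print(expected_bed_size(5, 2), bed_is_consistent(toy_bed, 5, 2))
```

For a real file, the bytes would come from reading the `.bed` file in binary mode, and the two counts from the number of lines in the `.fam` and `.bim` files.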

IMPUTE2 file¶

IMPUTE2 files are the output files of the IMPUTE2 program. Both the .impute2 and the .impute2_info files are required.

The genotype file (.impute2) stores data in a one-line-per-SNP format. The first five entries of each line should be:

SNP ID

RS ID of the SNP

base-pair position of the SNP

the allele coded A

the allele coded B

The next three numbers on the line should be the probabilities of the three genotypes AA, AB and BB at the SNP for the first individual in the sample and the next three numbers for the second individual and so on.
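The one-line-per-SNP layout can be made concrete with a short parser that groups the probabilities into one (AA, AB, BB) triple per individual. The dosage shown, P(AB) + 2 * P(BB), is the usual expected count of the B allele; whether ONETOOL's `--dosage` option uses exactly this quantity is an assumption. The function names and the example line are hypothetical.

```python
def parse_impute2_line(line: str):
    """Split one .impute2 line into SNP metadata and per-individual probabilities."""
    fields = line.split()
    snp_id, rs_id, position, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("genotype probabilities must come in triples")
    # One (P(AA), P(AB), P(BB)) triple per individual, in sample order.
    triples = [tuple(probs[i:i + 3]) for i in range(0, len(probs), 3)]
    return (snp_id, rs_id, int(position), allele_a, allele_b), triples


def allele_b_dosage(triple):
    """Expected number of B alleles: P(AB) + 2 * P(BB)."""
    p_aa, p_ab, p_bb = triple
    return p_ab + 2.0 * p_bb


# Made-up line with three individuals.
meta, triples = parse_impute2_line("snp1 rs123 1000 A G 1 0 0 0 0.5 0.5 0 0 1")
dosages = [allele_b_dosage(t) for t in triples]
print(meta, dosages)
```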

The SNP-wise information file (.impute2_info) contains the following columns (header shown in parentheses):

SNP identifier from -g file (snp_id)

rsID (rs_id)

base pair position (position)

expected frequency of allele coded ‘1’ in the -o file (exp_freq_a1)

measure of the observed statistical information associated with the allele frequency estimate (info)

average certainty of best-guess genotypes (certainty)

internal “type” assigned to SNP (type)

A detailed description of the IMPUTE2 files can be found in the IMPUTE2 documentation.
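Since the .impute2_info file has a header line, its columns can be read by name. The sketch below assumes white-space delimited fields with the header names listed above. The helper `read_info`, the threshold `MIN_INFO` and the data rows are hypothetical and chosen only for illustration; they are not a recommendation or an ONETOOL default.

```python
import io


def read_info(handle):
    """Yield one dict per SNP, keyed by the header names of the info file."""
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            yield dict(zip(header, fields))


# Made-up file content using the seven columns listed above.
example = io.StringIO(
    "snp_id rs_id position exp_freq_a1 info certainty type\n"
    "snp1 rs111 1000 0.120 0.950 0.990 0\n"
    "snp2 rs222 2000 0.030 0.410 0.800 0\n"
)
MIN_INFO = 0.8  # hypothetical cut-off, chosen only for this example
kept = [row["rs_id"] for row in read_info(example) if float(row["info"]) >= MIN_INFO]
print(kept)
```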

$ onetool --fam test_miss00.fam --dosage test_miss00.impute2 --mqls

PED/MAP file¶

The PED file is the default white-space (space or tab) delimited text file format used in PLINK. Its first six columns are the same as in the FAM file and are mandatory.

A PED file has to be used together with a PLINK MAP file, which contains the following four columns:

chromosome (1-22, X, Y or 0 if unplaced)

rs# or snp identifier

Genetic distance (morgans)

Base-pair position (bp units)

A detailed description of the PED/MAP files can be found in the PLINK documentation.

$ onetool --ped test_miss00.ped --map test_miss00.map
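The relationship between the two files can be sketched as follows: each MAP line defines one marker, and each PED line should carry two allele columns per marker after the six leading fields. The allele-pair layout follows PLINK's PED format rather than anything stated on this page, and the function names and example lines are hypothetical.

```python
def parse_map_line(line: str):
    """Return (chromosome, SNP identifier, genetic distance, base-pair position)."""
    chrom, snp_id, genetic_distance, bp_position = line.split()
    return chrom, snp_id, float(genetic_distance), int(bp_position)


def parse_ped_line(line: str, n_markers: int):
    """Return the six leading fields and one (allele1, allele2) pair per marker."""
    fields = line.split()
    leading, alleles = fields[:6], fields[6:]
    if len(alleles) != 2 * n_markers:
        raise ValueError(
            f"expected {2 * n_markers} allele columns, got {len(alleles)}"
        )
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return leading, genotypes


# Made-up data: two markers, one individual.
markers = [parse_map_line(text) for text in ("1 rs123 0 1000", "1 rs456 0 2000")]
leading, genotypes = parse_ped_line("FAM1 IND1 0 0 1 2 A G G G", len(markers))
print(leading, genotypes)
```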