Our integrity guidelines were set to ensure that the data we produce is of the highest quality, but there
are times when it is impossible to obtain more or cleaner material. We will run whatever you ask us to, but
we cannot be responsible for the final quality of samples that fail QC.
5. What types of analysis can be provided by Discovery?
The deliverable for all sequencing services should be assumed to be fastq files, which are the raw reads as
they come off the sequencer, along with per-base quality scores. Analysis of data may be provided depending
on the application and the customer's request. Alignment of WGS or RNA-seq data can be performed for a flat
fee per sample. For additional analysis, such as comparison of RNA-seq data across groups or preparation of
materials for slides or manuscripts, a collaboration will be set up between the customer and the Discovery
scientist. Payment for analysis is still required, and appropriate recognition of the Discovery
staff is expected.
If samples are from a "common" organism (e.g. human, mouse, rat): Reads from DNA or
RNA samples can be aligned to a reference genome. If applicable, sequence capture efficiencies and coverages
will be calculated. BAM and VCF files can be provided for human whole genome samples run on the NovaSeq through
our DRAGEN platform for a flat fee per sample. If desired, GATK analysis may also be performed for an additional
fee, but it requires additional time. Analysis fees are calculated on a per-project basis and cover staff time,
software licenses, and computer/cluster time.
If samples are from an "unusual" organism: The same basic analyses are available, but would
require an extra fee to set up reference data. This assumes, of course, that suitable data
(finished genome, gene models, etc.) exist in the public domain.
For specific protocols, such as metagenomics (16S) projects, analysis is provided entirely through
Illumina's BaseSpace, which will categorize reads down to the genus level.
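For customers who want to inspect the fastq deliverable described above before requesting analysis, the following is a minimal Python sketch of reading records and decoding the per-base quality scores. It assumes the common four-line record layout and Phred+33 quality encoding; the function name `read_fastq` and the example read are illustrative only and are not part of any Discovery tooling.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, qualities) from a text stream of 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header.rstrip("\n")[1:], sequence, scores


# Illustrative record, not real data.
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
records = list(read_fastq(example))
print(records)
```

For a delivered file, the same function could be given a handle from `gzip.open(path, "rt")`, since fastq files are delivered gzip-compressed.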
6. How long does it take to run the sequencers?
These values are how long it takes to sequence the samples; they do not include library preparation
or analysis time:
Sequencer               | Time (hours)
NovaSeq SP PE-50bp      | 13
NovaSeq S2 PE-50bp      | 16
NovaSeq S1 PE-100bp     | 19
NovaSeq S2 PE-100bp     | 25
NovaSeq S4 PE-100bp     | 36
NovaSeq SP/S1 PE-150bp  | 25
NovaSeq S2 PE-150bp     | 36
NovaSeq S4 PE-150bp     | 44
NovaSeq SP PE-250bp     | 38
7. What is a BAM file?
A BAM file is the binary (compressed) variant of a Sequence Alignment/Map (SAM) file, a standard format for
storing large sets of alignment data. The BAM files that we deliver are already sorted. The standard genome
used for human WGS alignment with the DRAGEN platform is hg19, but GRCh37 may be used at the customer's
request if included within the Special Instructions at the time of submission.
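As a quick sanity check that a delivered file really is a BAM, one can look at its first bytes: BAM data is stored in gzip-compatible blocks and, once decompressed, begins with the four bytes `BAM\x01`. The sketch below uses only the Python standard library; the function name `looks_like_bam` and the in-memory test streams are illustrative assumptions, and the check does not validate the rest of the file.

```python
import gzip
import io

BAM_MAGIC = b"BAM\x01"  # first four bytes of decompressed BAM data


def looks_like_bam(stream):
    """Return True if a binary stream decompresses to data starting with the BAM magic bytes."""
    try:
        with gzip.GzipFile(fileobj=stream) as gz:
            return gz.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Not gzip-compressed, or truncated before four bytes could be read.
        return False


# In-memory stand-ins for demonstration only; not real alignment data.
fake_bam = io.BytesIO(gzip.compress(BAM_MAGIC + b"remaining header bytes"))
plain_text = io.BytesIO(b"@HD\tVN:1.6\n")

print(looks_like_bam(fake_bam), looks_like_bam(plain_text))
```

With a file on disk, the stream would come from `open(path, "rb")`. For viewing or querying alignments, a dedicated tool such as samtools remains the better choice.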
8. How are FASTQ files named?
FASTQ files sequenced at Discovery are named with a series of
identifiers to allow the precise assignment of a particular barcoded
library to a lane of a specific flowcell. As such, no two fastq files will
ever be given the same name. For example, fastq file name
HNJWNCCXX_s8_1_GSLv3_05_SL146310.fastq.gz contains five pieces of
information: the flowcell name (HNJWNCCXX), the lane (s8), the read number
(1 = forward read, 2 = reverse read), the barcode set used and the barcode
number (GSLv3 barcode set, barcode 05) and the unique Sequencing Library ID
(SL146310).
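The naming scheme above can be split back into its parts programmatically. The following Python sketch is one possible way to do this, assuming every file name follows exactly the pattern of the example; the regular expression, the `FastqName` type, and the function `parse_fastq_name` are hypothetical helpers written for illustration, not an official Discovery utility. The match is case-insensitive, and the library ID is normalized to upper case.

```python
import re
from typing import NamedTuple


class FastqName(NamedTuple):
    flowcell: str
    lane: int
    read: int          # 1 = forward read, 2 = reverse read
    barcode_set: str
    barcode: str       # kept as text to preserve leading zeros
    library_id: str


# Assumed layout: <flowcell>_s<lane>_<read>_<barcode set>_<barcode>_<SL id>.fastq.gz
PATTERN = re.compile(
    r"^(?P<flowcell>[A-Za-z0-9]+)_s(?P<lane>\d+)_(?P<read>[12])"
    r"_(?P<barcode_set>[A-Za-z0-9]+)_(?P<barcode>\d+)"
    r"_(?P<library_id>SL\d+)\.fastq\.gz$",
    re.IGNORECASE,
)


def parse_fastq_name(filename):
    """Split a fastq file name into its five identifiers, or raise ValueError."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized fastq file name: {filename!r}")
    return FastqName(
        flowcell=match["flowcell"],
        lane=int(match["lane"]),
        read=int(match["read"]),
        barcode_set=match["barcode_set"],
        barcode=match["barcode"],
        library_id=match["library_id"].upper(),
    )


info = parse_fastq_name("HNJWNCCXX_s8_1_GSLv3_05_SL146310.fastq.gz")
print(info)
```

Grouping parsed names by `library_id` and `lane` is a simple way to pair the forward and reverse read files of the same library.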
9. How can I visualize my alignments?
10. What tools do you recommend for performing my own analysis?
For DNA sequencing, the open source tools we recommend are BWA for alignment and the Genome Analysis Toolkit
(GATK) for most other analysis, including variant finding.
For RNA-Seq, we use TopHat for spliced alignments and Cufflinks for isoform assembly and quantification.