Discovery Genomics

1. How long will it take to get my data?

All samples are processed by application in the order they are received. Samples are batched with like samples from other customers so that each protocol is run on at least half a plate (48 samples) at once. Running samples in large batches allows us to keep costs low, though it can sometimes delay processing by a week or two. For certain protocols, such as microarray, we don't receive enough samples to establish a regular schedule, so there will be delays in processing. In addition, there is almost always a queue of prepared libraries waiting to run on the sequencers, which means it may take up to another two weeks to get on the instrument. Sequencing times vary depending on the run type (see #6). Analysis, such as alignment, can take up to 1 week, depending on complexity and current cluster load.

2. Why does everyone in my lab need a separate account?

There are a number of reasons it's preferable to maintain individual accounts in the Discovery system rather than common, lab-wide logins:

With multiple logins, each member of the project is copied on any emails, as opposed to all communication going to a single person. This is especially advantageous in the event that we have an urgent question and the single point of contact is out of the office.
Especially in large labs, having a single account makes it very difficult for the PI to restrict access in the event that lab members move on.
If we do not have valid contact information for the PI, billing (and therefore data release) can be delayed.

3. Can I run protocol X with less than the recommended amount of input material?

Most of the time, there is some room in the protocol to use less material. However, the amounts we request were chosen to allow us to process in a high-throughput fashion without the bias concerns that can arise from very small starting amounts. Please contact us before submitting samples at lower amounts, so that we can help you achieve the greatest value for your sample.

4. Can I run protocol X with material that is not of the ideal integrity?

Our integrity guidelines were set to ensure that the data we produce is of the highest quality, but there are times when it is impossible to obtain more or cleaner material. We will run whatever you ask us to, but we cannot be responsible for the final quality of data from samples that fail QC.

5. What types of analysis can be provided by Discovery?

The deliverable for all sequencing services should be assumed to be FASTQ files, which contain the raw reads as they come off the sequencer, along with per-base quality scores. Analysis of data may be provided depending on the application and the customer's request. Alignment of WGS or RNA-seq data can be performed for a flat fee per sample. For additional analysis, such as comparison of RNA-seq data across groups or preparation of materials for slides or manuscripts, a collaboration will be set up between the customer and the Discovery scientist. Payment for analysis is still required, and appropriate recognition of the Discovery staff is also expected.

If samples are from a "common" organism (i.e. human, mouse, rat): Reads from DNA or RNA samples can be aligned to a reference genome. If applicable, sequence capture efficiencies and coverages will be calculated. BAM and VCF files can be provided for human whole genome samples run on the NovaSeq through our DRAGEN platform for a flat fee per sample. If desired, GATK analysis may also be performed for an additional fee, but it requires additional time. Analysis fees are calculated on a per-project basis and cover staff time, software licenses, and computer/cluster time.

If samples are from an "unusual" organism: The same basic analyses are available, but would require an extra fee to set up reference data. This assumes, of course, that suitable data (finished genome, gene models, etc.) exist in the public domain. For specific protocols, such as metagenomics (16S) projects, analysis is provided entirely through Illumina's BaseSpace, which will categorize reads down to the genus level.
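As a rough illustration of what a FASTQ deliverable contains, the Python sketch below reads four-line records and converts each quality string into numeric per-base scores. It assumes the Phred+33 quality encoding; the function name read_fastq and the example record are made up for illustration and are not part of any Discovery tooling.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, scores) from a text handle of 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores  # drop the leading '@'


# Hypothetical single-record example.
example = "@read1\nACGT\n+\nII5!\n"
records = list(read_fastq(io.StringIO(example)))
print(records)
```

For a delivered .fastq.gz file, the same function can be given a handle opened with gzip.open(path, "rt") from the standard library.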

6. How long does it take to run the sequencers?

These values are how long it takes to sequence the samples; they do not include library preparation or analysis time:

Run type	Time (hours)
NovaSeq SP PE-50bp	13
NovaSeq S2 PE-50bp	16
NovaSeq S1 PE-100bp	19
NovaSeq S2 PE-100bp	25
NovaSeq S4 PE-100bp	36
NovaSeq SP/S1 PE-150bp	25
NovaSeq S2 PE-150bp	36
NovaSeq S4 PE-150bp	44
NovaSeq SP PE-250bp	38
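To see how these run times fit into overall turnaround, the Python sketch below adds the upper bounds quoted in question 1 (up to two weeks of batching delay, up to two weeks in the sequencer queue, up to one week of analysis) to the run time from the table. It is only a back-of-the-envelope worst case: library preparation time is not stated in this FAQ and is left out, and the names RUN_HOURS and worst_case_days are made up for illustration.

```python
# Run times in hours, copied from the table above.
RUN_HOURS = {
    "NovaSeq SP PE-50bp": 13,
    "NovaSeq S2 PE-50bp": 16,
    "NovaSeq S1 PE-100bp": 19,
    "NovaSeq S2 PE-100bp": 25,
    "NovaSeq S4 PE-100bp": 36,
    "NovaSeq SP/S1 PE-150bp": 25,
    "NovaSeq S2 PE-150bp": 36,
    "NovaSeq S4 PE-150bp": 44,
    "NovaSeq SP PE-250bp": 38,
}

# Upper bounds from question 1, in days (assumed here to be worst cases).
MAX_BATCH_DELAY_DAYS = 14
MAX_QUEUE_DAYS = 14
MAX_ANALYSIS_DAYS = 7


def worst_case_days(run_type, with_analysis=True):
    """Rough upper bound in days, excluding library preparation itself."""
    days = MAX_BATCH_DELAY_DAYS + MAX_QUEUE_DAYS + RUN_HOURS[run_type] / 24
    if with_analysis:
        days += MAX_ANALYSIS_DAYS
    return days


print(round(worst_case_days("NovaSeq S4 PE-150bp"), 1))
```

Actual turnaround depends on the current queue and batch sizes, so this is a ceiling on the stated delays rather than a prediction.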

7. What is a BAM file?

A BAM file is the binary (compressed) variant of a Sequence Alignment/Map (SAM) file, a standard format for storing large sets of alignment data. The BAM files that we deliver are already sorted. The standard genome used for human WGS alignment with the DRAGEN platform is hg19, but GRCh37 may be used at the customer's request if included within the Special Instructions at the time of submission.
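If you want to confirm the sort order of a BAM file yourself, the Python sketch below reads the start of the decompressed stream, which per the SAM/BAM specification holds the magic bytes "BAM\1", a 32-bit little-endian header length, and then the plain-text header. It looks for the SO tag on the @HD line. This is a minimal sketch: the function name bam_sort_order is made up, and it assumes the header carries an @HD line with an SO tag, which is optional in the specification.

```python
import struct


def bam_sort_order(stream):
    """Return the SO value from the @HD header line, or None if absent.

    `stream` must yield the decompressed bytes of a BAM file.
    """
    if stream.read(4) != b"BAM\x01":
        raise ValueError("stream does not start with the BAM magic bytes")
    (header_length,) = struct.unpack("<i", stream.read(4))
    text = stream.read(header_length).rstrip(b"\x00").decode("utf-8", errors="replace")
    for line in text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

Because BAM's BGZF compression is gzip-compatible, a file can be opened with gzip.open(path, "rb") and the resulting handle passed to the function; a coordinate-sorted file would typically report "coordinate".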

8. How are FASTQ files named?

FASTQ files generated at Discovery are named with a series of identifiers that precisely assign a particular barcoded library to a lane of a specific flowcell. As such, no two FASTQ files will ever be given the same name. For example, the FASTQ file name HNJWNCCXX_s8_1_GSLv3_05_SL146310.fastq.gz contains five pieces of information: the flowcell name (HNJWNCCXX), the lane (s8), the read number (1 = forward read, 2 = reverse read), the barcode set used and the barcode number (GSLv3 barcode set, barcode 05), and the unique Sequencing Library ID (SL146310).
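The naming scheme above can be split into its parts programmatically. The Python sketch below is one way to do it, with a pattern inferred only from the single example name given here; names that deviate from that layout will be rejected, and the identifiers FastqName and parse_fastq_name are made up for illustration.

```python
import re
from typing import NamedTuple


class FastqName(NamedTuple):
    flowcell: str
    lane: int
    read: int          # 1 = forward read, 2 = reverse read
    barcode_set: str
    barcode: str       # kept as text to preserve the leading zero
    library_id: str


# Assumption: fields are underscore-separated exactly as in the example name.
FASTQ_NAME = re.compile(
    r"^(?P<flowcell>[^_]+)_s(?P<lane>\d+)_(?P<read>[12])_"
    r"(?P<barcode_set>[^_]+)_(?P<barcode>\d+)_(?P<library_id>[Ss][Ll]\d+)"
    r"\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Split a FASTQ file name into its identifiers, or raise ValueError."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognised FASTQ file name: {filename}")
    return FastqName(
        flowcell=match["flowcell"],
        lane=int(match["lane"]),
        read=int(match["read"]),
        barcode_set=match["barcode_set"],
        barcode=match["barcode"],
        library_id=match["library_id"].upper(),
    )


info = parse_fastq_name("HNJWNCCXX_s8_1_GSLv3_05_SL146310.fastq.gz")
print(info)
```

Grouping files by every field except read is a simple way to pair the forward and reverse reads of the same library and lane.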

9. How can I visualize my alignments?

Try the Integrative Genomics Viewer from the Broad Institute.

10. What tools do you recommend for performing my own analysis?

For DNA sequencing, we recommend the open source tools BWA for alignment and the Genome Analysis Toolkit (GATK) for most other analysis, including variant finding.

For RNA-Seq, we use TopHat for spliced alignments and Cufflinks for isoform assembly and quantification.
