How is PhIP-Seq data analyzed?
Following sequencing, paired-end reads are demultiplexed, trimmed, and aligned to the peptide library with a minimum alignment score of 50. Read counts are recorded for each phage clone, and differential representation analysis is performed with the edgeR exactTest function, which generates a fold change and a p-value for each clone in each sample relative to the mock immunoprecipitations. A clone is called enriched when it shows a log2 fold change of at least 2.0 and an unadjusted p-value below 10⁻⁴ relative to the mock-immunoprecipitation controls. Both binary hit calls and quantitative log2 fold change values are generated for downstream analysis. The false discovery rate is assessed by comparing each mock-immunoprecipitation sample against the pool of remaining mock (decoy) samples.
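The hit-calling rule described above can be sketched in a few lines of Python. This is only an illustration of the two stated thresholds, not the production pipeline: the function name, the peptide names, and the example values are hypothetical, and the fold changes and p-values are assumed to have been computed already (for example by edgeR).

```python
LOG2FC_MIN = 2.0   # minimum log2 fold change stated in the text
PVAL_MAX = 1e-4    # unadjusted p-value must be below this value


def call_hit(log2fc, pval):
    """Return 1 if a clone meets both enrichment thresholds, else 0."""
    return int(log2fc >= LOG2FC_MIN and pval < PVAL_MAX)


# Hypothetical per-clone results for one sample: (log2 fold change, p-value).
results = {
    "peptide_A": (3.1, 2e-6),  # large fold change and significant
    "peptide_B": (2.4, 3e-3),  # large fold change, p-value too high
    "peptide_C": (1.2, 1e-8),  # significant, fold change too small
}

hits = {name: call_hit(fc, p) for name, (fc, p) in results.items()}
print(hits)  # {'peptide_A': 1, 'peptide_B': 0, 'peptide_C': 0}
```

Both conditions must hold at once, so a clone with a very small p-value but a modest fold change is not reported as a hit.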
What’s included in a deliverable packet?
1. projectname_counts.csv - The total read-count matrix for every sample; this is the data in its rawest form.
2. projectname_edgeR_log2foldchg.csv - The edgeR computed log2 fold change of each sample versus the background distribution.
3. projectname_edgeR_log10pval.csv - The edgeR log10 p-value of each sample's read count versus the background distribution.
4. projectname_edgeR_enriched.csv - Simple hit-call file containing true/false (1/0) hit calls.
5. projectname_edgeR_log2foldchg_enriched.csv - Main quantitative hits call file used to compute many delivered plots and analyses.
Unless you are performing your own normalization and hit calling, the projectname_edgeR_log2foldchg_enriched.csv file is the best starting point for downstream analysis.
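As a starting point for downstream work, the enriched fold-change file can be read with the Python standard library. The sketch below makes assumptions the text does not confirm: that rows are peptides, that the first column is named peptide_id, that each remaining column is one sample, and that non-enriched entries are stored as 0 or left blank. Check the delivered file and adjust the column names before using it.

```python
import csv
import io

# Stand-in for projectname_edgeR_log2foldchg_enriched.csv. The layout
# (peptide_id column, one column per sample, 0 for non-hits) is an assumption.
EXAMPLE_CSV = """peptide_id,sample_1,sample_2
pep_001,3.2,0
pep_002,0,0
pep_003,2.5,4.1
"""


def load_enriched(handle):
    """Return {sample: {peptide_id: log2fc}}, keeping only non-zero entries."""
    reader = csv.DictReader(handle)
    samples = [col for col in reader.fieldnames if col != "peptide_id"]
    table = {sample: {} for sample in samples}
    for row in reader:
        for sample in samples:
            value = float(row[sample] or 0)  # treat blank cells as 0
            if value != 0:
                table[sample][row["peptide_id"]] = value
    return table


enriched = load_enriched(io.StringIO(EXAMPLE_CSV))
for sample, peptides in enriched.items():
    print(sample, len(peptides), "hits")
```

For a real file, replace io.StringIO(EXAMPLE_CSV) with open("projectname_edgeR_log2foldchg_enriched.csv", newline="").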
What is included in a PhIP-Seq report?
Each comprehensive report offers detailed methodology, raw and processed data for custom analysis, and extensive quality control for data reliability. View an overview of a PhIP-Seq Study Report here.
General Content of each PhIP-Seq report:
1. Overview of PhIP-Seq: Includes a brief description of the platform with sections detailing library cloning, sample screening, and informatics methods.
2. Core Pipeline Output files: These are produced from FASTQ files according to the pipeline described. There are five files included: counts (raw data), log2foldchange, log10pval, edgeR enriched (simple hits call file) and log2foldchange enriched (most quantitative).
3. QC Figures: Provide information on the phages produced and present in each assay, the mapped counts per sample and correlation matrices for controls and the samples in each study.
4. Figures summarizing the results from edgeR Enriched Hits: Figures showing the number of enriched peptide hits in each sample, organized by group, along with a correlation matrix.
5. Processed Hits Pipeline Output Files: Offers insight on peptide-level group hit counts and protein-level group analysis (polyclonal hit counts) as well as providing a list of the most variable publicly reactive peptides across the sample cohort.
6. Most Variable Peptides: Provides an unbiased hierarchical clustering of the top variant peptides.
7. Fisher Exact Tests (only for unblinded studies): Contains case versus cohort statistical outputs according to the provided study groups.
8. Study Heatmaps: Heatmaps are created for all proteins containing a peptide represented in the top 100 variant peptides.
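To illustrate what selecting the "most variable" peptides can look like, here is a minimal Python sketch that ranks peptides by the variance of their log2 fold change across samples. The ranking criterion, the function name, and the data are assumptions made for illustration; the report does not state which variability measure is used.

```python
import statistics

# Hypothetical log2 fold-change values per peptide across four samples.
log2fc = {
    "pep_A": [0.0, 0.0, 0.0, 0.0],  # never reactive
    "pep_B": [0.0, 5.0, 0.0, 5.0],  # reactive in half of the samples
    "pep_C": [2.0, 2.5, 2.0, 2.5],  # uniformly reactive
}


def top_variable(matrix, n):
    """Return the n peptide IDs with the highest variance across samples."""
    ranked = sorted(
        matrix,
        key=lambda pep: statistics.pvariance(matrix[pep]),
        reverse=True,
    )
    return ranked[:n]


print(top_variable(log2fc, 2))  # ['pep_B', 'pep_C']
```

Peptides that react in every sample or in none have low variance, so a ranking of this kind favors peptides that distinguish samples from one another.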
What types of reports are offered by CDI Labs Canada for PhIP-Seq?
We offer two types of reports: one for blinded studies and one for unblinded studies. Blinded studies receive all the data described in the deliverables and a general report with unbiased figures. Unblinded studies can additionally receive case-cohort comparisons for up to 12 individual study groups.
Does CDI Labs use any normalization for the data?
The data is log2 transformed before it is used to calculate Z-scores.
Log2 is used because it simplifies fold-change calculation: a fold change measures how strongly a signal is up- or down-regulated between samples, and on the log2 scale equal increases and decreases are symmetric around zero. Log2-scaled data also tracks biologically detectable changes more closely. Log2 is the most commonly used transformation for microarray data. It stabilizes the variance at high intensities but increases the variance at low intensities.
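A short Python sketch of a log2 transformation followed by Z-scoring is given here for orientation. It rests on assumptions not stated in the text: a pseudocount of 1 is added so that zero counts can be transformed, and Z-scores are computed against the mean and sample standard deviation of the values supplied. The reference distribution actually used may differ.

```python
import math
import statistics


def log2_transform(values, pseudocount=1.0):
    """Log2-transform values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def z_scores(values):
    """Centre on the mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


counts = [3, 7, 15, 31, 1023]    # hypothetical read counts
logged = log2_transform(counts)
print(logged)                    # [2.0, 3.0, 4.0, 5.0, 10.0]
print([round(z, 2) for z in z_scores(logged)])

# On the log2 scale a fold change is a simple difference:
# log2(a / b) == log2(a) - log2(b)
print(logged[4] - logged[0])     # 8.0, i.e. a 256-fold difference
```

The count of 1023 is more than 300 times larger than the count of 3, yet after the transformation it sits only 8 units away, which is what keeps a few very large counts from dominating the Z-score calculation.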
Is the data provided quantitative?
The data is semi-quantitative. Although it is not a titre of IgG antibodies per se, the read count of a particular phage clone (hit) can be used to assess fold changes and statistical significance.
What bioinformatics is included in PhIP-Seq services projects?
See below:
1. As part of the price for the service, the customer can choose up to 12 different statistical comparisons (Fisher Exact Tests). Results are output at both the peptide level and as simple protein-level and ‘polyclonal’ tests. One output file is provided for each unique studyGroup.
2. Peptide-level Fisher exact tests are performed on a 2x2 matrix of Hits and No_Hits columns for each provided group using the fisher.test function in R. Resulting p-value calculations are exported and saved in the fisher_exact_tests folder found in the deliverable stack of data provided to the customer.
3. projectname_top_public_peptides.csv – The most frequent publicly reactive peptides across the sample cohort; this file is output even for blinded studies.
4. Whole-protein epitope-scanning heatmap.
5. projectname_group_hit_counts.csv - Peptide-level group analysis. This is a true/false tally of the number of edgeR_enriched_log2foldchg.csv samples from each group that react to each peptide in the library. One Hit and No_Hit column pair per provided group described in the studyManifest; only one pair will be present if the study groups are blinded. If more than two groups are provided, these are used for Fisher exact tests.
6. projectname_group_protein_hit_counts.csv - Protein-level group analysis. This is a true/false tally of the number of edgeR_enriched_log2foldchg.csv samples from each group that react to at least one peptide from a given protein in the library. One Protein_Hits and Protein_No_Hits column pair per provided group described in the studyManifest; only one pair will be present if the study groups are blinded. If groups are provided, these are used for Fisher exact tests. Since this is a protein-level analysis, peptide-level metadata columns are not included.
7. projectname_group_polyclonal_hit_counts.csv - Protein-level group analysis. This is a true/false tally of the number of edgeR_enriched_log2foldchg.csv samples from each group that react to three or more peptides from a given protein in the library. One Polyclonal_Hits and Polyclonal_No_Hits column pair per provided group described in the studyManifest; only one pair will be present if the study groups are blinded. If groups are provided, these are used for Fisher exact tests. Since this is a protein-level analysis, peptide-level metadata columns are not included.
8. Individual Sample Barplots - These barplots are created from the edgeR_enriched_log2foldchg.csv file and show the top 25 enriched fold change signals versus ‘beadsonly’ decoy controls for each individual sample. If fewer than 25 significant hits are available, fewer peptides are plotted. An example of one of these images is shown below.
NOTE: In addition, CDI Labs has bioinformatics experts on staff if more statistical analysis is required. This work is not included in the price of the service and is quoted on an individual basis once its scope is understood.