Frequently Asked Questions

For VirScan®, HuScan® and MouseScan™ PhIP-Seq platforms

Phage ImmunoPrecipitation Sequencing (PhIP-Seq)

What is PhIP-Seq?

Phage ImmunoPrecipitation Sequencing (PhIP-Seq) is a multiplexed seromics method that combines synthetic antigen libraries with an antibody-capturing pulldown and a high-throughput DNA sequencing readout. The first human autoantigen PhIP-Seq library was developed in 2011, and a viral library was first reported in 2015.^1,2 Many library and method variations of PhIP-Seq exist. The data you will receive are generated using a well-established human proteome library, following published protocols.^3,4

1. Larman, H. B. et al. Autoantigen discovery with a synthetic human peptidome. Nat. Biotechnol. 29, 535–541 (2011).
2. Xu, G. J. et al. Comprehensive serological profiling of human populations using a synthetic human virome. Science 348, aaa0698 (2015).
3. O’Donovan, B. et al. High-resolution epitope mapping of anti-Hu and anti-Yo autoimmunity by programmable phage display. Brain Commun. 2, fcaa059 (2020).
4. Mohan, D. et al. PhIP-Seq characterization of serum antibodies using oligonucleotide-encoded peptidomes. Nat. Protoc. 13, 1958–1978 (2018).

Content

What libraries do you currently offer?

CDI Labs Canada provides a variety of options to probe the immune system with its current collection of phage display libraries: HuScan®, VirScan® and MouseScan™.

What’s the content of each library?

The content of each library is described below:

HuScan® - contains the full human proteome, including all published and computationally predicted splice variants and coding regions from an NCBI RefSeq database obtained in 2015.
VirScan® - targets over 68,000 annotated viral protein sequences from most known vertebrate, mosquito-borne and tick-borne viral genomes, drawn from the UniProt and RefSeq databases in 2017.
MouseScan™ - the first whole-proteome murine autoantibody discovery platform reported and the first available commercially, containing every protein and isoform in the GRCm38.p5 mouse proteome.

Overview of PhIP-Seq Libraries

Product	Unique Protein Content	Unique Peptide Tiles	Peptide Length	Overlap
HuScan® ¹	48,000+	700,000+	49 AA	25 AA
VirScan® ²	68,000+	480,000+	62 AA	14 AA
MouseScan™ ³	50,000+	482,000+	62 AA	19 AA
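
To illustrate how the peptide length and overlap columns relate to the number of tiles, the sketch below cuts a protein sequence into fixed-length peptides whose neighbours share a fixed overlap. The function name `tile_protein` is hypothetical, and the handling of the last tile (anchored to the C-terminus so that it keeps full length) is an assumption made for illustration. It is not a description of how these libraries were actually designed.

```python
def tile_protein(sequence, tile_len, overlap):
    """Split a protein sequence into overlapping peptide tiles.

    Assumption: the final tile is shifted back so it ends exactly at the
    C-terminus; real libraries may pad or truncate instead.
    """
    if not 0 <= overlap < tile_len:
        raise ValueError("overlap must be non-negative and shorter than tile_len")
    if len(sequence) <= tile_len:
        return [sequence]
    step = tile_len - overlap  # distance between consecutive tile starts
    starts = list(range(0, len(sequence) - tile_len + 1, step))
    last_start = len(sequence) - tile_len
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the C-terminus is covered
    return [sequence[s:s + tile_len] for s in starts]


# Toy 200-residue sequence tiled with the HuScan row of the table (49 AA, 25 AA overlap).
toy_protein = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(200))
tiles = tile_protein(toy_protein, tile_len=49, overlap=25)
print(len(tiles), "tiles, start positions", 49 - 25, "residues apart")
```

With a 49 AA tile and a 25 AA overlap, consecutive tiles start 24 residues apart, so every residue away from the termini is covered by at least two peptides.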

Publications:

1. O’Donovan, B. et al. High-resolution epitope mapping of anti-Hu and anti-Yo autoimmunity by programmable phage display. Brain Commun. 2, fcaa059 (2020).
2. Schubert, R. et al. Pan-viral serology implicates enteroviruses in acute flaccid myelitis. Nat. Med. 25, 1748–1752 (2019).
3. Rackaityte, E. et al. Validation of a murine proteome-wide phage display library for the identification of autoantibody specificities. bioRxiv preprint (2023).

Applications

What types of studies can I conduct using PhIP-Seq?

PhIP-Seq libraries have been used in a wide range of applications, which include:

Autoantibody Profiling

- HuScan®
- MouseScan™

Viral Exposure Profiling/Pathogen Surveillance

- VirScan®

Antibody Cross-reactivity Testing

Molecular Mimicry

- VirScan®

What types of samples can be tested?

The PhIP-Seq platform currently works with any IgG-containing sample. The table below lists the accepted sample types and the volume required for each.

Type	Volume per sample
Serum or plasma	20 µL
Cerebrospinal fluid (CSF)	250 µL
Other antibodies (IgG monoclonals, B cell supernatants)	250 µL at 0.01 mg/mL

What isotype is detected by the assay?

The assay detects IgG and its subclasses from several species, with strong affinity for all human and mouse IgG.

What species can be detected?

Human, mouse, rat and rabbit antibodies have been detected successfully and are the most commonly studied. The platform can also detect IgG from other species, such as dog, bovine, goat, horse and pig.

Assay Methodology

How is a PhIP-Seq assay performed?

To identify antibody hits, IgG-containing samples are mixed with an excess of the phage library. Each phage-sample mix is allowed to bind overnight, followed by a pulldown and wash step to remove unbound targets. Bound targets are then amplified and indexed by a series of PCR steps in preparation for sequencing. Samples are sequenced on Illumina instruments to obtain paired-end reads of the amplified phage inserts.

PhIP-Seq Assay Workflow

Associated references:

Mohan, D. et al. PhIP-Seq characterization of serum antibodies using oligonucleotide-encoded peptidomes. Nat. Protoc. 13, 1958–1978 (2018).

What controls are included as part of the service/project?

The following controls and quality-control outputs are included:

1. A set of 36 beads-only controls. These measure the background binding of the Protein A/G beads alone, and their values are subtracted from the final data. In addition, a GFAP polyclonal antibody with known binding characteristics is spiked in. This control confirms that the assay is performing correctly (see below).

GFAP – Control – GFAP HUMAN Glial Fibrillary Acidic Protein

2. A correlation matrix of the counts distributions for baseline phage abundance and controls. This demonstrates that the counts distributions for most background phages (‘beadsonly’) are highly correlated, which makes them a reliable decoy control from which to call real hits in the samples.

3. Cumulative sum of the ‘beadsonly’ background phage distribution histogram. This demonstrates that most phages are present in the assay within one log10 step of the median frequency for this experiment. Typically, the median background frequency is around 100 copies, with most unique phages observed between ten and one thousand times in the mock-immunoprecipitation background (i.e. between 1 and 3 on a log10 scale).

Cumulative Sum of Baseline Phage Abundance

4. Frequency histogram of the ‘beadsonly’ background distribution. This histogram plots the number of unique phage clones present at different binned read count frequencies across the pool of ‘beadsonly’ samples. It demonstrates that most unique phages were successfully produced and are present in the mock-immunoprecipitation background for this experiment.

5. Number of mapped counts for each sample. This shows the total counts mapped to the expected PhIP-Seq library for each sample; >95% of all samples, including ‘beadsonly’ and other provided assay controls, show read mapping to >1x coverage of the parent library and pass mapping quality control. You can recreate this plot from the counts.csv file by summing each sample column.

6. Grouped number of mapped counts for each sample. The mapped counts are grouped according to the provided study and control metadata.
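
The per-sample totals mentioned in item 5 can be recomputed along these lines. This is a minimal sketch that assumes the counts file has one row per peptide, a single leading identifier column, and one column per sample; the actual column layout of a delivered counts.csv may differ, so adjust `id_columns` accordingly. The function name and the toy data are hypothetical.

```python
import csv
import io


def total_counts_per_sample(csv_file, id_columns=1):
    """Sum the read counts down each sample column of a counts matrix.

    Assumption: the first `id_columns` columns hold peptide identifiers or
    metadata, and every remaining column is one sample.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    samples = header[id_columns:]
    totals = dict.fromkeys(samples, 0)
    for row in reader:
        for name, value in zip(samples, row[id_columns:]):
            totals[name] += int(float(value))
    return totals


# Toy stand-in for a counts matrix (three peptides, three samples).
example = io.StringIO(
    "peptide_id,sample_A,sample_B,beadsonly_1\n"
    "pep_1,120,3,98\n"
    "pep_2,15,410,102\n"
    "pep_3,0,7,95\n"
)
totals = total_counts_per_sample(example)
print(totals)
```

For a delivered file, the same function can be called on an open file handle, for example `with open("projectname_counts.csv", newline="") as fh: totals = total_counts_per_sample(fh)`.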

What is the sequencing depth for each library?

Each library is sequenced to provide 4x coverage of each of the phage clones included.

How many counts will my data have?

Our passing criterion requires reads mapping to >1x coverage of the parent library for >95% of all samples, including ‘beadsonly’ and other provided assay controls.
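
One way to read this criterion is sketched below. It assumes that "coverage" means a sample's total mapped reads divided by the number of unique clones in the library; that interpretation, the function name `coverage_qc`, and the toy numbers are assumptions for illustration and not the documented pipeline.

```python
def coverage_qc(total_counts, library_size, min_coverage=1.0, min_pass_fraction=0.95):
    """Check per-sample coverage against the stated passing criterion.

    Assumption: coverage = total mapped reads / number of unique clones.
    Returns (coverage per sample, fraction of samples passing, overall pass flag).
    """
    coverage = {sample: reads / library_size for sample, reads in total_counts.items()}
    n_passing = sum(1 for c in coverage.values() if c > min_coverage)
    pass_fraction = n_passing / len(coverage)
    return coverage, pass_fraction, pass_fraction > min_pass_fraction


# Toy example: a 1,000-clone library and three samples.
toy_totals = {"sample_A": 4200, "sample_B": 3900, "beadsonly_1": 800}
coverage, pass_fraction, run_passes = coverage_qc(toy_totals, library_size=1000)
print(coverage, pass_fraction, run_passes)
```

In this toy run only two of three samples exceed 1x coverage, so the run as a whole would not meet the >95% threshold.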

How many hits will I get?

The number of antibody hits will depend on the sample and study.

How many samples can be tested at the same time?

Up to 336 unique samples can be pooled into a single flow cell, and PhIP-Seq is a high-throughput platform that scales flexibly. Projects are quoted in multiples of 12 samples up to 48, and for any number of samples above 48.

Data Analysis and Deliverables

How is PhIP-Seq data analyzed?

Following sequencing, paired-end reads undergo demultiplexing, trimming, and alignment to the peptide library with a minimum alignment score of 50. Read counts for each phage clone are recorded, and differential representation analysis is performed using the edgeR exactTest function, generating fold changes and p-values for each sample relative to mock immunoprecipitations. Enriched clones are defined as those exhibiting a log2 fold change of at least 2.0 and an unadjusted p-value below 10^-4 compared to mock-immunoprecipitation controls. Binary hit calls and quantitative log2 fold change values are generated for downstream analysis. False discovery rate assessment is conducted by comparing each mock-immunoprecipitation sample against the remaining decoy pool.
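
The enrichment thresholds described above can be applied as in the sketch below. It does not reproduce the edgeR exactTest step; it starts from fold changes and p-values that have already been computed. The function name `call_hits` is hypothetical, the inputs are plain p-values rather than log10 values, and setting non-hit fold changes to 0 in the quantitative output is an assumption about the file layout.

```python
def call_hits(log2fc, pvalue, min_log2fc=2.0, max_pvalue=1e-4):
    """Apply the enrichment thresholds to one sample.

    A clone is called a hit when log2 fold change >= 2.0 and the unadjusted
    p-value is < 1e-4. Returns (binary hit calls, log2 fold change kept only
    at hits). Zeroing non-hits is an assumption made for this sketch.
    """
    hits = [int(fc >= min_log2fc and p < max_pvalue) for fc, p in zip(log2fc, pvalue)]
    enriched_fc = [fc if hit else 0.0 for fc, hit in zip(log2fc, hits)]
    return hits, enriched_fc


# Toy values for four phage clones in one sample.
toy_log2fc = [3.1, 2.0, 5.0, 1.9]
toy_pvalue = [1e-6, 5e-5, 1e-3, 1e-8]
hits, enriched_fc = call_hits(toy_log2fc, toy_pvalue)
print(hits)
print(enriched_fc)
```

The third clone fails on its p-value and the fourth on its fold change, which shows that both conditions must hold for a clone to be called enriched.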

What’s included in a deliverable packet?

1. projectname_counts.csv - The total counts matrix for each sample; this is the data in its rawest form.

2. projectname_edgeR_log2foldchg.csv - The edgeR computed log2 fold change of each sample versus the background distribution.

3. projectname_edgeR_log10pval.csv - The edgeR log10 p-values for each count versus the background distribution.

4. projectname_edgeR_enriched.csv - Simple hits call file. These are true/false (1/0) hit calls.

5. projectname_edgeR_log2foldchg_enriched.csv - Main quantitative hits call file used to compute many delivered plots and analyses.

Unless you are performing your own normalizations and hit calling, the edgeR_log2foldchg_enriched.csv is the best file to start with for downstream analysis.

What is included in a PhIP-Seq report?

Each comprehensive report offers detailed methodology, raw and processed data for custom analysis, and extensive quality control for data reliability. View an overview of a PhIP-Seq Study Report here.

General Content of each PhIP-Seq report:

1. Overview of PhIP-Seq: Includes a brief description of the platform with sections detailing library cloning, sample screening, and informatics methods.

2. Core Pipeline Output files: These are produced from FASTQ files according to the pipeline described. There are five files included: counts (raw data), log2foldchange, log10pval, edgeR enriched (simple hits call file) and log2foldchange enriched (most quantitative).

3. QC Figures: Provide information on the phages produced and present in each assay, the mapped counts per sample and correlation matrices for controls and the samples in each study.

4. Figures summarizing the results from edgeR Enriched Hits: Contains figures describing the number of peptide hit enrichments in each sample by groups and with a correlation matrix.

5. Processed Hits Pipeline Output Files: Offers insight on peptide-level group hit counts and protein-level group analysis (polyclonal hit counts) as well as providing a list of the most variable publicly reactive peptides across the sample cohort.

6. Most Variable Peptides: Provides an unbiased hierarchical clustering of the top variant peptides.

7. Fisher Exact Tests (only for unblinded studies): Contain case versus cohort statistical outputs according to the provided study group.

8. Study Heatmaps: Heatmaps are created for all proteins containing a peptide represented in the top 100 variant peptides.

What types of reports are offered by CDI Labs Canada for PhIP-Seq?

We offer two types of reports, one for blinded and one for unblinded studies. Blinded studies receive all the data described in the deliverables and a general report with unbiased figures. Unblinded studies can receive case-cohort comparisons for up to 12 individual study groups.

Does CDI Labs use any normalization for the data?

The data are log2 transformed before Z scores are calculated.

Log2 is used because it aids in calculating fold change, which measures up-regulated versus down-regulated signals between samples. Log2-scaled data are also closer to the biologically detectable changes. Log2 transformation is the most commonly used transformation for microarray data. It stabilizes the variance at high intensities but increases the variance at low intensities.
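
As a small worked example of the transformation described above, the sketch below log2 transforms a set of values and then converts them to Z scores. The pseudocount of 1 (to avoid taking the log of zero), the use of the sample standard deviation, and the function name are assumptions; the source does not state which values the Z scores are computed across.

```python
import math
import statistics


def log2_z_scores(values, pseudocount=1.0):
    """Log2 transform values, then standardize them to Z scores.

    Assumptions: a pseudocount is added before the log, and the sample
    standard deviation (n - 1 denominator) is used.
    """
    logged = [math.log2(v + pseudocount) for v in values]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(x - mean) / sd for x in logged]


# Toy values chosen so that log2(value + 1) is 0, 1, 2, 3, 4.
toy_values = [0, 1, 3, 7, 15]
z_scores = log2_z_scores(toy_values)
print([round(z, 3) for z in z_scores])
```

On the raw scale the largest value is 15 times the second smallest, while after the log2 step the values are evenly spaced, which is why fold changes are easier to compare on this scale.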

Is the data provided quantitative?

The data are semi-quantitative. Although they do not represent a titre of IgG antibodies per se, the read counts for a particular phage or hit can be used to assess fold changes and statistical significance.

What bioinformatics is included in PhIP-Seq services projects?

See below:

1. As part of the price for the service, the customer can choose up to 12 different statistical comparisons (Fisher Exact Tests). Results are output at both the peptide level and as simple protein-level and ‘polyclonal’ tests. One output file is provided for each unique studyGroup.

2. Peptide-level Fisher exact tests are performed on a 2x2 matrix of Hits and No_Hits columns for each provided group using the fisher.test function in R. Resulting p-value calculations are exported and saved in the fisher_exact_tests folder found in the deliverable stack of data provided to the customer.

3. projectname_top_public_peptides.csv – The most frequent publicly reactive peptides across the sample cohort; this file is output even for blinded studies.

4. Whole-protein epitope-scanning heatmap.

5. projectname_group_hit_counts.csv - Peptide-level group analysis. This is a true/false tally of the number of samples in edgeR_log2foldchg_enriched.csv from each group that react to each peptide in the library. One Hit and No_Hit column pair per provided group described in the studyManifest; only one pair will be present if the study groups are blinded. If more than two groups are provided, these are used for Fisher exact tests.

6. projectname_group_protein_hit_counts.csv - Protein-level group analysis. This is a true/false tally of the number of samples in edgeR_log2foldchg_enriched.csv from each group that react to at least one peptide from a given protein in the library. One Protein_Hits and Protein_No_Hits column pair per provided group described in the studyManifest; only one pair will be present if the study groups are blinded. If groups are provided, these are used for Fisher exact tests. Since this is a protein-level analysis, peptide-level metadata columns are not included.

7. projectname_group_polyclonal_hit_counts.csv - Protein-level group analysis. This is a true/false tally of the number of samples in edgeR_log2foldchg_enriched.csv from each group that react to three or more peptides from a given protein in the library. One Polyclonal_Hits and Polyclonal_No_Hits column pair per provided group described in the studyManifest; only one pair will be present if the study groups are blinded. If groups are provided, these are used for Fisher exact tests. Since this is a protein-level analysis, peptide-level metadata columns are not included.

8. Individual Sample Barplots - These barplots are created from the edgeR_log2foldchg_enriched.csv file and represent the top 25 enriched fold change signals versus ‘beadsonly’ decoy controls for each individual sample. If fewer than 25 significant hits are available, fewer peptides are plotted. An example of one of these images is shown below.

GFAP Fold-change vs. Decoy
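
The protein-level and polyclonal tallies in items 6 and 7 can be sketched as follows for a single group of samples. The input structures (a set of hit peptide IDs per sample and a peptide-to-protein lookup), the function name, and the toy identifiers are all hypothetical; only the "at least one peptide" and "three or more peptides" rules come from the descriptions above.

```python
from collections import Counter


def protein_level_tallies(sample_hits, peptide_to_protein, polyclonal_min=3):
    """Count, per protein, how many samples show a protein hit or a polyclonal hit.

    protein hit:    the sample reacts to at least one peptide of the protein
    polyclonal hit: the sample reacts to `polyclonal_min` or more peptides
    """
    protein_hits = Counter()
    polyclonal_hits = Counter()
    for hit_peptides in sample_hits.values():
        peptides_per_protein = Counter(peptide_to_protein[p] for p in hit_peptides)
        for protein, n_peptides in peptides_per_protein.items():
            protein_hits[protein] += 1
            if n_peptides >= polyclonal_min:
                polyclonal_hits[protein] += 1
    return protein_hits, polyclonal_hits


# Toy data: three samples from one group, four peptides from two proteins.
toy_map = {"GFAP_1": "GFAP", "GFAP_2": "GFAP", "GFAP_3": "GFAP", "TTN_1": "TTN"}
toy_hits = {
    "s1": {"GFAP_1", "GFAP_2", "GFAP_3"},
    "s2": {"GFAP_1", "TTN_1"},
    "s3": set(),
}
protein_hits, polyclonal_hits = protein_level_tallies(toy_hits, toy_map)
print(dict(protein_hits), dict(polyclonal_hits))
```

The matching No_Hits value for a protein is the number of samples in the group minus its hit count.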

NOTE: In addition, CDI Labs has bioinformatics experts on staff if further statistical analysis is required. This is not included in the price of the service and is quoted individually once the scope of the work is understood.
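
For readers who want to check a peptide-level Fisher exact test by hand, the sketch below computes a two-sided p-value for a 2x2 table of Hits and No_Hits in two groups. The service itself uses the fisher.test function in R; this pure-Python version is only a stand-in that follows the usual two-sided definition (summing the probabilities of all tables no more likely than the observed one), and the function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are Hits and No_Hits. The p-value sums the
    hypergeometric probabilities of every table with the same margins that
    is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the first group contributes x of the col1 hits.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guard against floating-point ties
    total = sum(p for p in map(prob, range(lowest, highest + 1)) if p <= p_observed * tolerance)
    return min(1.0, total)


# Toy peptide: 8 of 10 case samples are hits versus 1 of 6 control samples.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))
```

The same function applies to the protein-level and polyclonal tallies, since each of those files also provides one hit column and one no-hit column per group.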