Example of matching an input sequence

(Print this page so it can be consulted during the subsequent steps described below)

Input sequence
The following sequence is that of the mature protein of the allergen Zea m 14.0101 from maize pollen. It is written in one-letter amino acid codes, and the complete sequence consists of 93 letters (amino acids):

aiscgqvasaiapcisyargqgsgpsagccsgvrslnnaarttadrraacnclknaaagvsglnagnaasipskcgvsipytiststdcsrvn
The original protein sequence in UniProt database entry P19656 consists of 120 amino acids; removing the signal peptide, which comprises the first 27 amino acids, yields this mature protein sequence of 93 amino acids.
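The length arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, while the sequence and the numbers 120 and 27 are taken from the text.

```python
# Mature Zea m 14.0101 sequence as printed above (one-letter codes).
MATURE = "aiscgqvasaiapcisyargqgsgpsagccsgvrslnnaarttadrraacnclknaaagvsglnagnaasipskcgvsipytiststdcsrvn"

PRECURSOR_LENGTH = 120       # length of UniProt entry P19656, per the text
SIGNAL_PEPTIDE_LENGTH = 27   # residues removed from the start, per the text

# Removing the signal peptide should leave exactly the mature protein.
expected_mature_length = PRECURSOR_LENGTH - SIGNAL_PEPTIDE_LENGTH

print(len(MATURE), expected_mature_length)  # 93 93
```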

If users enter their own input sequences, numbers in the sequence should be removed, whereas spaces and paragraph or line returns need not be removed. In addition, three-letter codes for amino acids, such as AlaIleSer... (the first 3 residues of Zea m 14), should be changed into one-letter codes, for example with a web-based conversion tool such as "Three-to-One".
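For users who prefer to prepare a sequence locally, the clean-up described above could be sketched in Python as follows. The function names strip_numbers and three_to_one are hypothetical helpers, not part of Allermatch or of the "Three-to-One" tool. The table covers only the 20 standard amino acids, so any other code raises an error.

```python
import re

# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def strip_numbers(raw):
    """Remove digits (and, harmlessly, whitespace) from a pasted sequence."""
    return re.sub(r"[\d\s]", "", raw)


def three_to_one(raw):
    """Convert a run of three-letter codes, e.g. 'AlaIleSer', to 'AIS'."""
    cleaned = strip_numbers(raw)
    if len(cleaned) % 3 != 0:
        raise ValueError("input length is not a multiple of three")
    codes = [cleaned[i:i + 3].capitalize() for i in range(0, len(cleaned), 3)]
    unknown = [code for code in codes if code not in THREE_TO_ONE]
    if unknown:
        raise ValueError("unknown three-letter code(s): " + ", ".join(unknown))
    return "".join(THREE_TO_ONE[code] for code in codes)


# A numbered two-line excerpt of the Zea m 14 sequence shown above.
print(strip_numbers("1 aiscgqvasa 10\n11 iapcisyarg 20"))  # aiscgqvasaiapcisyarg
# The first four residues of the same sequence in three-letter codes.
print(three_to_one("AlaIleSerCys"))  # AISC
```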

Entering an input sequence and selecting the alignment of interest
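Enter the input sequence, by typing or copy-pasting it, in the search box (below "Copy Paste your amino acid sequence here") of the Allermatch™ search page. With the cursor, select one of the alignment options discussed in the sections below: the 80-amino acid sliding window, the exact wordmatch of small stretches of identical amino acids, or the full alignment.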

If the 80-amino acid sliding window has been chosen, the default threshold value of 35% identity may be modified by the user in the box next to "Cutoff Percentage (only applicable to the 80 amino acids sliding window)". The threshold is the lower limit for the alignments displayed in the following steps; alignments scoring below the threshold are not displayed.
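The effect of the threshold can be pictured as a simple filter on percent identity. The Python sketch below is an illustration only, not the Allermatch™ implementation. The function name displayed is hypothetical, and the value 30.0 is made up to show an alignment being hidden.

```python
DEFAULT_CUTOFF = 35.0  # default threshold in percent identity, per the text


def displayed(identities, cutoff=DEFAULT_CUTOFF):
    """Keep only identity values that are not below the cutoff."""
    return [pct for pct in identities if pct >= cutoff]


# 100.0 and 36.59 are best-hit values quoted in the example further down
# this page; 30.0 is a made-up value used only for illustration.
values = [100.0, 36.59, 30.0]

print(displayed(values))        # [100.0, 36.59]
print(displayed(values, 40.0))  # [100.0]
```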

If the option for an exact wordmatch of small stretches has been chosen, the default value of 6 for the wordlength can be modified by the user in the box next to "Wordlength (only applicable to the exact wordmatch search)". The wordlength is the minimal number of contiguous identical amino acids in an exact match.

After selecting the option and, if applicable, the threshold of interest, click the "Go" button. The results appear in a new page that opens in the same window on the user's screen. The various outcomes are discussed below for each of the specific options.

80 amino acids sliding window

Summary table

The new page that appears after starting the 80-amino acid sliding window alignment on the input sequence provides a table with a summary of the "hits", which are alignments scoring at or above the cutoff value. Each allergenic protein whose database sequence scored hits is presented on a separate line, while data on this allergenic protein and the alignment are presented under the following column headings:

Detailed information

This page provides the following information:
By clicking the "Show all alignments" button, all the separate hits can be viewed, i.e. the alignments of those 80-amino acid subsequences (windows) of the input sequence that scored equal to or above the cutoff value of 35% (fixed value, cannot be changed by the user). The new page that appears in the same window on the user's screen contains the same information as the previous page, in addition to the separate hits. After clicking "Hide all alignments", the previous page reappears.

Example

For the input sequence Zea m 14 screened against the UniProt collection of the Allermatch™ database, for example, the summary table lists various database sequences of allergenic proteins that score hits if the cutoff value equals 35%. Since the Zea m 14 sequence contains 93 amino acids, 14 subsequences (windows) of 80 amino acids have been generated (1-80, 2-81, ..., 13-92, 14-93). The highest-ranking database sequence in the table is Zea m 14 itself, because the same sequence is also stored in the Allermatch™ database. It shows a best hit of 100%, and all 14 windows of the input sequence scored hits, as expected. One of the lower-ranking sequences in the table is designated Par_j_2_a (Allermatch™ identifier), one of the two database sequences of the allergenic protein Par j 2 derived from weed pollen. The best hit for this sequence is 36.59% identity, while 5 of the 14 windows scored hits. The detailed information on the alignments with Par_j_2_a shows that a large part of both the input sequence and the database sequence is covered by the 80-amino acid sliding window alignments and by the full alignment. Interestingly, many of the sequences listed in the table are lipid transfer proteins, as mentioned in the original external accessions to which the table provides links.
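The window count in this example follows from the sequence length: a sequence of 93 amino acids contains 93 - 80 + 1 = 14 windows of 80. The minimal Python sketch below only generates the windows. It does not reproduce the alignment step, and the function name sliding_windows is an illustrative assumption.

```python
# Mature Zea m 14.0101 input sequence from the top of this page.
ZEA_M_14 = "aiscgqvasaiapcisyargqgsgpsagccsgvrslnnaarttadrraacnclknaaagvsglnagnaasipskcgvsipytiststdcsrvn"


def sliding_windows(sequence, size=80):
    """Yield (start, end, window) tuples with 1-based inclusive positions."""
    for i in range(len(sequence) - size + 1):
        yield i + 1, i + size, sequence[i:i + size]


windows = list(sliding_windows(ZEA_M_14))

print(len(windows))                       # 14
print(windows[0][:2], windows[-1][:2])    # (1, 80) (14, 93)
```

A sequence shorter than the window size yields no windows in this sketch.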

Exact hits of small stretches of identical amino acids

Summary table

The new page that appears after starting the alignment of small identical stretches using WordMatch provides a table summarising the "hits", which are stretches of identical contiguous amino acids whose length is equal to or above the wordlength, i.e. the minimal number of identical contiguous amino acids. Each of the database sequences of allergenic proteins that showed a hit with the input sequence is shown on a separate line of the table, while the data on the allergenic protein are shown under the following column headings:

Detailed information

This page provides the following information on the hits of the selected wordlength with a specific allergenic protein:

Example

For the Zea m 14 test sequence, tested against the UniProt collection of the Allermatch™ database, the summary table lists various database sequences of allergenic proteins, including Zea m 14 itself, if a wordlength of 6 is selected. Besides Zea m 14, the other database sequences include, among others, allergenic proteins that are classified as lipid transfer proteins. Among the low-ranking database sequences are Pru av 3 and Pru ar 3 from cherry and apricot, respectively, each of which scored one hit. As can be inferred from the detailed information, the single identical stretch of 6 amino acids (acnclk) in Pru av 3 and Pru ar 3 is also present in some of the other listed database sequences.
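The idea of an exact wordmatch can be illustrated by collecting every stretch of 6 contiguous amino acids in each sequence and intersecting the two sets. The Python sketch below makes no claim about how WordMatch itself is implemented. The function names words and shared_words are hypothetical, and TOY_TARGET is a made-up string, not the sequence of Pru av 3 or Pru ar 3.

```python
# Mature Zea m 14.0101 input sequence from the top of this page.
ZEA_M_14 = "aiscgqvasaiapcisyargqgsgpsagccsgvrslnnaarttadrraacnclknaaagvsglnagnaasipskcgvsipytiststdcsrvn"

# Made-up target containing the stretch "acnclk"; NOT a real allergen sequence.
TOY_TARGET = "ggacnclkgg"


def words(sequence, wordlength=6):
    """Return the set of all contiguous stretches of the given length."""
    seq = sequence.lower()
    return {seq[i:i + wordlength] for i in range(len(seq) - wordlength + 1)}


def shared_words(query, target, wordlength=6):
    """Return the stretches that occur identically in both sequences."""
    return sorted(words(query, wordlength) & words(target, wordlength))


print("acnclk" in words(ZEA_M_14))         # True
print(shared_words(ZEA_M_14, TOY_TARGET))  # ['acnclk']
```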

Full alignment

The new page that appears after starting the full alignment contains the following information:

Example

If Zea m 14 has been entered as the input sequence, the highest-scoring database sequences are the same as for the 80-amino acid sliding window alignment, i.e. lipid transfer proteins, in addition to the database sequences of Par j 1, another lipid transfer protein.