Introduction
In most nations, genetically engineered foods must be assessed for their safety before market approval is granted. An important issue in this safety assessment is the potential allergenicity of transgenic ("foreign") proteins that have been introduced into the food by genetic engineering. In other words, what is the chance that the foreign protein may cause allergic reactions after consumption of the genetically engineered food containing this protein?
Potential allergenicity is assessed in a step-by-step procedure described in the guidelines of the FAO/WHO Codex Alimentarius Commission for the safety assessment of foods derived from genetically engineered plants and micro-organisms. One important step in this procedure is to determine, with the aid of computer programs, whether the primary structure (amino acid sequence) of the transgenic protein is similar to the sequences of known allergenic proteins, which are available from public protein sequence databases.
Two types of similarity are searched for:
To search for the two types of similarity, a recent FAO/WHO Expert Consultation, held in preparation of the Codex Alimentarius guidelines, devised the following procedure:
6.1. Sequence Homology as Derived from Allergen Databases
The commonly used protein databases (PIR, SwissProt and TrEMBL) contain the amino acid sequences of most allergens for which this information is known. However, these databases are currently not fully up-to-date. A specialized allergen database is under construction.
Suggested procedure on how to determine the percent amino acid identity between the expressed protein and known allergens.
Step 1: obtain the amino acid sequences of all allergens in the protein databases (for SwissProt and TrEMBL: see http://expasy.ch/tools; for PIR see http://wwwnbrf.georgetown.edu/pirwww) in FASTA format (using the amino acids from the mature proteins only, disregarding the leader sequences, if any). Let this be data set (1).
Step 2: prepare a complete set of 80-amino acid length sequences derived from the expressed protein (again disregarding the leader sequence, if any). Let this be data set (2).
Step 3: go to EMBL internet address: http://www2.ebi.ac.uk and compare each of the sequences of the data set (2) with all sequences of data set (1), using the FASTA program on the web site for alignment with the default settings for gap penalty and width.
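Step 2 can be sketched in Python as follows. This is a minimal illustration only, not code from the Expert Consultation or from Allermatch: the function name `sliding_windows`, the one-residue step between successive windows, and the handling of proteins shorter than 80 residues (returned as a single window) are all assumptions.

```python
def sliding_windows(sequence, size=80):
    """Return every contiguous window of `size` residues from `sequence`.

    Assumptions (not stated in the guideline text): windows overlap and
    advance by one residue, and a protein shorter than `size` is returned
    whole as its only window.
    """
    # Normalise the input: drop whitespace and line breaks, use upper case.
    seq = "".join(sequence.split()).upper()
    if not seq:
        return []
    if len(seq) <= size:
        return [seq]
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


if __name__ == "__main__":
    protein = "A" * 100  # placeholder sequence, not a real protein
    windows = sliding_windows(protein)
    print(len(windows))  # number of 80-residue windows in data set (2)
```

Each window produced this way would then be aligned against every allergen sequence in data set (1), as described in Step 3.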
Cross-reactivity between the expressed protein and a known allergen (as can be found in the protein databases) has to be considered when there is:
1) more than 35 % identity in the amino acid sequence of the expressed protein (i.e. without the leader sequence, if any), using a window of 80 amino acids and a suitable gap penalty (using Clustal-type alignment programs or equivalent alignment programs)
2) identity of 6 contiguous amino acids.
If any of the identity scores equals or exceeds 35 %, this is considered to indicate significant homology within the context of this assessment approach. The use of amino acid sequence homologies to identify prospective cross-reacting allergens in genetically modified foods has been discussed in more detail elsewhere (Gendel, 1998a; Gendel, 1998b).
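The two criteria above can be illustrated with a short Python sketch. It is a hedged example under stated assumptions: the names `shared_kmers` and `meets_identity_threshold` are hypothetical, positions are reported 0-based, and the identity percentage is assumed to come from an external alignment program such as FASTA rather than being computed here. The quoted text uses both "more than 35 %" and "equals or exceeds 35 %"; the sketch follows the second wording.

```python
def shared_kmers(query, allergen, k=6):
    """Find stretches of `k` contiguous identical residues.

    Returns a list of (query_pos, allergen_pos, kmer) tuples with
    0-based start positions (an assumption made for this sketch).
    """
    # Index every k-residue stretch of the allergen by its start positions.
    index = {}
    for j in range(len(allergen) - k + 1):
        index.setdefault(allergen[j:j + k], []).append(j)

    # Look up every k-residue stretch of the query in that index.
    hits = []
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        for j in index.get(kmer, []):
            hits.append((i, j, kmer))
    return hits


def meets_identity_threshold(identity_percent, threshold=35.0):
    """True if an 80-residue window's identity score reaches the threshold.

    `identity_percent` is assumed to be reported by the alignment program.
    """
    return identity_percent >= threshold


if __name__ == "__main__":
    # Toy sequences for illustration only, not real allergens.
    print(shared_kmers("MKTAYIAKQR", "GGTAYIAKGG"))
    print(meets_identity_threshold(36.2))
```

Indexing the allergen's 6-residue stretches in a dictionary keeps the lookup for each query stretch constant-time, which matters when one query is compared against a whole database.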
The search facility on the Allermatch™ webtool automatically carries out the procedure recommended by the guidelines on protein sequences that are entered by the user in FASTA format (one-letter code without residue numbers, see example sequence below). The user has the option to select the following outputs of interest:
The entered sequences will be compared to the sequences of allergenic proteins compiled in the database. These sequences of allergenic proteins have been extracted from protein databases. Putative signal, pro-, and transit peptides, whose positions are annotated as "features" in the source database entries, have been removed from these sequences, which yields the sequences of the "mature" proteins.
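The removal of annotated peptides can be pictured with the following Python sketch. It does not reproduce how Allermatch processes database entries: the function name `mature_sequence` and the representation of each feature as a 1-based, inclusive `(start, end)` pair are assumptions made for illustration.

```python
def mature_sequence(sequence, peptide_features):
    """Remove annotated signal, pro-, or transit peptides from a sequence.

    `peptide_features` is a list of (start, end) pairs, 1-based and
    inclusive (a hypothetical representation of database "features").
    """
    # Collect the 0-based indices of all residues to be dropped.
    drop = set()
    for start, end in peptide_features:
        drop.update(range(start - 1, end))
    # Keep the remaining residues in their original order.
    return "".join(res for idx, res in enumerate(sequence) if idx not in drop)


if __name__ == "__main__":
    # Toy precursor with a hypothetical four-residue signal peptide.
    print(mature_sequence("MKLLVAAGHT", [(1, 4)]))
```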
Positive results of the analysis will be provided to the user.