Where does the amino acid sequence start and where does it end? Can you describe relevant sequencing techniques?
The terminal amino acid sequences are important for the biological function of a protein. They influence how the protein is distributed to different cellular locations, as well as its degradation and turnover rate [1-3].
For recombinant proteins, it is thus important to confirm that the N- and C-termini are as predicted from the gene. You also need to confirm that the protein expression is correct. In pharmaceutical proteins, it becomes critically important to confirm that the protein termini are correct and identical from batch to batch [1, 2].
Frequently observed variations at the N-terminus include incomplete processing of the N-terminal signal sequence, failure to remove the N-terminal methionine residue, and partial or complete formation of pyroglutamic acid from N-terminal Glu or Gln residues [3, 4].
Common variations at the C-terminus are sequence truncations, for example removal of the C-terminal Lys residue of the heavy chain in monoclonal antibodies. These variations of the protein termini may arise from differences in fermentation conditions and protein purification buffers. Therefore, N- and C-terminal sequencing is an important quality control analysis. It is also mandatory under the ICH Q6B guidance on test procedures for biotechnological products.
You can confirm and identify the exact N- and C-termini of a pure protein by combining the following techniques:
- N-terminal Edman degradation
Protein sequencing by cyclic Edman degradation can typically identify 5-30 residues from the protein N-terminus. The Edman chemistry does not work on proteins whose N-terminal residue is blocked for PITC coupling, for example by N-terminal pyroglutamic acid or acetylation. Edman sequencing requires 2-10 micrograms of the pure protein in solution or on a PVDF membrane blot.
- Top-down sequencing by MALDI ISD
Mass spectrometric sequencing can be obtained by fragmenting the intact protein. This typically confirms 20-80 residues from each end by accurate masses and mass differences. The technique can fragment and sequence both the N- and C-termini in the same mass spectrum.
It is possible to detect modifications such as pyroglutamic acid, acetylation, truncations, signal sequences, and leading Met residues by their specific masses. Top-down sequencing by MALDI ISD requires 20-100 micrograms of the purified protein in solution.
- Mass spectrometric peptide mapping by protease cleavage and LC MS/MS analysis
In this approach, the protein is cleaved enzymatically and the peptides are analyzed by LC MS/MS. The peptide data are then correlated with the expected amino acid sequence. This typically gives a sequence coverage of 30-90%.
The ionization efficiency in the mass spectrometer is much higher for most peptides than for the intact protein. Therefore, you can detect the peptides at much higher sensitivity. However, the peptide sequence coverage is rarely 100%, because some peptides may be lost in sample preparation or not detected by the LC MS/MS.
Consequently, you cannot be completely sure that the terminal peptides are found.
The peptide mapping software should also consider possible terminal modifications. In addition, you must allow nonspecific cleavage rules for the protease in order to detect the terminal peptides.
You may combine LC MS/MS peptide mapping with 2 or 3 different proteases to increase the sequence coverage. This is also a good way to confirm the observed terminal peptides.
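The effect of combining digests can be illustrated with a small calculation. The following is a minimal sketch, not the method of any particular peptide mapping software: the protein sequence, the peptide lists, and the function names `covered_positions` and `coverage_report` are all hypothetical, and peptides are matched by plain substring search without considering modifications.

```python
def covered_positions(protein, peptides):
    """Return the 0-based residue positions matched by at least one peptide."""
    covered = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return covered


def coverage_report(protein, digests):
    """Combine peptides from several digests and report coverage and termini."""
    covered = set()
    for peptides in digests.values():
        covered |= covered_positions(protein, peptides)
    return {
        "coverage_percent": 100.0 * len(covered) / len(protein),
        "n_terminus_seen": 0 in covered,
        "c_terminus_seen": (len(protein) - 1) in covered,
    }


# Hypothetical 21-residue protein and hypothetical detected peptides.
protein = "MKTAYIAKQRQISFVKSHFSR"
digests = {
    "trypsin": ["TAYIAK", "QISFVK"],
    "chymotrypsin": ["MKTAY", "VKSHF"],
}

trypsin_only = coverage_report(protein, {"trypsin": digests["trypsin"]})
combined = coverage_report(protein, digests)
print(trypsin_only)
print(combined)
```

In this toy data the second digest raises the coverage and adds the N-terminal peptide, but the C-terminal residue is still not observed, which is why the terminal peptides should be checked explicitly rather than inferred from the overall coverage figure.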
Due to the inherent features and limitations of these 3 techniques, it is advisable to combine the N- and C-terminal analyses with intact protein Mw determination.
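To show how an intact mass measurement can point to a terminal variant, the sketch below compares the difference between the observed and the theoretical Mw with the average mass shifts of the variations discussed above. It is an illustration under stated assumptions: the variant table, the function name `explain_mass_shift`, the 1 Da tolerance, and the example sequences are hypothetical choices, and only single modifications are considered.

```python
# Average mass shifts in Da for single terminal variants (assumed table).
# Each entry: (label, mass shift, test for whether the variant can apply).
TERMINAL_VARIANTS = [
    ("N-terminal Met removed", -131.1926, lambda s: s.startswith("M")),
    ("pyroglutamic acid from N-terminal Gln", -17.0305, lambda s: s.startswith("Q")),
    ("pyroglutamic acid from N-terminal Glu", -18.0153, lambda s: s.startswith("E")),
    ("N-terminal acetylation", 42.0367, lambda s: True),
    ("C-terminal Lys removed", -128.1741, lambda s: s.endswith("K")),
]


def explain_mass_shift(sequence, delta_da, tolerance_da=1.0):
    """List terminal variants consistent with an observed-minus-theoretical Mw.

    The tolerance is an assumption; the achievable mass accuracy depends on
    the instrument and on the size of the protein.
    """
    return [
        label
        for label, shift, applies in TERMINAL_VARIANTS
        if applies(sequence) and abs(delta_da - shift) <= tolerance_da
    ]


# Hypothetical sequences and mass differences.
print(explain_mass_shift("QVQLK", -17.1))
print(explain_mass_shift("QVQLK", -128.2))
print(explain_mass_shift("MKTAY", -131.2))
print(explain_mass_shift("MKTAY", -128.2))
```

A mass difference alone cannot locate the modification, so a match from such a table is a hypothesis to confirm with the terminal sequencing techniques described above.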
Alphalyse has expertise in N- and C-terminal sequencing from a variety of projects.
Please see details of the analyses at our website and contact us if you are interested in discussing N- and C-terminal sequencing of your protein.
[1] Jacob et al.: “A tale of two tails: why are terminal residues of proteins exposed?”, Bioinformatics, 2007
[2] Reusch, W.: “Peptides & Proteins”, Michigan State University Department of Chemistry, 2013
[3] Lange et al.: “Protein TAILS: when termini tell tales of proteolysis and function”, Current Opinion in Chemical Biology, 2013
[4] von Heijne, G.: “Signal sequences. The limits of variation.”, Journal of Molecular Biology, 1985
[5] European Medicines Agency: “ICH Topic Q6B Specifications: Test Procedures and Acceptance Criteria for Biotechnological/Biological Products”, 1999
[6] Pham et al.: “High-throughput protein sequencing”, Analytical Chemistry, 2003
[7] Demeure et al.: “Rational Selection of the Optimum MALDI Matrix for Top-Down Proteomics by In-Source Decay”, Analytical Chemistry, 2007
[8] Choudhary et al.: “Multiple enzymatic digestion for enhanced sequence coverage of proteins in complex proteomic mixtures using capillary LC with ion trap MS/MS”, Journal of Proteome Research, 2003
[9] Heck et al.: “Investigation of intact protein complexes by mass spectrometry”, Mass Spectrometry Reviews, 2004