Cyclopeptide sequencing walk-through

Let's start by assuming we know all the masses for the amino-acids that could be part of the peptide we are looking for:

57 71 87 97 99 101 103 113 114 115 128 129 131 137 147 156 163 186

In the "real world" we could also infer this set of masses through the spectrum convolution approach described in the chapter.

Now, let's take a perfect experimental spectrum and see how the algorithm works:

0 57 57 71 87 114 128 144 158 185 201 215 215 272

Step 1: Find the masses used by the peptide

Not all possible amino-acids can be used in our peptide, so it makes sense to cull the list of possible amino-acids before proceeding. Since each amino-acid in the peptide appears by itself as a fragment in the experimental spectrum, any masses not found in the experimental spectrum can be discarded. Here is the full list again; the masses we'll keep are the five that also occur in the spectrum (57, 71, 87, 114 and 128):

57 71 87 97 99 101 103 113 114 115 128 129 131 137 147 156 163 186

Thus, the alphabet we'll use going forward is:

57 71 87 114 128

Note that this step can also be handled in the next round, where we would simply check the linear spectra of length-1 peptides. I find it more natural to do a first check, and doing it this way also makes it easier to modify the code later to include a spectrum convolution module instead.
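As a minimal sketch of this first check (the names AMINO_ACID_MASSES and cull_alphabet are mine, chosen for illustration), the alphabet can be obtained by keeping only the masses that occur in the spectrum:

```python
# Full list of candidate amino-acid masses from the start of the walk-through.
AMINO_ACID_MASSES = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
                     128, 129, 131, 137, 147, 156, 163, 186]

# The perfect experimental spectrum used throughout.
spectrum = [0, 57, 57, 71, 87, 114, 128, 144, 158, 185, 201, 215, 215, 272]


def cull_alphabet(masses, spectrum):
    """Keep only the masses that appear as fragments in the spectrum."""
    present = set(spectrum)
    return [m for m in masses if m in present]


alphabet = cull_alphabet(AMINO_ACID_MASSES, spectrum)
print(alphabet)  # [57, 71, 87, 114, 128]
```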

Step 2: Build the candidate peptide table

At this step we can simply create a candidate peptide table that contains all the masses we have identified - essentially a list of peptides each of length 1.

table = ['57', '71', '87', '114', '128']

Step 3: Expand the peptide table

Now we use the masses in our 'alphabet' to try to build longer peptides. This is as simple as trying to append to each peptide in our table each of the masses in the alphabet:

table = [ '57-57', '57-71', '57-87', '57-114', '57-128',
'71-57', '71-71', '71-87', '71-114', '71-128',
'87-57', '87-71', '87-87', '87-114', '87-128',
'114-57', '114-71', '114-87', '114-114', '114-128',
'128-57', '128-71', '128-87', '128-114', '128-128']
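The expansion can be sketched in a couple of lines. This assumes peptides are stored as dash-separated strings, as in the tables here; the function name expand is just an illustrative choice:

```python
def expand(table, alphabet):
    """Append every mass in the alphabet to every peptide in the table."""
    return [f"{peptide}-{mass}" for peptide in table for mass in alphabet]


alphabet = [57, 71, 87, 114, 128]
table = [str(mass) for mass in alphabet]  # the length-1 peptides from Step 2
table = expand(table, alphabet)

print(len(table))  # 25
print(table[:5])   # ['57-57', '57-71', '57-87', '57-114', '57-128']
```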

Step 4: Verify that the linear spectrum of the peptides in the table is consistent with the experimental spectrum

At this stage we take each of the candidate peptides, compute its linear spectrum, then check that every mass in it is found in the experimental spectrum, at least as many times as it occurs in the linear spectrum. For reference, here's the experimental spectrum again:

0 57 57 71 87 114 128 144 158 185 201 215 215 272

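Since we are just looking at pairs of amino-acids, at this early stage the test is simply whether the sum of the two masses is found in the experimental spectrum. The table is repeated below; the ten incompatible peptides are 57-114 and 114-57 (sum 171), 71-71 (142), 71-128 and 128-71 (199), 87-87 (174), 114-114 (228), 114-128 and 128-114 (242), and 128-128 (256):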

table = [ '57-57', '57-71', '57-87', '57-114', '57-128',
'71-57', '71-71', '71-87', '71-114', '71-128',
'87-57', '87-71', '87-87', '87-114', '87-128',
'114-57', '114-71', '114-87', '114-114', '114-128',
'128-57', '128-71', '128-87', '128-114', '128-128']

Yielding the new table:

table = [ '57-57', '57-71', '57-87', '57-128',
'71-57', '71-87', '71-114', '87-57',
'87-71', '87-114', '87-128', '114-71',
'114-87', '128-57', '128-87']
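Here is a rough sketch of the consistency check, under the assumption that "consistent" means every mass of the linear spectrum occurs in the experimental spectrum at least as often. The helper names linear_spectrum and is_consistent are illustrative, not taken from any library:

```python
from collections import Counter


def linear_spectrum(peptide):
    """Masses of all contiguous fragments of a linear peptide, plus 0."""
    masses = [int(m) for m in peptide.split("-")]
    prefix = [0]
    for m in masses:
        prefix.append(prefix[-1] + m)  # prefix[i] = mass of the first i residues
    spec = [0]
    for i in range(len(masses)):
        for j in range(i + 1, len(masses) + 1):
            spec.append(prefix[j] - prefix[i])  # fragment masses[i:j]
    return sorted(spec)


def is_consistent(peptide, spectrum):
    """True if each mass is available in the spectrum often enough."""
    need = Counter(linear_spectrum(peptide))
    have = Counter(spectrum)
    return all(have[mass] >= count for mass, count in need.items())


spectrum = [0, 57, 57, 71, 87, 114, 128, 144, 158, 185, 201, 215, 215, 272]
alphabet = [57, 71, 87, 114, 128]
pairs = [f"{a}-{b}" for a in alphabet for b in alphabet]
kept = [p for p in pairs if is_consistent(p, spectrum)]

print(len(kept))                          # 15
print(is_consistent("57-114", spectrum))  # False (171 is not in the spectrum)
```

Counting with Counter rather than a plain set matters from the next round on, where a peptide can be rejected for using a mass more often than the spectrum provides it.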

Step 5: Repeat steps 3 and 4

Expand:

table = [ '57-57-57', '57-57-71', '57-57-87', '57-57-114', '57-57-128',
'57-71-57', '57-71-71', '57-71-87', '57-71-114', '57-71-128',
'57-87-57', '57-87-71', '57-87-87', '57-87-114', '57-87-128',
'57-128-57', '57-128-71', '57-128-87', '57-128-114', '57-128-128',
'71-57-57', '71-57-71', '71-57-87', '71-57-114', '71-57-128',
'71-87-57', '71-87-71', '71-87-87', '71-87-114', '71-87-128',
'71-114-57', '71-114-71', '71-114-87', '71-114-114', '71-114-128',
'87-57-57', '87-57-71', '87-57-87', '87-57-114', '87-57-128',
'87-71-57', '87-71-71', '87-71-87', '87-71-114', '87-71-128',
'87-114-57', '87-114-71', '87-114-87', '87-114-114', '87-114-128',
'87-128-57', '87-128-71', '87-128-87', '87-128-114', '87-128-128',
'114-71-57', '114-71-71', '114-71-87', '114-71-114', '114-71-128',
'114-87-57', '114-87-71', '114-87-87', '114-87-114', '114-87-128',
'128-57-57', '128-57-71', '128-57-87', '128-57-114', '128-57-128',
'128-87-57', '128-87-71', '128-87-87', '128-87-114', '128-87-128']

Cull (the same 75 candidates, each checked against the experimental spectrum):

table = [ '57-57-57', '57-57-71', '57-57-87', '57-57-114', '57-57-128',
'57-71-57', '57-71-71', '57-71-87', '57-71-114', '57-71-128',
'57-87-57', '57-87-71', '57-87-87', '57-87-114', '57-87-128',
'57-128-57', '57-128-71', '57-128-87', '57-128-114', '57-128-128',
'71-57-57', '71-57-71', '71-57-87', '71-57-114', '71-57-128',
'71-87-57', '71-87-71', '71-87-87', '71-87-114', '71-87-128',
'71-114-57', '71-114-71', '71-114-87', '71-114-114', '71-114-128',
'87-57-57', '87-57-71', '87-57-87', '87-57-114', '87-57-128',
'87-71-57', '87-71-71', '87-71-87', '87-71-114', '87-71-128',
'87-114-57', '87-114-71', '87-114-87', '87-114-114', '87-114-128',
'87-128-57', '87-128-71', '87-128-87', '87-128-114', '87-128-128',
'114-71-57', '114-71-71', '114-71-87', '114-71-114', '114-71-128',
'114-87-57', '114-87-71', '114-87-87', '114-87-114', '114-87-128',
'128-57-57', '128-57-71', '128-57-87', '128-57-114', '128-57-128',
'128-87-57', '128-87-71', '128-87-87', '128-87-114', '128-87-128']

Some notes here:

57-71-57 is invalidated because its linear spectrum contains mass 128 twice (57 + 71 and 71 + 57), but the experimental spectrum contains it only once.

57-87-128 actually has the same total mass as the parent mass (272), and it gets eliminated because its circular spectrum is not exactly the same as the experimental spectrum. The same goes for 71-87-114 and for the variants of both peptides (the other orderings of the same masses).

Note that as you go down the table (which is roughly sorted by mass, given the way it is constructed), the total mass of the peptides becomes larger than the parent mass (the largest mass in the experimental spectrum), and it's easy to discard those peptides immediately without constructing their linear or circular spectra.
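The two mass-based notes above can be illustrated with a small sketch. The helpers peptide_mass and circular_spectrum are hypothetical names, and the circular spectrum is assumed to contain 0, the total mass, and every fragment obtained by reading around the ring:

```python
def peptide_mass(peptide):
    """Total mass of a peptide written as '57-87-128'."""
    return sum(int(m) for m in peptide.split("-"))


def circular_spectrum(peptide):
    """Masses of all fragments of a circular peptide, plus 0 and the total."""
    masses = [int(m) for m in peptide.split("-")]
    n = len(masses)
    doubled = masses + masses  # lets a fragment wrap around the end
    spec = [0, sum(masses)]
    for start in range(n):
        for length in range(1, n):
            spec.append(sum(doubled[start:start + length]))
    return sorted(spec)


spectrum = [0, 57, 57, 71, 87, 114, 128, 144, 158, 185, 201, 215, 215, 272]
parent_mass = max(spectrum)

# Same mass as the parent, so it is judged by its circular spectrum.
print(peptide_mass("57-87-128") == parent_mass)    # True
print(circular_spectrum("57-87-128"))              # [0, 57, 87, 128, 144, 185, 215, 272]
print(circular_spectrum("57-87-128") == spectrum)  # False

# Heavier than the parent, so it can be dropped without building any spectrum.
print(peptide_mass("87-128-87") > parent_mass)     # True
```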

To summarize, here's what's left of the table after culling:

table = [ '57-57-71', '57-57-87', '57-71-87', '57-87-71', '71-57-57', 
'71-57-87', '71-87-57', '87-57-57', '87-57-71', '87-71-57']

Another expansion and culling:

table = [ '57-57-71-57', '57-57-71-71', '57-57-71-87', '57-57-71-114', '57-57-71-128', 
'57-57-87-57', '57-57-87-71', '57-57-87-87', '57-57-87-114', '57-57-87-128',
'57-71-87-57', '57-71-87-71', '57-71-87-87', '57-71-87-114', '57-71-87-128',
'57-87-71-57', '57-87-71-71', '57-87-71-87', '57-87-71-114', '57-87-71-128',
'71-57-57-57', '71-57-57-71', '71-57-57-87', '71-57-57-114', '71-57-57-128',
'71-57-87-57', '71-57-87-71', '71-57-87-87', '71-57-87-114', '71-57-87-128',
'71-87-57-57', '71-87-57-71', '71-87-57-87', '71-87-57-114', '71-87-57-128',
'87-57-57-57', '87-57-57-71', '87-57-57-87', '87-57-57-114', '87-57-57-128',
'87-57-71-57', '87-57-71-71', '87-57-71-87', '87-57-71-114', '87-57-71-128',
'87-71-57-57', '87-71-57-71', '87-71-57-87', '87-71-57-114', '87-71-57-128']

Note that you now have 8 peptides that have the same total mass as the parent mass, and that have circular spectra compatible with the experimental spectrum.

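Two peptides, 71-57-87-57 and 87-57-71-57, have the correct total mass but their circular spectrum is not compatible with the experimental spectrum. They are rotations of the same cyclic peptide, whose circular spectrum lacks masses 114 (57 + 57) and 158 (71 + 87) and contains 128 and 144 twice each. Both already fail the linear check as well: the first has an extra copy of 144 and the second an extra copy of 128.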

Thus, the algorithm will output:

57-57-71-87 57-57-87-71 57-71-87-57 57-87-71-57 71-57-57-87 71-87-57-57 87-57-57-71 87-71-57-57
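To tie the steps together, here is one possible sketch of the whole loop. It is written under the assumptions used in this walk-through (a perfect spectrum, integer masses), and every function name in it is my own choice rather than a reference implementation. Peptides are kept as lists of integers internally and only joined with dashes for output.

```python
from collections import Counter


def fragment_masses(masses, cyclic):
    """Sorted spectrum of a peptide given as a list of masses."""
    n = len(masses)
    doubled = masses + masses if cyclic else masses
    spec = [0, sum(masses)]
    for start in range(n):
        for length in range(1, n):
            end = start + length
            if cyclic or end <= n:  # linear fragments may not wrap around
                spec.append(sum(doubled[start:end]))
    return sorted(spec)


def cyclopeptide_sequencing(spectrum, amino_acid_masses):
    target = Counter(spectrum)
    parent_mass = max(spectrum)
    # Step 1: keep only masses seen in the spectrum.
    alphabet = [m for m in amino_acid_masses if m in target]
    # Step 2: start from the length-1 peptides.
    table = [[m] for m in alphabet]
    results = []
    while table:
        survivors = []
        for peptide in table:
            total = sum(peptide)
            if total > parent_mass:
                continue  # too heavy, no spectrum needed
            if total == parent_mass:
                # Full-mass candidates are judged by their circular spectrum.
                if fragment_masses(peptide, cyclic=True) == sorted(spectrum):
                    results.append("-".join(map(str, peptide)))
                continue
            # Step 4: linear spectrum must fit inside the experimental one.
            need = Counter(fragment_masses(peptide, cyclic=False))
            if all(target[m] >= c for m, c in need.items()):
                survivors.append(peptide)
        # Step 3: expand every survivor by every mass in the alphabet.
        table = [p + [m] for p in survivors for m in alphabet]
    return results


masses = [57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
          128, 129, 131, 137, 147, 156, 163, 186]
spectrum = [0, 57, 57, 71, 87, 114, 128, 144, 158, 185, 201, 215, 215, 272]
found = cyclopeptide_sequencing(spectrum, masses)
print(len(found))  # 8
print(" ".join(found))
```

The only difference from the order of the steps above is that the sketch checks the table before expanding it; the length-1 peptides all pass trivially, so the rounds that follow are the same.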