Practical by Marian Novotny,
2005
The aim of this practical is to analyze a protein structure: in
particular, to validate the structure, compare its fold with those of
known structures and annotate its function.
This practical should introduce a number of useful tools and handy
services that can help you get the most out of macromolecular structural
data. You might have seen some of the services before (especially if
you attended the Molecular Bioinformatics X3 course or the
Molecular Biology X3 course), but this time they will come in a different
context, and repetition is invaluable.
The practical is web-based: all the services we will use are accessible via your preferred web browser. We will also use Deep View Swiss-PdbViewer or Pymol to visualize structures. Please make sure you can find these programs on your computer. If you cannot, please contact one of the lab assistants. Some of the results will be delivered to you by email, so check that you can access your mailbox.
The practical also includes questions. Please write your notes and answers to the questions in your favorite text editor so that you can check them later with the help of one of the lab assistants.
We will work with protein X (PrX) from Creatura mystiska
and try to find out something about it. This protein does not exist in
real life. It was designed only for this practical, and its
three-dimensional (3D) structure was modeled in our laboratory. (This is
only because otherwise the practical would be too trivial: you already
know a lot about databases and various services and are able to do some
detective work.) However, the protein was derived from existing
sequences, so it is not completely absurd, and it is possible to make
qualified guesses about its function and biological importance.
There are many proteins with a known 3D structure but very little data
apart from the structural data. Therefore it is essential to learn to
extract as much information as possible from the data we already have.
Let us start with the PrX protein's structure, or rather just with
an image of it.
Now, just by looking at the picture, try to answer the following two
questions.
This practical is called Structure analysis, so we will work mostly
with the structure of the so far mysterious PrX. You can find the
structure coordinates (i.e. the PDB file) for PrX here.
Please download the structure coordinates. You will need them
throughout the practical.
Before we start doing anything with the structure, we should do some
structure validation to get an idea of its quality.
There are many ways to do that: for example, you can compare bond
lengths, bond angles and various other statistics with the distributions
of these statistics in already known protein structures to see whether
your protein of interest looks similar to these structures.
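The comparison of bond lengths with database statistics can be sketched in a few lines of Python. This is a minimal illustration, not what any validation server actually runs: the reference values are approximate backbone bond lengths and standard deviations in the style of the Engh & Huber parameters, hard-coded here as an assumption, and the names `IDEAL_BONDS`, `bond_z_scores`, `bond_outliers` and the 4-sigma cut-off are hypothetical choices.

```python
import math

# Approximate ideal backbone bond lengths and standard deviations in Å
# (assumed values in the style of the Engh & Huber parameters; check the
# published tables before relying on them).
IDEAL_BONDS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def bond_z_scores(residue):
    """Return {bond: z-score} for the backbone bonds present in one residue.

    `residue` maps atom names to (x, y, z) coordinates in Å.
    """
    scores = {}
    for (a, b), (ideal, sigma) in IDEAL_BONDS.items():
        if a in residue and b in residue:
            observed = math.dist(residue[a], residue[b])
            scores[(a, b)] = (observed - ideal) / sigma
    return scores


def bond_outliers(residue, threshold=4.0):
    """Bonds deviating by more than `threshold` sigma (hypothetical cut-off)."""
    return [bond for bond, z in bond_z_scores(residue).items() if abs(z) > threshold]


# Hypothetical residue with a stretched N-CA bond (1.70 Å instead of ~1.46 Å).
residue = {"N": (0.0, 0.0, 0.0), "CA": (1.70, 0.0, 0.0), "C": (1.70, 1.525, 0.0)}
print(bond_outliers(residue))  # → [('N', 'CA')]
```

Real validation tools apply the same idea to all bond lengths and angles, with full per-atom-type reference tables.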
We will do only a very basic validation: we will look at the
Ramachandran plot of PrX. The Ramachandran plot is a very good
indicator of the quality of a structure determination, since a final
structure is usually not optimized to get a better Ramachandran plot.
Open your structure with Swiss-PdbViewer, select all residues and display
the Ramachandran plot (Ctrl+R). You can identify a residue
by pointing the mouse at it.
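A Ramachandran plot shows, for each residue, the backbone torsion angles phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)). The sketch below computes them from coordinates in plain Python, using the usual signed-dihedral formula. The function names and the input format (a list of per-residue dictionaries holding `N`, `CA` and `C` coordinates, in chain order, with no chain breaks) are assumptions for illustration, not Swiss-PdbViewer's implementation.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees, range -180..180) defined by four points."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """(phi, psi) for each residue; None where undefined at the chain ends."""
    angles = []
    for i, res in enumerate(backbone):
        phi = psi = None
        if i > 0:
            phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(backbone) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
        angles.append((phi, psi))
    return angles


# Four points arranged at a right angle about the middle bond.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))  # about 90 degrees
```

Plotting psi against phi for all residues gives the Ramachandran plot; the first residue has no phi and the last has no psi.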
Well-determined protein structures usually have the vast majority of
their residues within the main allowed regions, as in the high-resolution structure of calmodulin (1exr), while poor structures have many
residues outside these regions (see Figure below).
There are many servers that perform fold comparison. A fold comparison is a structural alignment between your structure and all the structures in the database that a particular server uses. Different servers use different methods and different databases (or rather different subsets of the PDB) and consequently give different results. No server is known to always give better results than the others, so it is generally a good idea to try more than one server. A few years ago we carried out an evaluation of fold comparison servers; if you are interested in the results, have a look at this page.
Here, we will look at two of these servers, SSM and DALI.
Let us begin with DALI,
which was one of the best servers in our survey. You will do a search
using DaliLite v.3. Upload the coordinates for PrX, fill in your
e-mail address and click 'Submit'. DALI sends the results by
email, and we have to wait about 10 minutes for them (if you have to
wait more than 20 minutes, something is wrong).
Meanwhile, you can familiarize yourself with the DALI server. Servers and
databases are only useful when they are maintained and frequently
updated. Users of any service should always check whether the service is
still maintained or has been abandoned.
Look at the DALI page and answer the following questions:
We will also submit our structure to the SSM server.
Before we do that, look at the server's webpage and answer the
following questions:
Now, click on the 'Start SSM' button, change 'Source' of the Query from
'PDB entry' to 'Coordinate file' and upload the PrX structure. In the
'Target' window you have a lot of options, but to stay on the safe side
we will keep the default setting (All PDB archive, the biggest available
database). We will also keep the default settings everywhere else and
run the job by clicking 'Submit your query'. Nevertheless, think about
what effect other settings would have, for example setting
'lowest acceptable match' to 70%.
A list of hits should arrive almost instantly. Look at the Titles in
the list.
Find the explanation for each column in the list.
Take a closer look at one of the hits, let's say 1PO6. The row for
each hit gives many statistics describing how similar your query is to
the hit. You can also view any of the hits superimposed on top of
your query protein, and many links to other services and databases
are provided.
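One of the SSM statistics, the Q-score, combines the alignment length and the RMSD into a single number between 0 and 1. The sketch below assumes the definition given in the original SSM publication by Krissinel and Henrick, Q = N_align^2 / ((1 + (RMSD/R0)^2) * N1 * N2) with R0 = 3 Å; please check the server's own help pages for the exact definition it uses. The function name `ssm_q_score` is hypothetical.

```python
def ssm_q_score(n_aligned, rmsd, n_query, n_target, r0=3.0):
    """Q-score of a structural match (assumed SSM definition).

    n_aligned: number of aligned residues
    rmsd: RMSD of the aligned C-alpha atoms in Å
    n_query, n_target: number of residues in the query and the target
    r0: empirical distance parameter in Å (3 Å assumed)
    """
    if min(n_aligned, n_query, n_target) <= 0:
        raise ValueError("residue counts must be positive")
    return n_aligned ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_query * n_target)


# Two identical 100-residue chains score 1.0; the same alignment
# with an RMSD of 3 Å scores half as much.
print(ssm_q_score(100, 0.0, 100, 100))  # → 1.0
print(ssm_q_score(100, 3.0, 100, 100))  # → 0.5
```

The score penalizes both a large RMSD and a short alignment relative to the sizes of the two proteins, so a small hit matching only part of the query scores low even with a good RMSD.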
Now it is time to compare the results from both servers. If you
still have not received an email from the DALI server, you can have a
look at a locally deposited copy of the results.
The DALI server reduces the number of hits by pruning the initial
database. It uses only structures that share less than 25% sequence
identity with each other.
Rather surprising answer, isn't it? One would expect fewer hits if
the database is smaller. Apparently, the size of the database is not
the only factor influencing the results; there must be other things
to consider.
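A database pruned to less than 25% mutual sequence identity can be built by greedy selection of representatives: go through the entries in some order of preference and keep an entry only if it is sufficiently different from everything already kept. The sketch below illustrates that idea only; it is not DALI's actual procedure. The identity function is a toy that compares equal-length, ungapped sequences position by position (a real pipeline would align the sequences first), and all names are hypothetical.

```python
def ungapped_identity(a, b):
    """Toy percent identity of two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("toy identity needs non-empty, equal-length sequences")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_prune(entries, identity, cutoff=25.0):
    """Keep an entry only if it is below `cutoff` % identity to every kept entry.

    entries: list of (name, sequence) tuples, ideally sorted by preference
             (for example by resolution)
    identity: function returning percent identity of two sequences
    """
    kept = []
    for name, seq in entries:
        if all(identity(seq, other) < cutoff for _, other in kept):
            kept.append((name, seq))
    return [name for name, _ in kept]


entries = [("a", "AAAA"), ("b", "AAAC"), ("c", "CCCC")]
print(greedy_prune(entries, ungapped_identity))  # → ['a', 'c']
```

Entry "b" is dropped because it is 75% identical to "a", which was kept first; the order of the input therefore decides which member of each family survives.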
Spend some time with the DALI list of hits. It is similar to the SSM
list: it shows a statistical significance score (the Z-score), the
RMSD, the number of aligned residues, the sequence identity and a brief
annotation of each hit. Note that the sequence identity is often very
low, well below the twilight zone. Let us pay special attention to the
column 'Description'. Do the hits help us guess the function of our
protein, at least at a very general level?
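RMSD, reported by both servers, is the root-mean-square distance between corresponding atoms (usually C-alpha atoms) after the two structures have been superimposed. The sketch below assumes the superposition and the residue pairing have already been done by the server; finding the optimal superposition itself (for example with the Kabsch algorithm) is not shown. The function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equal-length lists of matched (x, y, z) coordinates.

    Assumes the structures are already superimposed and the atoms paired.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Two "structures" shifted by 1 Å along z have an RMSD of 1 Å.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # → 1.0
```

Because RMSD is computed only over the aligned atoms, it should always be read together with the number of aligned residues.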
Let us explore the function of the protein a bit more in the last section of this practical. It seems that our protein is most similar to heterogeneous nuclear ribonucleoprotein A1 (also called helix-destabilizing protein, single-strand binding protein or hnRNP core protein A1, and maybe there are still more names) from Homo sapiens. That is a very long name, but it does not say very much. Maybe you are an expert in cell biology and already know everything, but for the rest of us it will take some effort to find out more.
We can start at the PDBsum page of one of the heterogeneous nuclear ribonucleoprotein A1 structures, namely 1po6. PDBsum is a database of known structures of proteins and nucleic acids where we can easily find basic information about a particular structure and, more importantly, about the protein itself. It also provides many links to other structural and sequence databases.
First of all, being incredulous scientists, we will do a quick validation of the heterogeneous nuclear ribonucleoprotein A1 structure (1po6). Find the link to EDS on the PDBsum page for entry 1po6. EDS stands for the (Uppsala) Electron Density Server. EDS works only for structures determined by X-ray crystallography. The main objective of the server is to show how well the structure built by the authors corresponds to the data they collected. One would expect a very good match, since the structure should be built according to the experimental data. However, this is not always the case. A protein can, for example, contain a mobile loop that adopts a few different conformations, but the structure deposited in the PDB is static and may contain just one of these conformations. In such a case, the EDS server will show a lower correlation between the structure and the experimental data (electron density) in that region, and anyone interested in the protein will know not to trust this particular part of the structure to the last decimal place.
Click on "Real-space R-value" on the EDS page for PDB entry 1po6.
You will get a plot that shows the (dis)agreement between the model
built by the crystallographer and the electron density for each amino
acid. The higher the bar, the bigger the disagreement. EDS also lets
you visualize the agreement graphically.
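The real-space R-value compares the electron density calculated from the model with the experimental density around each residue. A common definition is RSR = sum(|rho_obs - rho_calc|) / sum(|rho_obs + rho_calc|), summed over the map grid points covering the residue, so 0 means perfect agreement. The sketch below assumes both density lists are already sampled at the same grid points and placed on the same scale, which is the hard part in practice; the function name `real_space_r` is hypothetical.

```python
def real_space_r(rho_obs, rho_calc):
    """Real-space R-value for one residue.

    rho_obs, rho_calc: observed and model-calculated density values at the
    same grid points, assumed to be on a common scale.
    """
    if len(rho_obs) != len(rho_calc) or not rho_obs:
        raise ValueError("need two non-empty density lists of equal length")
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    if denominator == 0:
        raise ValueError("densities sum to zero; RSR is undefined")
    return numerator / denominator


print(real_space_r([1.0, 2.0], [1.0, 2.0]))  # → 0.0 (perfect fit)
print(real_space_r([1.0, 2.0], [1.0, 1.0]))  # model density too weak at one point
```

A residue in a mobile loop, whose observed density is weak or smeared, gives a larger numerator and therefore a higher bar in the EDS plot.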
Click on the bar for Arg 97; a viewer should pop up and centre on
Arg 97. Where does this amino acid lie (in the core? on the surface?)?
How well does it fit?
EDS has been retired and some of its data may no longer be available. If so, open 1po6 in Pymol and identify where Arg 97 lies: is it in the core or on the surface?
Please go to the InterPro database and browse by structure for entry
1po6. InterPro can tell us what all the hits in the DALI output have in
common. So, look at the domain composition of this protein (and
possibly also at the other hits).
We now know which protein of known structure is most similar to our
protein of interest, and we also know the domain composition of that
protein. To get still more specific information, we will move to the
curated, annotated UniProt database to see what people before us found
out about hnRNP core protein A1.
And finally, let's explore the interaction between the protein and
nucleic acid. This way we can better understand how ssRNA or ssDNA is
recognized by proteins. Download the PDB file for the 1po6 entry (there
are many ways to do that; by now you should know at least one of them).
Open the structure in Swiss-PdbViewer and make sure that the Control
Panel is on. Select all the nucleotides. Go to the 'Select' menu,
choose 'Neighbors of selected aa' (never mind that you selected
nucleotides) and pick 'Select groups that are within' from the menu.
The default value is 3.5 Å, which is about the maximum length of a
hydrogen bond (donor-acceptor distance). Protein-nucleic acid
interactions are usually mediated by hydrogen bonds, so 3.5 Å suits us.
Click OK.
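The 'Neighbors of selected' step is a distance search: find every protein residue with at least one atom within 3.5 Å of any nucleotide atom. The sketch below reproduces that idea in plain Python, reading ATOM/HETATM records by the fixed PDB column layout. The set of nucleotide residue names, the file name in the usage comment and all function names are assumptions; insertion codes, alternate locations and modified nucleotides are ignored.

```python
import math

# Residue names treated as nucleotides (assumption: standard PDB names
# for DNA and RNA; modified nucleotides would have to be added).
NUCLEOTIDES = {"DA", "DC", "DG", "DT", "DU", "A", "C", "G", "U"}


def parse_atoms(pdb_lines):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "record": line[0:6].strip(),
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def residues_near_nucleic_acid(atoms, cutoff=3.5):
    """Protein residues with any atom within `cutoff` Å of any nucleotide atom."""
    nucleic = [a for a in atoms if a["resname"] in NUCLEOTIDES]
    # Only ATOM records count as protein here, so waters (HETATM) are skipped.
    protein = [a for a in atoms
               if a["record"] == "ATOM" and a["resname"] not in NUCLEOTIDES]
    hits = set()
    for p in protein:
        if any(math.dist(p["xyz"], n["xyz"]) <= cutoff for n in nucleic):
            hits.add((p["chain"], p["resseq"], p["resname"]))
    return sorted(hits)


# Usage (file name assumed):
# with open("1po6.pdb") as handle:
#     print(residues_near_nucleic_acid(parse_atoms(handle)))
```

The brute-force double loop is fine for a single small complex; a spatial grid or k-d tree would be the usual choice for large structures.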
Display only the DNA and the amino acids involved in the interaction.
Also add hydrogen bonds (Tools > Compute H-bonds).
Now it is time to compare the nucleic acid binding properties of our
protein with those of hnRNP core protein A1. We will align the sequences
of both proteins and check whether the residues in hnRNP core protein A1
responsible for binding ssDNA are also conserved in our protein.
If most of these residues are conserved, then there is a high
probability that our protein also binds nucleic acids and possibly has
a similar function.
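Once you have a pairwise alignment, checking conservation means walking along the alignment, keeping a residue counter for each sequence, and looking up what the query has at each binding position of the reference. The sketch below does this for a toy alignment; the sequences, residue numbers and function names are hypothetical. The `ref_start` parameter is there because residue numbering in a PDB file need not start at 1.

```python
def map_reference_positions(aln_ref, aln_query, ref_positions, ref_start=1, query_start=1):
    """Map reference residue numbers onto the query via a pairwise alignment.

    aln_ref, aln_query: the two aligned sequences (same length, '-' for gaps)
    ref_positions: residue numbers in the reference (e.g. binding residues)
    ref_start, query_start: number of the first residue of each sequence
    Returns {ref_number: (ref_residue, query_residue_or_gap, query_number_or_None)}.
    """
    if len(aln_ref) != len(aln_query):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(ref_positions)
    ref_num, query_num = ref_start - 1, query_start - 1
    mapping = {}
    for r, q in zip(aln_ref, aln_query):
        if q != "-":
            query_num += 1
        if r == "-":
            continue  # insertion in the query: no reference residue here
        ref_num += 1
        if ref_num in wanted:
            mapping[ref_num] = (r, q, query_num if q != "-" else None)
    return mapping


def fraction_conserved(mapping):
    """Fraction of mapped reference residues that are identical in the query."""
    if not mapping:
        return 0.0
    return sum(r == q for r, q, _ in mapping.values()) / len(mapping)


# Toy alignment (hypothetical sequences and binding positions).
m = map_reference_positions("AR-KG", "ARWQG", [2, 3])
print(m)                      # → {2: ('R', 'R', 2), 3: ('K', 'Q', 4)}
print(fraction_conserved(m))  # → 0.5
```

Identity is a strict criterion; conservative substitutions (for example K to R) may preserve binding too, so it is worth looking at the mapped pairs and not only the fraction.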
Finally, here is the sequence of our protein:
>Protein "X"
KRPDQLGKLFIGNLSFQTSDESVRQHFEQWGEITDSIVMKDKNTGRSRGYGFVSYAPVED
VTAIMNARLHLLDGNVIEKKRKVSVEDNQRPVKKLFIRGIKESTTEEDLKEYFSE
YGDIELLEIVTDHASGKTRGFGFVTFDDKDTVMKLVINRYHIVNGHQCEARLALSRQEMA
SAS
You know how to get the sequence of hnRNP core protein A1, and you can
align the two sequences with any pairwise sequence alignment server,
e.g. here.
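Pairwise alignment servers are built on dynamic programming, for example global Needleman-Wunsch or local Smith-Waterman alignment, usually with a substitution matrix such as BLOSUM62 and affine gap penalties. The toy version below implements global Needleman-Wunsch with a simple match/mismatch score and a linear gap penalty. These scores are assumed values chosen for illustration, so its alignments will generally differ from those of a real server; it only shows how the algorithm works.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # (mis)match
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best = needleman_wunsch("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print(best)  # → 1
```

The resulting pair of aligned strings is exactly the kind of input the conservation check needs.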
Congratulations, you have survived. In this exercise you have seen and
tried a few services that can provide hints about the function and
evolution of a protein when working with protein 3D structures.
Please forward comments and questions to Marian Novotny.
Last updated 11th of April 2018.