... protein were copied into a single PDB file. A total of ...

Page information

Author: Shayla | Date: 23-08-19 20:42

Body

... protein were copied into a single PDB file. A total of 11,294 PDB files were produced, which together contained the 3D structures of 488,039 indel sites. In the final step, an Apache web server was set up on an IBM Pentium D computer; it links to all the necessary indel information and files stored in a local MySQL database. All of the above indel results are stored in two tables, [indel_pdb_summary] and [pdb_blast_alignment], as shown in Figure 2. The connection between the web server and the MySQL database was established through Perl and CGI.

(BMC Bioinformatics 2008, 9: http://www.biomedcentral.com/1471-2105/9/)

Comparative analysis of indels

To demonstrate applications of the Indel PDB database, we used the indel data to investigate several indel features: sequence composition, length distribution, secondary structure composition, and solvent accessibility. All of the analyses operated on a non-redundant set of indel sites, which was extracted from the original set of 488,039 indels by grouping together indel sites with the same start and end positions on the same protein. The resulting non-redundant set contains 117,266 indel sites. The values required for each analysis were retrieved from the MySQL database using Perl scripts.

The analyses of amino acid sequence and secondary structure composition were performed on both the indel sites and the full-length indel-containing proteins (referred to as indel proteins). Data obtained from indel proteins were treated as background values against which the indel site data were compared. A chi-square test was applied to evaluate whether the differences between indel sites and indel proteins were significant. For instance, when comparing the alpha-helix content (H) between indel sites and indel proteins (our samples), the percentages of residues that were H or non-H in both samples were calculated.
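The H versus non-H comparison described above can be framed as a 2x2 contingency table. The following is a minimal sketch in Python, not the authors' Perl code: it assumes the test is run on residue counts rather than percentages, uses the plain Pearson statistic without continuity correction, and the counts in the example are hypothetical, not values from the paper.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical residue counts: helix (H) and non-helix residues
# in indel sites (first row) and in indel proteins (second row).
stat, p_value = chi_square_2x2(300, 700, 4000, 6000)
print(f"chi-square = {stat:.2f}, P = {p_value:.3g}")
```

The same function can be reused for each secondary structure class or amino acid by swapping in the corresponding "class" and "non-class" counts.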
Then a chi-square test value was calculated and a P-value was assigned. The same process was repeated for the other secondary structures and for the sequence compositions. Solvent accessibility was measured as the number of water molecules in contact with a residue multiplied by 10, or the residue water-exposed surface in Å², according to the DSSP program. A two-sample t-test was applied to compare solvent accessibility between indel sites and indel proteins.

Length distribution

The indel and loop length distributions were modeled by the Weibull [23] and power law distributions. The Weibull distribution can be described by the function:

$$S(x) = \exp\left(-(x/\alpha)^{\beta}\right), \quad x \ge 0, \quad \alpha, \beta > 0$$

where $S(x)$ is the survival function, and $\alpha$ and $\beta$ represent a scaling factor and a shape parameter, respectively. The double logarithmic transformation of the Weibull function was performed:

$$\log(-\log S(x)) = \beta \log(x) - \beta \log(\alpha)$$

The survival function, $S(x)$, is the probability that a variable $X$ has a value greater than a number $x$. $S(x)$ was calculated by dividing the number of indels with more than $x$ residues by the total number of indels, where $x$ ranges from 1 to 49. If the Weibull distribution can accurately model the indel length distribution, the double logarithmic plot is expected to be linear. The Pearson correlation coefficient (r²) as implemented in MS Excel was used to evaluate t…

Figure 2. A database schema for Indel PDB.
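The double logarithmic check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure (they report using MS Excel): it assumes a survival function of the form S(x) = exp(-(x/scale)^shape), natural logarithms, an ordinary least-squares line, and synthetic lengths drawn from a Weibull distribution with arbitrarily chosen parameters. The function names are hypothetical.

```python
import math
import random


def survival(lengths, max_x=49):
    """Return [(x, S(x))], where S(x) is the fraction of lengths greater than x."""
    n = len(lengths)
    points = []
    for x in range(1, max_x + 1):
        s = sum(1 for length in lengths if length > x) / n
        if 0.0 < s < 1.0:  # log(-log(S)) is undefined when S is 0 or 1
            points.append((x, s))
    return points


def fit_weibull(points):
    """Least-squares line through (log x, log(-log S(x))).

    Returns (scale, shape, r_squared). The slope estimates the shape
    parameter and the intercept equals -shape * log(scale).
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(-math.log(s)) for _, s in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    shape = sxy / sxx
    intercept = mean_y - shape * mean_x
    scale = math.exp(-intercept / shape)
    r_squared = sxy ** 2 / (sxx * syy)
    return scale, shape, r_squared


# Synthetic indel lengths (hypothetical data, not from Indel PDB).
# random.weibullvariate takes the scale first, then the shape.
random.seed(0)
lengths = [math.ceil(random.weibullvariate(5.0, 0.8)) for _ in range(5000)]
scale, shape, r_squared = fit_weibull(survival(lengths))
print(f"scale = {scale:.2f}, shape = {shape:.2f}, r^2 = {r_squared:.3f}")
```

An r² close to 1 on the transformed points indicates that the double logarithmic plot is close to linear, which is the criterion the text gives for a good Weibull fit.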