MubiSiDa: a comprehensive and interactive database for investigating in-depth information on experimentally verified protein ubiquitination sites in mammals

Xiaofeng Song, Tong Chen, Bing He

[Author note: the introductory lead-in and step-by-step guidance at the beginning can be used as a reference.]

Abstract

MubiSiDa, the mammalian ubiquitination site database, provides the scientific community with a comprehensive, high-quality and freely accessible resource on mammalian ubiquitination. This searchable systems biology database offers information, resources and tools for the study of mammalian ubiquitination. As the largest online database of mammalian protein ubiquitination, MubiSiDa was designed with a user-friendly interface for biologists and biomedical researchers. The home page provides two kinds of search, quick search and advanced search, each with its own restrictions on search terms. Through these search functions, users can obtain exact results and customize which primary information is displayed. Users can also browse the data by organism or by the three categories of Gene Ontology. Detailed information, such as the basic properties of each protein, its ubiquitinated sites and its full sequence, can be viewed in MubiSiDa.

Introduction

Ubiquitination, first known for targeting proteins for degradation by the ATP-dependent ubiquitin-proteasome system, is an important post-translational modification (PTM[1]) of proteins. Further research uncovered many regulatory functions of ubiquitination, including the regulation of DNA repair and transcription, the control of signal transduction, and involvement in endocytosis and sorting. There is also evidence that ubiquitination is implicated in several kinds of disease (4 2 , 5 3). These discoveries have attracted attention because of their significance to cell research. With the growing number of experimentally confirmed ubiquitination sites, especially in mammalian proteins, a database that comprehensively and systematically collects mammalian protein ubiquitination sites is strongly needed. A few databases, such as UbiProt, also provide some retrieval functions for ubiquitination, but either the amount of data they collect is too small or their data come from yeast databases. Most previously launched databases therefore do not meet the needs of research on mammalian protein ubiquitination.

Most data are collected from published papers indexed in UniProt, NCBI and other well-known international databases. In total, 88 references covering 34853 experimentally validated mammalian proteins have been incorporated, with curated detailed information including the basic properties of the ubiquitinated proteins, gene ontologies and sequence annotation. MubiSiDa will be updated continually after its launch. The latest version of MubiSiDa contains more than XXXX ubiquitination sites on 34853 proteins from 10 species. Over 95% of these sites are from human and mouse. Such comprehensive data enable not only information retrieval about ubiquitination sites, but also the study of cross-regulation between post-translational modifications (24,35). In addition, the systematically aggregated data make it convenient for biologists to summarize the principles and regularities underlying the formation of ubiquitination sites. Furthermore, researchers who work on the prediction of ubiquitination sites can also benefit from the statistical information in MubiSiDa.

Database construction

Data source

All data in MubiSiDa are imported hierarchically and manually to ensure the accuracy of each record. Data import is broadly divided into three steps. First, a few well-known international databases, such as UniProt, UbiProt and NCBI, were searched for published research articles using keywords such as ubiquitination and ubiquitylation. International journals such as Science were also searched. All useful results were carefully examined to obtain a preliminary collection of ubiquitination sites. Second, all collected ubiquitination sites and the corresponding protein names were listed, and entries without adequate experimental evidence of substrate ubiquitination were eliminated. Third, once all valid, experimentally supported ubiquitination sites had been obtained, detailed information on each protein was retrieved from the UniProt, EBI and NCBI databases. Approximately 13% of the sequences in MubiSiDa come from UniProtKB, 7% from NCBI and 80% from EBI. All sequences will be synchronized with the corresponding databases every year.

Database construction method

MubiSiDa was constructed and configured on a typical WAMP (Windows + Apache + MySQL + PHP) platform. The dataset is stored in MySQL 5.0, and the web interface is implemented with PHP scripts (PHP version 5.2) on Windows 7, served by an Apache server.

Database utility and illustration

This section briefly describes the utility of MubiSiDa and illustrates its use, to help users gain a general understanding of the database and use it conveniently and efficiently. Detailed navigation instructions are available on the Tutorial Page.

Homepage

The Homepage is the entrance to all functionalities of MubiSiDa. As an interactive, comprehensive and user-friendly interface, the Homepage provides two search sections (‘Quick Search’ and ‘Advanced Search’), one ‘Data Browse’ section, two information sections (‘What’s New’ and ‘Statistic On Ubiquitination’) and one ‘Links And Downloads’ section.

The ‘Advanced Search’ section provides three types of advanced search pages, and the ‘Data Browse’ section offers four types of browse interfaces. These search and browse functionalities are what make MubiSiDa a powerful and efficient tool.

The ‘What’s New’ section posts the latest news and notifications about the database, such as newly published papers about MubiSiDa, new database versions and improvements to the search or browse functionalities.

The ‘Links And Downloads’ section supplies links to a few well-known databases for the prediction of lysine ubiquitination and for retrieving the enzymes involved in the ubiquitination process. More resources are listed on the Resources Page, which is reached by clicking the hyperlink on the right side.

The ‘Statistic On Ubiquitination’ section uses two pie charts to show the data distribution clearly and intuitively. Additional annotations explain what the two pie charts represent. Through this section, users gain a general understanding of the data.

Three tabs in the upper right-hand corner link to three pages: Feedback, Resources and Contact. These pages give users a platform to participate in the improvement of MubiSiDa and to exchange thoughts or suggestions with the developers of the database. In addition, the black navigation bar at the top of the Homepage gives users access to other functions, such as viewing all proteins aggregated in the database. Users can also submit new information about ubiquitinated proteins through the Submission Page.

Search function

Obtaining accurate data through any of the browse and search functionalities takes three steps. First, enter a query string in the Search page. Second, a Result page opens with a list of items matching the query string, showing primary information about each protein. Third, click the ID of a protein in the leftmost column to view its detailed information. Each ubiquitination site is experimentally proven and has a corresponding PubMed reference. In addition, each Result page provides a customization function that lets users rearrange the protein information according to their preferences.

Quick search

A quick search is started from the search bar by entering a query string, which is restricted to a protein name, gene name, organism or ID (UniProt ID, IPI ID or NCBI ID). The Result page lists all proteins that match the query string. Basic information about the proteins is shown, and keywords that match the text entered in the search bar are highlighted in a bright color. For further refinement, a combined search is also available, but at most two query strings joined by ‘+’ are supported. For instance, the search range narrows dramatically and the search time decreases if users enter part of the protein name together with the organism name, joined by ‘+’. Quick Search is suitable only when users already have sufficient information about the proteins. More complex combined searches are provided in the Advanced Search section.
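The quick-search behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PHP code behind MubiSiDa: the record layout, the field names in `SEARCH_FIELDS`, the `<mark>` tag used for highlighting and the three placeholder records are all hypothetical.

```python
import re

MAX_TERMS = 2  # the text states that at most two query strings may be joined by '+'

# Hypothetical field names; the real column names are not given in the text.
SEARCH_FIELDS = ("id", "protein_name", "gene_name", "organism")

# Made-up placeholder records, not real database entries.
RECORDS = [
    {"id": "ID0001", "protein_name": "Example kinase A", "gene_name": "EXKA", "organism": "Homo sapiens"},
    {"id": "ID0002", "protein_name": "Example kinase A", "gene_name": "Exka", "organism": "Mus musculus"},
    {"id": "ID0003", "protein_name": "Example ligase B", "gene_name": "EXLB", "organism": "Homo sapiens"},
]

def parse_quick_query(query):
    """Split a quick-search string on '+' and reject more than two terms."""
    terms = [t.strip() for t in query.split("+") if t.strip()]
    if not terms:
        raise ValueError("empty query")
    if len(terms) > MAX_TERMS:
        raise ValueError("quick search supports at most two terms")
    return terms

def quick_search(records, query):
    """Return records in which every term occurs, ignoring case, in some field."""
    terms = [t.lower() for t in parse_quick_query(query)]
    hits = []
    for rec in records:
        values = [str(rec[f]).lower() for f in SEARCH_FIELDS]
        if all(any(t in v for v in values) for t in terms):
            hits.append(rec)
    return hits

def highlight(text, terms):
    """Wrap each matched term in <mark> tags, mimicking the colored keywords."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: "<mark>" + m.group(0) + "</mark>", text)
```

With these placeholder records, `quick_search(RECORDS, "kinase")` matches two entries, while `quick_search(RECORDS, "kinase+homo")` narrows the result to one, which is the narrowing effect the paragraph describes.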

Advanced search

The Advanced Search section includes three types of search functionalities: (1) Advanced Search, (2) Protein Name Search and (3) Sequence Blast.

  1. In the Advanced Search interface, there are six select boxes, each offering 8 search fields. Five further select boxes each offer 3 conjunctions: ‘AND’, ‘OR’ and ‘BUT’. In this way, different kinds of search terms can be combined freely. Through this kind of combined search, users can use some or all of the six select boxes, joined by conjunctions, to combine search fields and limit the search range.
  2. In the Protein Name Search interface, there are four textareas in which users enter query strings. Similarly, some or all of the textareas can be used for a combined search, but leaving all four empty produces no search result. The search method depends on the ID textarea. If the ID textarea is empty, the other three textareas are searched with a fuzzy query. If the ID textarea is not empty, the ID is matched exactly. The ID should be a UniProt ID, IPI ID or NCBI ID; UniProt IDs and IPI IDs are strongly recommended.
  3. In the Sequence Blast interface, there are two textareas: one for the user’s query sequence and one showing an example. As a tool for comparing primary biological sequence information, Sequence Blast compares a query sequence with all sequences in MubiSiDa and identifies those that resemble the query above a certain threshold. The one or few sequences most similar to the input are returned after the blast process. Sequence Blast is an efficient tool for retrieving possible ubiquitinated lysine sites in proteins supplied by users. Both the ID of each matching protein and the related references are shown in the results. Users can view detailed information on a result by clicking the ID of the protein.
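The way the Advanced Search interface chains search fields with conjunctions can be illustrated by building a parameterized SQL condition. The sketch below is hypothetical throughout: it uses Python's `sqlite3` as a stand-in for the MySQL backend, the table and column names are invented, the rows are placeholders, and reading ‘BUT’ as `AND NOT` is an assumption about what the conjunction means.

```python
import sqlite3

# Hypothetical column whitelist; the eight real search fields are not listed in the text.
ALLOWED_FIELDS = {"protein_name", "gene_name", "organism"}
# 'BUT' is interpreted as 'AND NOT' here, which is an assumption.
CONJUNCTIONS = {"AND": "AND", "OR": "OR", "BUT": "AND NOT"}

def build_where(conditions, conjunctions):
    """Build a WHERE clause from (field, value) pairs joined by conjunction keywords."""
    if not conditions:
        raise ValueError("at least one condition is required")
    if len(conjunctions) != len(conditions) - 1:
        raise ValueError("need exactly one conjunction between consecutive conditions")
    parts, params = [], []
    for i, (field, value) in enumerate(conditions):
        if field not in ALLOWED_FIELDS:
            raise ValueError("unknown field: " + field)
        if i > 0:
            keyword = CONJUNCTIONS.get(conjunctions[i - 1])
            if keyword is None:
                raise ValueError("unknown conjunction: " + conjunctions[i - 1])
            parts.append(keyword)
        parts.append("(" + field + " LIKE ?)")  # values are bound, never interpolated
        params.append("%" + value + "%")
    return " ".join(parts), params

def search(conn, conditions, conjunctions):
    """Run the combined search and return matching protein IDs in ID order."""
    where, params = build_where(conditions, conjunctions)
    sql = "SELECT id FROM protein WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

def demo_connection():
    """In-memory table filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein (id TEXT, protein_name TEXT, gene_name TEXT, organism TEXT)")
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?, ?)",
        [
            ("ID0001", "Example kinase A", "EXKA", "Homo sapiens"),
            ("ID0002", "Example kinase A", "Exka", "Mus musculus"),
            ("ID0003", "Example ligase B", "EXLB", "Homo sapiens"),
        ],
    )
    return conn
```

Field names are checked against a whitelist and values are passed as bound parameters, so user input never reaches the SQL text directly. When ‘AND’ and ‘OR’ are mixed, standard SQL precedence applies (AND binds more tightly than OR); whether MubiSiDa evaluates conjunctions this way or strictly left to right is not stated in the text.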

Data Browse

The Data Browse section includes four types of browse functionalities: (1) Browse Data By Organism, (2) Browse Data By Biological Process, (3) Browse Data By Cellular Component and (4) Browse Data By Molecular Function.

  1. In the Browse Data By Organism interface, all data in MubiSiDa are grouped by organism. The left column lists the organisms covered by MubiSiDa and the right column shows the number of records for each organism. Clicking a number in the right column opens a list of all associated records. This page is particularly useful for viewing the small amount of data available for a particular species.
  2. In the three remaining Data Browse interfaces, the data in MubiSiDa are grouped by the three categories of Gene Ontology (GO). Because the three interfaces work in the same way, Browse Data By Biological Process is taken as the example here. On entering this page, a table is generated. All gene ontology IDs classified as Biological Process in MubiSiDa are listed in the left column and the corresponding gene ontology names are shown in the right column. Clicking a gene ontology ID displays all proteins annotated with that ID. This gives users a convenient way to view proteins grouped by GO, and researchers who study the regulation of protein function or of biological processes may benefit from this functionality.
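Both browse modes amount to grouping records by one attribute and then listing the members of a chosen group. A minimal Python sketch of that grouping follows. The record layout, the `go_bp` field name and the GO identifiers are hypothetical placeholders, not MubiSiDa data.

```python
from collections import defaultdict

# Made-up placeholder records; "go_bp" holds placeholder Biological Process IDs.
RECORDS = [
    {"id": "ID0001", "organism": "Homo sapiens", "go_bp": ["GO:placeholder1", "GO:placeholder2"]},
    {"id": "ID0002", "organism": "Mus musculus", "go_bp": ["GO:placeholder2"]},
    {"id": "ID0003", "organism": "Homo sapiens", "go_bp": ["GO:placeholder1"]},
]

def group_ids(records, key):
    """Map each value of `key` to the sorted list of protein IDs that carry it."""
    groups = defaultdict(list)
    for rec in records:
        values = rec[key]
        if isinstance(values, str):  # single-valued field such as organism
            values = [values]
        for value in values:
            groups[value].append(rec["id"])
    return {value: sorted(ids) for value, ids in groups.items()}

def browse_table(records, key):
    """Rows of (group, record count), sorted by group, as on a browse page."""
    return sorted((value, len(ids)) for value, ids in group_ids(records, key).items())
```

Here `browse_table(RECORDS, "organism")` yields the two-column organism table, and `group_ids(RECORDS, "go_bp")` gives the list of proteins that would be shown after clicking a GO ID.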

Information Page

Users can access detailed information on a record by clicking the hyperlink in the leftmost column of the Result Page. The Information Page gathers verified and relatively detailed information about the ubiquitinated protein. All information was carefully checked manually to ensure its accuracy. The Information Page is divided into six sections. The top two sections, ‘Names and Origin’ and ‘Protein attribute’, provide basic information on protein properties. The Ontologies section mainly contains Gene Ontology (GO) data, most of which was downloaded manually from UniProt. The core data of MubiSiDa are shown in the ‘Sequence annotation (Features)’ section, which contains all information about the ubiquitination sites. In this section, all lysine ubiquitination sites are listed in ascending order. In the rightmost column, a colored PubMed entry hyperlinked to the Reference section provides the evidence for each site, since each reference constitutes experimental proof of ubiquitination. In the Sequence section, all lysine ubiquitination sites are marked in red within the full sequence, which makes the sites easy to recognize.
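Marking the ubiquitinated lysines within the full sequence can be sketched as follows. This is an illustrative Python version under assumptions: positions are taken to be 1-based, and the HTML tag and class name are hypothetical rather than taken from the MubiSiDa pages.

```python
def mark_sites(sequence, positions, open_tag='<span class="ub">', close_tag="</span>"):
    """Wrap the residue at each 1-based position in tags; each must be a lysine (K)."""
    wanted = set(positions)
    for pos in wanted:
        if not 1 <= pos <= len(sequence):
            raise ValueError("position %d is outside the sequence" % pos)
        if sequence[pos - 1] != "K":
            raise ValueError("residue %d is %s, not K" % (pos, sequence[pos - 1]))
    marked = []
    for index, residue in enumerate(sequence, start=1):
        if index in wanted:
            marked.append(open_tag + residue + close_tag)
        else:
            marked.append(residue)
    return "".join(marked)
```

Checking that every annotated position really is a lysine is a cheap consistency test between the site list and the stored sequence, which is useful when sequences are re-synchronized with their source databases.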

Results And Discussion

MubiSiDa, the largest and most comprehensive mammalian ubiquitination site database, aims to systematically aggregate all experimentally proven ubiquitinated proteins. As a tool designed for biologists and other researchers, MubiSiDa will be continually improved and updated to ensure the convenience and utility of the service, the accuracy of the information, and innovation in style and functionality. Remaining work includes refining the interface, improving search efficiency and adding new kinds of search to satisfy different needs. A function for predicting mammalian ubiquitination sites will be added in a future version. With continual improvement, MubiSiDa is expected to contribute to research on the regulation of protein function and of biological processes.