

Bioinformatics

Introduction
            We take bioinformatics to mean the emerging field of science that applies mathematics, statistics, and information technology, including computers and the theory surrounding them, to the study and analysis of very large biological, and particularly genetic, data sets. The field has been fueled by the rapid growth in DNA data generation, which has already produced massive data sets, with more still to come, in particular from the Human Genome Project and other genome projects.


            Bioinformatics does not aim to lay down fundamental mathematical laws that govern biological systems. Instead, the use of mathematics in the field is in the creation of tools that investigators can use to analyze data. One of the most important uses for it is the statistical analysis of the similarity between two or more DNA or protein sequences.


Background Biology


            Deoxyribonucleic acid (DNA) is the basic information macromolecule of life. It consists of a string of nucleotides, in which each nucleotide is made up of a standard deoxyribose sugar and phosphate group unit, connected to a nitrogenous base of one of four types: adenine, guanine, cytosine, or thymine (abbreviated as A, G, C, and T respectively). The sequence in which the different bases occur in a particular strand of DNA represents the genetic information encoded on that strand. In the cell, DNA is organized into chromosomes, each of which is a continuous length of double-stranded DNA that can be hundreds of millions of base pairs long. A human chromosome consists mostly of “junk DNA,” whose function, if any, is not well understood. Interspersed in this junk DNA are genes, the classic unit of genetic information.


            A protein is composed of a sequence of amino acids, each of which is represented by a letter (a, b, c, and so on). Twenty amino acids commonly appear in proteins. Proteins perform a variety of functions in the cell, covering all aspects of cellular activity from metabolism to growth to division.


Basic Probability and Probabilistic Models


Some basic results in using probabilities are necessary for understanding sequences. A probabilistic model is one that produces different outcomes with different probabilities. A probabilistic model can simulate a whole class of objects, assigning each an associated probability. In bioinformatics the objects are often sequences and a model may describe a family of related sequences.
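As a small illustration of what "produces different outcomes with different probabilities" means, the sketch below treats a loaded die as a probabilistic model and draws samples from it. The die, its probabilities, and the helper name `sample` are hypothetical choices made for this example only.

```python
import random

# Hypothetical model: a loaded six-sided die that shows a six half the time.
loaded_die = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

def sample(model, n, rng):
    """Draw n independent outcomes from a {outcome: probability} model."""
    outcomes = list(model)
    weights = [model[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)

rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = sample(loaded_die, 10_000, rng)
fraction_of_sixes = rolls.count(6) / len(rolls)  # should be close to 0.5
```

Replacing the six faces with nucleotides or amino acids gives a model that generates sequences instead of die rolls.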


Consider an extremely simple model of any protein or DNA sequence. Biological sequences are strings over a finite alphabet of residues, generally either four nucleotides or twenty amino acids. Assume that residue a occurs at random with probability q_a, independent of all other residues in the sequence. If the protein or DNA sequence is denoted x_1 … x_n, the probability of the whole sequence is then the product:

\[ P(x_1 \ldots x_n) = q_{x_1} q_{x_2} \cdots q_{x_n} = \prod_{i=1}^{n} q_{x_i} \]
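The independent-residue model can be sketched in a few lines of Python. The function name `sequence_log_prob`, the example sequence, and the uniform nucleotide probabilities are assumptions made for illustration, not values from the text. The sketch sums logarithms rather than multiplying probabilities directly, since the raw product of many small numbers underflows for long sequences.

```python
import math

def sequence_log_prob(sequence, q):
    """Return log P(x_1..x_n) = sum of log q[x_i], assuming independent residues."""
    return sum(math.log(q[residue]) for residue in sequence)

# Hypothetical residue probabilities: uniform over the four nucleotides.
q_uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

log_p = sequence_log_prob("GATTACA", q_uniform)
prob = math.exp(log_p)  # equals 0.25 ** 7 up to floating-point rounding
```

A residue that is missing from `q` raises a `KeyError`, and a residue with probability zero makes `math.log` raise a `ValueError`; both are reasonable failures for a sketch, though real code would need to decide how to treat them.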





Maximum Likelihood Estimation

            The parameters for a probabilistic model are typically estimated from large sets of trusted examples, often called a training set. For instance, the probability q_a for amino acid a can be estimated as the observed frequency of that residue in a database of known protein sequences, such as SWISS-PROT, where the frequencies for the twenty amino acids are obtained by counting some twenty million individual residues in the database. As long as the training sequences are not systematically biased towards a peculiar residue composition, the frequencies are expected to be reasonable estimates of the underlying probabilities of our model. This way of estimating model parameters is called maximum likelihood estimation. When estimating parameters for a model from a limited amount of data, there is a danger of overfitting, which means that the model becomes very well adapted to the training data but will not generalize well to new data.

Conditional, Joint and Marginal Probabilities
             Suppose there are two dice, D1 and D2. The probability of rollin...
