Publication date:
2008
Abstract:
Inference for Expressed Sequence Tag (EST) data is considered. We focus on evaluating the redundancy of a cDNA library and, more importantly, on comparing different libraries on the basis of their clustering structure. The numerical results we obtain allow us to assess the effect of an error correction procedure for EST data and to study the compatibility of single EST libraries with respect to merged ones. The proposed method is based on a Bayesian nonparametric approach that makes it possible to understand the clustering mechanism that generates the observed data. As a specific nonparametric model, we use the two parameter Poisson–Dirichlet (PD) process. The PD process is a tractable nonparametric prior and a natural candidate for modeling data arising from discrete distributions. It allows prediction and testing in order to analyze the clustering structure featured by the data. We show how a full Bayesian analysis can be performed and describe the corresponding computational algorithm.
CRIS type:
03A-Journal article
Keywords:
Bayesian nonparametrics; clustering; EST analysis; species sampling; two parameter Poisson–Dirichlet process.
Authors:
A. LIJOI; R.H. MENA; I. PRUENSTER
Link to the full record:
Published in: