Summary-based Comparison of Data Quality across Public MAGE-ML Genomic Datasets

Authors

  • Lorena Etcheverry, Instituto de Computación, Facultad de Ingeniería, Universidad de la República
  • Mariano P. Consens, University of Toronto

Keywords:

XML, data quality, MAGE-ML, functional genomic data standards and public collections, schema evolution

Abstract

Extensive microarray experimental data is available online, facilitating independent evaluation of experiment
conclusions and enabling reuse. Numerous microarray experiment datasets are published using the MAGE-ML
XML schema, but assessing the quality of published experiments remains challenging because there is no
consensus among microarray users on a framework for measuring dataset quality.
In this paper, we apply techniques based on DescribeX to analyze public MAGE-ML collections quantitatively and
qualitatively, gaining insights about schema evolution. Our case study shows that DescribeX is a useful tool for
evaluating the quality of microarray experiment data, and that it enhances the understanding of the instance-level
structure of MAGE-ML datasets and its evolution.

Author Biography

  • Lorena Etcheverry, Instituto de Computación, Facultad de Ingeniería, Universidad de la República
    Teaching assistant, Instituto de Computación

Published

2011-08-10