e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Sciences
The goal of the e-LICO project is to build a virtual laboratory for interdisciplinary collaborative research in data mining and data-intensive sciences. The proposed e-lab will comprise three layers: the e-science and data mining layers will form a generic research environment that can be adapted to different scientific domains by customizing the application layer. The e-science layer, built on an open-source e-science infrastructure developed by one of the partners, will support content creation through collaboration at multiple scales and degrees of commitment, ranging from small, contract-bound teams to voluntary, constraint-free participation in dynamic virtual communities. The data mining layer will be the distinctive core of e-LICO; it will provide comprehensive data mining tools for multimedia data (structured records, text, images, signals). Standard tools will be augmented with preprocessing or learning algorithms developed specifically to meet the challenges of data-intensive, knowledge-rich sciences, such as ultra-high dimensionality or undersampled data. Methodologically sound use of these tools will be ensured by a knowledge-driven data mining assistant, which will rely on a data mining ontology and knowledge base to plan the mining process and propose ranked workflows for a given application problem. Extensive e-lab monitoring facilities will automate the accumulation of experimental meta-data to support replication and comparison of data mining experiments. These meta-data will be used by a meta-miner, which will combine probabilistic reasoning with kernel-based learning from complex structures to incrementally improve the assistant's workflow recommendations. e-LICO will be showcased in a systems biology task: biomarker discovery and molecular pathway modelling for diseases affecting the kidney and urinary pathways.