Machine learning algorithms for insightful analysis of complex data structures
Induction is the process of extracting knowledge from the information contained in data. In our work we concentrate on descriptive induction, whose goal is to construct knowledge that enables human understanding of the data. It includes techniques for constructing user-interpretable models, segmenting the corpus of examples, and detecting outliers. The methodology is relevant to the computer science fields known as intelligent data analysis, knowledge discovery from data, and data mining. At the Rudjer Boskovic Institute we have been developing machine learning algorithms for more than 15 years and have successfully applied them in various domains, including chemistry, biology, medicine, social sciences, economics, and manufacturing. With this project we want to extend the existing methodology and to implement novel techniques able to cope with data contained in complex structures. The main topic will be spatio-temporal structures, but we will also work with networks of data, relational databases, and data contained in ontologies. Previous experience clearly demonstrates that completely transforming the information contained in structured data into a form that can enter the induction process is not a simple task. Our first goal is to develop and implement systematic and general approaches for this conversion. The consequence will be an explosion in the amount of generated data that must enter the induction process. Therefore, the second goal is the implementation of efficient algorithms for descriptive induction. This work includes the development of novel algorithms for clustering and outlier detection in sets of unclassified examples and the implementation of hardware for fast execution of rule learning algorithms. The third goal is the application and evaluation of the implemented algorithms in various real-life domains. The success of the complete project will be measured by the quality and usefulness of the knowledge obtained in these applications.
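
To make the conversion step concrete, the following is a minimal sketch of how temporal records might be flattened into one attribute-value row per example, the form that rule learners, clustering, and outlier detection typically expect. It is an illustration only, not the project's method: the function name `flatten_series`, the `(entity, time, value)` record layout, and the chosen summary attributes are all assumptions made for this example.

```python
from statistics import mean


def flatten_series(records, window=3):
    """Turn (entity, time, value) records into one attribute-value row per entity.

    Hypothetical helper: the record layout and the attribute names are
    assumptions for illustration, not part of the described methodology.
    """
    # Group the observations by the entity (example) they belong to.
    by_entity = {}
    for entity, t, value in records:
        by_entity.setdefault(entity, []).append((t, value))

    rows = {}
    for entity, series in by_entity.items():
        series.sort()  # order observations by time
        values = [v for _, v in series]
        recent = values[-window:]  # the last `window` observations
        rows[entity] = {
            "n_obs": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "recent_mean": mean(recent),
            "trend": values[-1] - values[0],  # last value minus first value
        }
    return rows


if __name__ == "__main__":
    data = [("a", 1, 1.0), ("a", 2, 3.0), ("a", 0, 2.0), ("b", 0, 5.0)]
    for entity, row in flatten_series(data, window=2).items():
        print(entity, row)
```

Even in this small sketch a single measured quantity becomes six attributes. Adding further window sizes, further quantities, or spatial neighbourhoods multiplies the attribute count, which is the growth in generated data that motivates the need for efficient induction algorithms.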