|
||||||||||||||||
Research Ph.D. ThesesMining Subspace and Boolean Patterns from Data
By Lizhuang Zhao
We set up a systematic framework for mining subspace and boolean patterns from data. Subspace patterns are extracted from real-valued datasets, whereas Boolean patterns are mined from binary-valued datasets. For real-valued datasets, we mainly consider scaling and shifting relationships, whereas for binary-valued datasets, we mine arbitrary boolean expressions (OR, AND, CNF, DNF). Boolean patterns are also used to mine redescriptions, i.e., to describe the same group of objects in multiple ~Sorthogonal ways~T. The subspace pattern mining has been tailored to gene microarray data clustering to find biclusters and triclusters. We present two novel deterministic clustering algorithms: MICROCLUSTER and TRICLUSTER (the first 3D microarray subspace clustering), which can mine arbitrarily positioned and overlapping biclusters/triclusters. Depending on different parameter values, they can mine different types of clusters, including those with constant or similar row/column values, as well as scaling and shifting expression patterns. Optionally, the two algorithms merge/prune some clusters having large overlaps. We also give a useful set of metrics to evaluate the clustering quality, and show their effectiveness on real microarray expression data. We also present a comprehensive framework, BLOSOM, for mining boolean patterns from binary-valued datasets. This framework forms the algorithmic core of a redescription mining solution. We organize the space of boolean expressions into four categories; pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. For each category, we propose a closure operator that naturally leads to the concept of a closed boolean expression. Further, the closed generators form a lossless representation of all possible boolean expressions and, hence, all possible redescriptions. BLOSOM efficiently mines several forms of frequent boolean expressions by utilizing a number of methodical pruning techniques. Return to main PhD Theses page |
||||||||||||||||
|