Clustering results on the 507 dataset with our implementation of
GeneRAGE (Top Left), hierarchical clustering (Top Right), TribeMCL
(Bottom Left) and our Spectral Clustering algorithm (Bottom Right). The
figures show only the top 30 most populated clusters returned by each
algorithm and 8 for the spectral clustering, since it returned only 8
clusters. Each row in the diagrams corresponds to a different cluster.
Short (green) bars represent the assignment of each protein sequence to
a cluster. Each protein has one of these bars in only one of the rows
(clusters); the presence of the bar means that the protein is assigned
to that cluster. Boundaries between super-families are shown by
vertical thick (red) lines; boundaries between families within each
super-family are shown by dotted (blue) lines. The dataset has 6
super-families, orderly from left to right: Globin-like (88), EF-hand
(83), Cupredoxins (78), (Trans)glycosidases
(83), Thioredoxin-like (81), Membrane all-alpha (94). [from A.
Paccanaro, J. A. Casbon, M. A. S. Saqi (2006). Spectral
Clustering of Proteins Sequences Nucleic Acids Research
2006 Mar 17;34(5):1571-80].