(A) Construction of a PI network based on protein copurification and
detection by mass spectrometry. The confidence scoring of the LCMS and
MALDI networks was conducted using a logistic regression with datasets
consisting of PI from low-throughput studies curated in DIP, BIND, and
IntAct (gold positives) and proteins in different subcellular
localizations (gold negatives). The two networks were integrated using
a probabilistic model [61] (Protocol S6). The resulting PI network,
with edge weights corresponding to likelihood ratios, was clustered
using MCL to delimit “multiprotein complexes.”
(B) Four GC methods were integrated into a single functional interaction
network using the same probabilistic model [61], and the resulting
scores (edge weights) were input to MCL to delimit “functional modules.”
(C) Orphan function prediction was conducted using a
“guilt-by-association” procedure. After integration of PI and GC
interactions into a single probabilistic network [61], a machine
learning algorithm (StepPLR) newly developed for this study was used to
assign functions based on the binary associations of orphans with
annotated proteins, the respective interaction edge weights, and the
overall network topology. Correlations between vectors of these
function predictions (orphans) and the annotations were then used as
input to delimit “functional neighborhoods” by clustering using MCL.
[from P. Hu, S. C. Janga, M. Babu, J. J. Díaz-Mejía, G. Butland, W.
Yang, O. Pogoutse, X. Guo, S. Phanse, P. Wong, S. Chandran, C.
Christopoulos, A. Nazarians-Armavil, N. K. Nasseri, G. Musso, M. Ali, N.
Nazemof, V. Eroukova, A. Golshani, A. Paccanaro, J. F. Greenblatt, G.
Moreno-Hagelsieb, and A. Emili, “Global functional atlas of Escherichia
coli encompassing previously uncharacterized proteins,” PLoS Biology,
vol. 7, iss. 4, p. 1000096, 2009].
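
All three panels end with MCL clustering of a weighted network, so a minimal sketch of the basic MCL iteration (expansion, inflation, column renormalization) may help make that step concrete. This is an illustration only, not the software or the settings used in the study: the function name `mcl`, the inflation value of 2, the self-loop weighting, and the toy edge weights are all assumptions chosen for the example.

```python
import numpy as np

def mcl(weights, inflation=2.0, expansion=2, max_iter=100, tol=1e-6):
    """Basic Markov clustering of a symmetric, non-negative weight matrix.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    Returns a sorted list of clusters, each a list of node indices.
    """
    m = np.asarray(weights, dtype=float).copy()
    # Add self-loops so every column has a non-zero sum.
    np.fill_diagonal(m, m.max() if m.max() > 0 else 1.0)
    m /= m.sum(axis=0, keepdims=True)          # make columns stochastic
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)   # expansion: random-walk steps
        m = m ** inflation                         # inflation: favor strong flows
        m /= m.sum(axis=0, keepdims=True)          # renormalize columns
        if np.allclose(m, prev, atol=tol):
            break
    # Simplification: assign each node (column) to the row holding most of
    # its flow, i.e. its dominant attractor, and group nodes by attractor.
    clusters = {}
    for node, attractor in enumerate(m.argmax(axis=0)):
        clusters.setdefault(int(attractor), []).append(node)
    return sorted(clusters.values())

# Toy network (invented weights): two tight triangles joined by one weak edge.
toy = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                toy[i, j] = 5.0
toy[2, 3] = toy[3, 2] = 0.1

clusters = mcl(toy)
print(clusters)
```

Raising the inflation parameter generally yields smaller, tighter clusters, which is the usual knob for tuning the granularity of the resulting “complexes,” “modules,” or “neighborhoods.”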