This dissertation deals with the development of effective information processing strategies for distributed optimization and learning over graphs. The work considers initially global cost functions that can be expressed as the aggregate sum of individual costs (``sum-of-costs'') and proceeds to develop diffusion adaptation algorithms that enable distributed optimization through localized coordination among neighboring agents. The diffusion strategies allow the nodes to cooperate and diffuse information in real-time and they help alleviate the effects of stochastic approximations and gradient noise through a continuous learning process. Among other applications, the resulting strategies can be applied to large-scale machine learning problems, where a network of agents is used to learn a common model from big data sets that are distributed over the network.

The work also develops diffusion strategies for the solution of another class of problems where the global cost functions are expressed as regularized ``cost-of-sum'' forms. This situation arises when a large-scale model is stored and learned over a network of agents, with each agent being in charge of a portion of the model and it is not feasible to aggregate the entire model in one location due to communication and privacy considerations. It is shown that the ``cost-of-sum'' problem can be transformed into a ``sum-of-costs'' problem by using dual decompositions and the concept of conjugate functions. The collaborative inference step in the dual domain is shown to generate dual variables that can be used by the agents to update their model without the need to share these model parameters or the training data with the other agents. This is a powerful property that leads to an efficient distributed procedure for learning large-scale models over networks.

The dissertation carries out a detailed transient and steady-state analysis of the learning behavior of multi-agent networks, and reveals interesting results about the learning abilities of distributed strategies. Among other results, the analysis reveals how combination policies influence the learning process of networked agents, and how these policies can steer the convergence point towards any of many possible Pareto optimal solutions. The results also establish that the learning process of an adaptive network undergoes three well-defined stages of evolution with distinctive convergence rates during the first two stages, while attaining a finite mean-square-error (MSE) level in the last stage. The analysis reveals what aspects of the network topology influence performance directly and suggests design procedures that can optimize performance by adjusting the relevant topology parameters. Interestingly, it is further shown that, in the adaptation regime, each agent in a sparsely connected network is able to achieve the same performance level as that of a centralized stochastic approximation strategy even for left-stochastic combination strategies. These results lead to a deeper understanding and useful insights on the convergence behavior of coupled distributed learners. The results also lead to effective design mechanisms to help diffuse information more thoroughly over networks.

**Acknowledgment*** This work was supported in part by
NSF grants CCF-1011918, CCF-0942936, and ECS-0725441. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsor.*