Adriano Barra

Neural Networks & Learning Machines


Note: the course occupies a short time slot (just over a month); the lessons amount to 30 hours in total, worth 4 CFU.

Advanced Course for the PhD

Following a historical perspective, the course analyzes mathematical models and methods related to the spontaneous information-processing capabilities shown by networks of neurons (biological or artificial, once suitably stylized). After summarizing key concepts from statistical mechanics, stochastic processes and statistical inference, the course starts by analyzing the main models for the emission of an electrical signal by a single neuron. We then study how neurons interact in simple neural architectures, analyzing both the statistical learning capabilities that these networks enjoy and their retrieval skills. Owing to the Nobel Prize in Physics awarded in 2024 to John Hopfield and Geoffrey Hinton for their pioneering studies on neural networks, particular emphasis will be placed on their contributions and on the close connection between them. The methodological leitmotif will be the statistical mechanics of complex systems (i.e. Parisi's theory, Nobel Prize in Physics in 2021) with its associated package of observables and typical tools (replicas, overlaps, etc.).

MATERIAL PRODUCED DURING THE COURSE

LECTURE  ONE:  03 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI
Lecture 01, first part:  video  
Lecture 01, second part: video
Lecture 01, material: a) blackboard notes, b) hand-written notes, c) slides for the intro

Arguments of Lecture One:
   a broad historical introduction (AI in the last 100 years) and what is actually happening now.
   entropy: a physical perspective (entropy from Boltzmann to Gibbs).
   entropy: a mathematical perspective (entropy from Shannon to Jaynes).
   the Ehrenfest (urn) model: statics and dynamics, study of the entropy in both scenarios.
   constrained Gibbs entropy by quadratic cost functions: the physicist's free energy and reductionism.
   constrained Gibbs entropy by Lagrange multipliers on mean and variance: the Gaussian world (a worked example follows this list).
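A worked example of the last item above (a compressed sketch with standard conventions, not taken from the lecture notes): maximizing the entropy under constraints on normalization, mean and variance singles out the Gaussian distribution.

```latex
\max_{p}\; S[p] = -\int p(x)\,\ln p(x)\,dx
\quad\text{subject to}\quad
\int p\,dx = 1,\;\; \int x\,p\,dx = \mu,\;\; \int x^{2}p\,dx = \mu^{2}+\sigma^{2}.

% stationarity of the Lagrangian with respect to p(x):
-\ln p(x) - 1 + \lambda_{0} + \lambda_{1}x + \lambda_{2}x^{2} = 0
\;\;\Longrightarrow\;\;
p(x) \propto e^{\lambda_{1}x + \lambda_{2}x^{2}},

% fixing the multipliers via the three constraints:
p(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\; e^{-(x-\mu)^{2}/(2\sigma^{2})}.
```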
                                                                             

LECTURE  TWO:  05 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI

Lecture 02, first part:  video 
Lecture 02, second part: video
Lecture 02, material: a) blackboard notes, b) hand-written notes
Note: the example shared at the end regarding deterministic chaos does not belong to the program; you can explore it further here.

Arguments of Lecture Two:
   temperature in Physics = noise in Machine Learning; the Fourier equation as the limit of the random walk.
   Shannon-McMillan sequences: e.g. [+1,+1,...,+1,+1] vs [+1,-1,...,+1,-1], entropy of sequences.
   free energy in Physics: a bridge between the microscopic description and the macroscopic observables.
   free energy in Mathematics: searching for the minima of the cost function under the Leibniz principle.
   One-body models: response function, factorized structure of the Gibbs weight.
   Two-body models: the Curie-Weiss model and its heuristic derivation (self-consistency sketched below).
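A minimal sketch of the heuristic derivation mentioned in the last item, under the standard conventions (ferromagnetic coupling J > 0, external field h); normalizations may differ from the lecture notes.

```latex
H_{N}(\sigma) = -\frac{J}{2N}\sum_{i,j=1}^{N}\sigma_{i}\sigma_{j} - h\sum_{i=1}^{N}\sigma_{i}
             = -N\Big(\frac{J}{2}\,m_{N}^{2} + h\,m_{N}\Big),
\qquad m_{N} = \frac{1}{N}\sum_{i=1}^{N}\sigma_{i},

% heuristic (saddle-point) evaluation of the partition function yields the self-consistency
m = \tanh\!\big(\beta\,(J m + h)\big),

% which at h = 0 develops non-trivial solutions m \neq 0 for \beta J > 1
% (ergodicity breaking / ferromagnetic phase transition).
```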
                                      
                                          

LECTURE  THREE:  07 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI
Lecture 03, material: 
the handwritten notes are those of the previous lecture.
Lecture 03, summary: this was a technical lecture on mathematical approaches, alternative to the heuristic methodologies, for solving for the free energy of given cost functions. We saw two techniques:
1) Guerra interpolation (for the free energy and both its quantifiers, the first moment -the magnetization- and the second moment -the susceptibility); a compressed sketch follows below.
2) Hamilton-Jacobi PDE (free energy in statistical mechanics = action in analytical mechanics: it obeys a HJ PDE and can be solved with that technique. The magnetization obeys a Burgers equation that collapses onto the Riemann-Hopf PDE, showing that symmetry breaking in Theor. Phys. jargon is a Hopf bifurcation in Math. Phys. jargon).
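A compressed sketch of both routes in the Curie-Weiss case (notation and normalizations are assumptions and may differ from the blackboard):

```latex
% interpolating pressure between the two-body model (t=1) and a one-body trial (t=0):
\varphi_{N}(t) = \frac{1}{N}\ln\sum_{\sigma}\exp\!\Big[t\,\frac{\beta J}{2N}\Big(\sum_{i}\sigma_{i}\Big)^{2}
                 + (1-t)\,\beta J m\sum_{i}\sigma_{i}\Big],
\qquad
\frac{d\varphi_{N}}{dt} = \frac{\beta J}{2}\,\big\langle (m_{N}-m)^{2}\big\rangle_{t} - \frac{\beta J}{2}\,m^{2}.

% since the fluctuation term is non-negative, and vanishes at the optimal trial m as N -> infinity:
\lim_{N\to\infty}\frac{1}{N}\ln Z_{N}(\beta)
= \sup_{m}\Big[\ln 2 + \ln\cosh(\beta J m) - \frac{\beta J}{2}\,m^{2}\Big],
% extremizing over m gives back the self-consistency
m = \tanh(\beta J m).

% Hamilton-Jacobi route (only schematically): the pressure plays the role of an action S(t,x)
% in fictitious space-time variables, \partial_{t}S + \tfrac{1}{2}(\partial_{x}S)^{2} = 0 up to a
% source term vanishing in the thermodynamic limit; the magnetization u = \partial_{x}S then
% obeys the Riemann-Hopf equation \partial_{t}u + u\,\partial_{x}u = 0, whose shock is the
% phase transition.
```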


LECTURE  FOUR:  10 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI

Lecture 04, material: a) hand-written notes b) slides for the biological neuron modeling
Lecture 04, summary:
we introduced the Hopfield model as a paradigm for associative memory and pattern recognition when modeling biological neural networks. We discussed the genesis of the Hebbian synaptic coupling and solved the model in the low-storage regime. As a sideline we got acquainted with the Guerra-Toninelli argument to prove the existence of the asymptotic (thermodynamic) limit of the observables.
We saw two techniques:
1) heuristic derivation of the self-consistency for the Mattis magnetization (i.e. the "signal") in the Hopfield model at low storage.
2) Guerra interpolation (for the free energy of the Hopfield model and its extremization to obtain the self-consistency for the Mattis signal).
A minimal retrieval simulation is sketched below.
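The minimal retrieval simulation referred to above: a sketch under the standard Hebbian conventions (random binary patterns, zero-temperature asynchronous dynamics); it is not the course material, and numpy is assumed.

```python
# Minimal sketch: zero-temperature retrieval in the Hopfield model at low storage.
# Hebbian couplings J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, asynchronous sign updates,
# Mattis magnetization m_mu = (1/N) sum_i xi_i^mu sigma_i.
import numpy as np

rng = np.random.default_rng(0)
N, P = 1000, 5                                 # low storage: P << N
xi = rng.choice([-1, 1], size=(P, N))          # random binary patterns
J = (xi.T @ xi) / N                            # Hebbian coupling matrix
np.fill_diagonal(J, 0.0)                       # no self-interaction

sigma = xi[0].copy()                           # start from pattern 0 ...
flip = rng.random(N) < 0.2                     # ... corrupted on 20% of the sites
sigma[flip] *= -1

for sweep in range(10):                        # asynchronous zero-noise dynamics
    for i in rng.permutation(N):
        h_i = J[i] @ sigma                     # local field on neuron i
        sigma[i] = 1 if h_i >= 0 else -1

m = xi @ sigma / N                             # Mattis magnetizations
print("Mattis overlaps with the stored patterns:", np.round(m, 3))
# the overlap with pattern 0 should be close to 1, the others close to 0
```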

LECTURE  FIVE:  12 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI
Lecture 05, material: hand-written notes
Lecture 05, summary:
we introduced the Sherrington-Kirkpatrick model as the harmonic oscillator of complex systems. We discussed the observables that come into play, their role and their meaning. We derived the mean expectation of the cost function and the annealed expression of the free energy, which bounds the quenched one by Jensen's inequality, and we obtained the replica-symmetric expression of the quenched free energy via Guerra interpolation.
We saw three techniques:
1) signal-to-noise, to infer how the maximum pattern storage scales with the network size.
2) Hamilton-Jacobi PDE for the Hopfield model at low storage and the Burgers/Riemann-Hopf PDE for the Mattis signal.
3) Guerra interpolation (for the free energy of the SK model and its extremization to obtain the self-consistency for the overlap, at the RS level); the resulting RS expression is recalled below.
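For reference, the replica-symmetric expression mentioned in point 3, in the standard SK conventions assumed here (couplings of zero mean and unit variance, no external field, Dz the standard Gaussian measure):

```latex
-\beta f_{RS}(\beta) = \ln 2 + \frac{\beta^{2}}{4}\,(1-q)^{2}
+ \int\! Dz\, \ln\cosh\!\big(\beta\sqrt{q}\,z\big),
\qquad
q = \int\! Dz\, \tanh^{2}\!\big(\beta\sqrt{q}\,z\big),

% while Jensen's inequality gives the annealed bound
\frac{1}{N}\,\mathbb{E}\ln Z_{N} \;\le\; \frac{1}{N}\ln \mathbb{E}\, Z_{N}
\;\xrightarrow[N\to\infty]{}\; \ln 2 + \frac{\beta^{2}}{4}.
```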


LECTURE  SIX:  14 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI

Lecture 06, material: hand-written notes, slides for the outro
Lecture 06, summary: 
we introduced the Hopfield model in the high-storage regime, namely the scenario where a network of N neurons has to cope with a number of patterns growing linearly with its size, P = αN. We rely upon Guerra interpolation, which requires knowledge of both the CW and the SK limits. This technique is nowadays one of the most used to produce phase diagrams of neural networks; the resulting RS self-consistency equations are recalled below.
Once the Hopfield paradigm has been completely understood (at the RS level of description), we will broaden our perspective on pattern recognition and associative memories by investigating different variations on the theme.
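The RS self-consistency equations mentioned above, in the standard AGS notation assumed here (m the Mattis overlap with the retrieved pattern, q the overlap order parameter, α = P/N the load, Dz the standard Gaussian measure); normalizations may differ from the lecture notes:

```latex
m = \int\! Dz\, \tanh\!\big[\beta\,(m + \sqrt{\alpha r}\, z)\big],
\qquad
q = \int\! Dz\, \tanh^{2}\!\big[\beta\,(m + \sqrt{\alpha r}\, z)\big],
\qquad
r = \frac{q}{\big[1-\beta(1-q)\big]^{2}}.

% at zero temperature the retrieval solution m > 0 disappears at the critical load
% \alpha_c \simeq 0.138, the well-known AGS storage capacity.
```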



LECTURE  SEVEN:  17 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI

Lecture 07, material: 
- build a neural network that solves the travelling salesman problem: link to paper
- slides on the biological neuron model: link to slides
- slides on variations on the Hebbian theme: link to slides
- prove that Pavlov's classical conditioning relaxes into Hebbian learning: link to paper


LECTURE  EIGHT:  19 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI

Lecture 08, material: hand-written notes A (inverse problems),
                                    hand-written notes B (synaptic dynamics),
                                    video lecture and slides of the talk@Riken
Arguments: inverse problems, the inverse CW model, RBMs, contrastive divergence, Hebbian storing vs Hebbian learning (a CD-1 sketch follows below).
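A minimal sketch of the contrastive-divergence (CD-1) update mentioned above, for a binary restricted Boltzmann machine; the conventions ({0,1} units, sigmoid activations) and all variable names are assumptions, not the course code.

```python
# Minimal sketch: one CD-1 training step for a binary {0,1} restricted Boltzmann machine.
# W couples N_v visible units to N_h hidden units; a, b are the visible/hidden biases.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, a, b, v0, lr=0.05):
    """One contrastive-divergence update on a minibatch v0 of shape (B, N_v)."""
    # positive phase: hidden activations driven by the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: one step of Gibbs sampling (reconstruction)
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # gradient approximation: <v h>_data - <v h>_model
    B = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / B
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

# toy usage on random binary data (only to show the mechanics of the update)
N_v, N_h, B = 20, 8, 32
W = 0.01 * rng.standard_normal((N_v, N_h))
a, b = np.zeros(N_v), np.zeros(N_h)
data = (rng.random((B, N_v)) < 0.5).astype(float)
for epoch in range(100):
    W, a, b = cd1_step(W, a, b, data)
```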



LECTURE  NINE:  26 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI
Lecture 09, material: having understood how statistical learning and pattern retrieval take place, the plan of this lecture is to present generalized, modern architectures for neural networks and to discuss which kind of generalized tasks they can accomplish.

Generalization of the Hopfield paradigm: moving from shallow to dense networks (cost functions recalled after this list).
Dense Hebbian Networks: link2paper1  link2paper2
Exponential storage model: link2paper

Generalization of the Hopfield paradigm: moving from autoassociative to heteroassociative networks.
BAM (Bidirectional Associative Memory: the first generalization of Hopfield's autoassociative paradigm): link2paper
TAM (Three-directional Associative Memory: the first generalization from pattern recognition to disentanglement): link2paper1  link2paper2
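For orientation, the cost functions behind the dense and exponential generalizations listed above, written in the commonly used (Krotov-Hopfield and Demircigil-style) form; normalizations are omitted and may differ from the linked papers.

```latex
H^{(P)}_{N}(\sigma) \;\propto\; -\sum_{\mu=1}^{K}\Big(\frac{1}{N}\sum_{i=1}^{N}\xi^{\mu}_{i}\sigma_{i}\Big)^{P},
\qquad
H^{\exp}_{N}(\sigma) \;=\; -\sum_{\mu=1}^{K}\exp\Big(\sum_{i=1}^{N}\xi^{\mu}_{i}\sigma_{i}\Big).

% P = 2 recovers the standard (shallow) Hopfield model with K \propto N storage,
% dense P-body interactions raise the critical storage to K \propto N^{P-1},
% and the exponential cost function allows a number of patterns exponential in N.
```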


LECTURE  TEN:  03 MAR 2025 from 15:00 to 18:00 room 1b1, SBAI
Lecture 10, material: the final lecture aims to be a broad chat on applications of AI, in particular in health-care, where it is expected to heavily impact the state of the art. I will summarize the experience I have collected working in the labs over the past ten years.
I have merged the main results I want to share into this "immunological lecture": link2slides

 




A plenary lecture by Giorgio Parisi, held at the Accademia dei Lincei, on the course's themes: link to conference
Recommended textbooks:
A.C.C. Coolen, R. Kuhn, P. Sollich,  Theory of neural information processing systems, Oxford University Press (Amazon link)
D.J. Amit, Modeling brain function, Cambridge University Press (Amazon link)

Prerequisites: 
Nothing more than a master's degree in Computer Science, Engineering, Mathematics or Physics (the order is alphabetical).
Previous knowledge of stochastic processes, statistical inference and statistical mechanics certainly makes the course easier to follow.

Questions:
For any questions or concerns, please write to the teacher at myname.mysurname[at]uniroma1.it
Topics are listed at the bottom; some of them will probably not be covered in this academic year.
Lecture One                     03/02/2025, 15:00-18:00
   1) Ehrenfest's (urn) model (statics and dynamics): the second law of thermodynamics between the macroscopic and the microscopic.
   2) Shannon entropy in the microcanonical ensemble, MacKay bounds and Shannon-McMillan lotteries.
   3) Liouville's theorem, Zermelo's criticism, the box-counting problem and mixing.
   4) Gibbs' principle and statistical reductionism: the importance of quadratic cost functions.
   5) Fourier's equation: the continuum limit of the random walk and the Gaussian solution from the delta.
   6) Jaynes' principle of maximum entropy: between statistical mechanics and statistical inference.
   7) extensive vs intensive quantities: law of large numbers and central limit theorem.
   8) the logistic map: the genesis of deterministic chaos in the uncertainty of Cauchy problems.

Lecture Two                    05/02/2025, 15:00-18:00
   1) Lyapunov functions, stability of dynamical systems and related basins of attraction.
   2) rudiments of statistical mechanics: energies, entropies, free energies.
   3) rudiments of stochastic processes: detailed balance, ergodicity, irreducibility and the Markov theorem.
   4) dynamic minimization of the free energy: the Boltzmann bound à la Amit.
   5) equivalence of the Boltzmann entropy with the Gibbs and Shannon entropies in the canonical ensemble.
   6) the biological neuron: from the sodium-potassium pump to Stein's "integrate & fire" model.
   7) Rosenblatt's perceptron and the criticism of Minsky & Papert: from the subject to interactions (a minimal sketch follows this list).
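The minimal sketch referred to in point 7: Rosenblatt's learning rule on two toy Boolean problems, illustrating Minsky & Papert's point that a single unit separates AND but not XOR (assumed conventions, not course code).

```python
# Minimal sketch: the perceptron rule converges on the linearly separable AND problem
# but cannot solve XOR with a single threshold unit (no separating hyperplane exists).
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron rule: w <- w + lr * y_i * x_i whenever example i is misclassified."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a constant bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if (1 if xi @ w >= 0 else -1) != yi:    # misclassified point
                w += lr * yi * xi                   # Rosenblatt update
    pred = np.where(Xb @ w >= 0, 1, -1)
    return w, int(np.sum(pred != y))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])                   # linearly separable
y_xor = np.array([-1, 1, 1, -1])                    # not linearly separable

for name, y in [("AND", y_and), ("XOR", y_xor)]:
    _, errors = train_perceptron(X, y)
    print(f"{name}: misclassified points after training = {errors}")
# expected: 0 errors for AND, at least one error for XOR
```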

Lecture Three
   1) one-body models: factorized structure of the Gibbs distribution and the sigmoidal response m(h).
   2) the Curie-Weiss model (direct account): ergodicity and symmetry breaking, phase transitions.
   3) Gaussian theory: Guerra interpolation for the mean (magnetization) and the variance (susceptibility).
   4) Lagrangian theory: Hamilton-Jacobi and Burgers equations, phase transitions & Hopf bifurcations.
   5) the Gaussian integral and the solution of the Curie-Weiss model by means of the saddle point.
   6) structural analogy between the response in ferromagnets and the response in operational amplifiers.

Lecture Four
   1) the neural network as a spin-glass model: first generalities on spin glasses and neural networks.
   2) neural dynamics as a Markov process: steady state and description à la Boltzmann.
   3) the Hebb learning rule and Hopfield's proposal for associative memory: AGS theory.
   4) the Hopfield model at low load by means of the log-constrained entropy à la Coolen.
   5) the Hopfield model at low load by means of the Guerra interpolation.
   6) the Hopfield model at low load by means of the Hamilton-Jacobi technique.

Lecture Five
   1) the Sherrington-Kirkpatrick model: mean-field spin glasses, further generalities.
   2) self-averaging and the replica-symmetric description of the SK model with the replica-trick method.
   3) Gaussian theory: the replica-symmetric Guerra interpolation for the SK model: means.
   4) Gaussian theory: the replica-symmetric Guerra interpolation for the SK model: variances.
   5) Lagrangian theory: Hamilton-Jacobi and Burgers equations, phase transitions & Hopf bifurcations.
   6) Parisi's replica symmetry breaking (RSB) and the importance of RSB in neural networks.
   7) a nod to dynamics: aging (FDT violation) and the importance of the description via trap models.


Lecture Six
   1) the Hopfield neural network at high load by the replica trick: the replica-symmetric solution.
   2) the Hopfield neural network at high load by Guerra interpolation: the RS solution.
   3) a different perspective: Gardner's theory and Kohonen's bound.
   4) variations on theme 1: "multi-tasking" neural networks and multiple parallel recall.
   5) variations on theme 2: "dreaming" neural networks and the Kanter-Sompolinsky maximal storage.
   6) variations on theme 3: Kosko's hetero-associative neural networks and signal disentanglement.
   7) variations on theme 4: the Jerne-Varela idiotypic networks and the self/non-self distinction.
   8) the diluted and asymmetric networks of Derrida, Gardner and Zippelius.
   9) the Curie-Weiss limit and the Sherrington-Kirkpatrick limit.
Lecture Seven
   1) the maximum-likelihood method and estimators: statistical inference and synaptic dynamics.
   2) inverse problems: estimation of the mean (magnetization) and variance (susceptibility) in the Curie-Weiss model.
   3) the Pavlovian conditioned-reflex module: learning by means of two multiscale ODEs.
   4) the generalized conditioned-reflex module: the AGS limit for long times.
   5) the Pavlovian conditioned-reflex module: persistent recalls and the genesis of obsessions.
   6) the Kullback-Leibler cross entropy and the mutual information (definitions recalled after this list).
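For reference, the definitions behind point 6 and their link with point 1, stated here with generic notation: maximum-likelihood estimation amounts to minimizing a Kullback-Leibler divergence between data and model.

```latex
D_{KL}(p\,\|\,q) = \sum_{x} p(x)\,\ln\frac{p(x)}{q(x)} \;\ge\; 0,
\qquad
I(X;Y) = D_{KL}\big(p(x,y)\,\big\|\,p(x)\,p(y)\big)
       = \sum_{x,y} p(x,y)\,\ln\frac{p(x,y)}{p(x)\,p(y)}.

% maximizing the log-likelihood of a dataset with empirical distribution \hat{p} over a
% parametric family q_\theta is equivalent to minimizing D_{KL}(\hat{p}\,\|\,q_\theta).
```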

Lecture Eight
   1) the Boltzmann machine: Hinton's statistical theory for synaptic dynamics.
   2) supervised learning (or learning with a teacher): the "contrastive divergence" technique.
   3) unsupervised learning and the related conceptual problems.
   4) equivalence between the Hopfield neural network and the Hinton neural network: learning & retrieval.
   5) a look at the datasets (random, MNIST/Fashion-MNIST, CIFAR-10), features & grandmother cells.
   6) generalized equivalence for heteroassociative networks: clusters of interacting grandmother cells.

Lecture Nine
   1) a simple technique for a coarse but effective analysis: signal-to-noise (S2N); a sketch follows this list.
   2) Hebbian learning from examples without a teacher via S2N: scaling for generalization.
   3) Hebbian learning from examples with a teacher via S2N: scaling for generalization.
   4) Hebbian learning in networks equipped with the ability to sleep via S2N: small datasets.
   5) Hebbian learning from the maximum entropy principle: cost functions and loss functions.
   6) the maximum entropy principle forcing moments beyond mean and variance: dense networks.
   7) other variations on the theme: K-SAT and graph coloring.
   8) other variations on the theme: the traveling salesman problem à la Hopfield & Tank.
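The signal-to-noise sketch referred to in point 1: a minimal numerical estimate (assumed conventions, not the course material) of how the crosstalk destabilizes a stored Hopfield pattern as the load grows.

```python
# Minimal sketch: signal-to-noise estimate of pattern stability in the Hopfield model.
# With sigma = xi^1, the aligned local field reads h_i * xi_i^1 = 1 (signal) + crosstalk
# of variance ~ P/N (noise), so the fraction of misaligned bits grows with alpha = P/N.
import numpy as np

rng = np.random.default_rng(0)
N = 2000
for alpha in (0.05, 0.14, 0.3):
    P = int(alpha * N)
    xi = rng.choice([-1, 1], size=(P, N))
    J = (xi.T @ xi) / N
    np.fill_diagonal(J, 0.0)
    h = J @ xi[0]                               # local fields on pattern 1
    unstable = np.mean(h * xi[0] < 0)           # bits that would flip at zero noise
    print(f"alpha = {alpha:.2f}:  fraction of unstable bits = {unstable:.3f}")
```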


Lecture Ten
   1) dense neural networks: learning, storage & retrieval in the replica-symmetric picture.
   2) dense neural networks: lowering the retrieval signal-to-noise threshold by sacrificing storage.
   3) dense neural networks: learning, storage & retrieval in the broken replica-symmetry picture.
   4) exponential models and modern storage problems: latent heat and other loopholes.
   5) deep Boltzmann machines: dense frames vs deep frames.
   6) place-cell experiments: the high connectivity of the hippocampus.
   7) Bialek's experiments: the patch clamp and the multielectrode array at maximum entropy.

SYNOPSIS

The course is divided into three main sections:

1) The first section serves to ensure that we share a common scientific background (obviously a necessary prerequisite to take the first steps together towards a formal mathematical framework for neural networks). In short, the student will be provided with the rudiments of Statistical Mechanics, Stochastic Processes and Statistical Inference by revisiting together some fundamental topics of these disciplines (adapted for this course).
2) The second section introduces mathematical methods and models aimed at quantitatively characterizing simple and complex systems in statistical mechanics, which will be fundamental for the subsequent mathematical analysis of the functioning of neural networks from a theoretical and systemic perspective. In this section we develop in detail the mathematical methods necessary to describe and understand the phenomenology that these systems exhibit (from spontaneous ergodicity breaking to spontaneous replica symmetry breaking), equipping ourselves both with very effective yet heuristic methods (widely used in Theoretical Physics approaches à la Parisi, e.g. the "replica trick", "message passing", etc.) and with more rigorous ones (the prerogative of the know-how of Mathematical Physics à la Guerra, e.g. "stochastic stability", "cavity fields", etc.).

3) The last and most important section is entirely dedicated to neural networks and follows the main path traced by Amit, Gutfreund & Sompolinsky: after a minimal description (always in mathematical terms) of the key mechanisms of spike emission in biological neuron models -e.g. Stein's integrate & fire- as well as of their electronic implementation (e.g. Rosenblatt's perceptron) and of the propagation of information between neurons through axons and dendrites, we will see the limits of single-neuron computation from different perspectives (e.g. Minsky & Papert's criticism about the construction of logic gates such as the XOR). Having shifted the focus from the subject (the neuron) to its interactions (neural networks), we will then build neural networks and study their emergent information-processing properties (namely those not immediately deducible solely by looking at the behavior of the single neuron), remaining within a statistical mechanics perspective. Specifically, we will try to see how these networks are able to learn and abstract "archetypes" by looking at examples supplied by the external world. Subsequently, we will show how these networks use what they have learned to respond appropriately, when stimulated, to the external world by carrying out tasks such as "pattern recognition", "associative memory", "pattern disentanglement", etc.
We will also understand how these processes can sometimes go wrong, and why.
Using Hopfield's and Hinton's neural networks as a leitmotif for several variations on this theme, the section will close by showing the deep structural and computational equivalence between these two theories, Hopfield's pattern recognition and Hinton's statistical learning, unifying these pillars of the discipline in a single, coherent scenario for the whole phenomenon of "cognition". Ideally - and hopefully - at the end of the course the student should be able to continue independently in the study of these topics. In particular, the student should be able, when working in a team, to play a role complementary to that of the computer scientist and the information engineer, taking an interest in the same topics but offering a different perspective, intrinsically more abstract and synthetic (that is, one where the myriad of algorithmic recipes that we produce every day find a natural placement), thus helping to optimize the research group itself.



OBJECTIVES

The aim of the course is to share with the student the salient concepts and, at the same time, to provide the key tools, so that the student can autonomously continue their cultural growth in the field of neural networks and machine learning from a purely modelling perspective: this aims to be a theoretical course on the "statistical mechanics of neural networks and machine learning", not a computational one.
The ultimate ambition is to be able to ask questions (cum grano salis) about the first principles of AI functioning (drawing inspiration from analogies with information processing in biological networks) and, where possible, to answer them: to understand how to frame a problem about AI within a suitable mathematical framework, so that these neural networks are no longer seen as "black boxes".

As a technical note, it is emphasized that the course is structured in a "methodologically symmetric" way, in this sense: we will deal with all the main models (i.e. Curie-Weiss, Sherrington-Kirkpatrick, Hopfield, Boltzmann) using the same techniques (the saddle-point method/replica trick, the Guerra-style interpolation technique and the approaches based on partial differential equations), each time appropriately adapted to the model under study. This should help the student to become familiar with the techniques themselves - as well as with the particular neural network under examination - so as to become autonomous in the study of new neural architectures and/or data-learning algorithms.