Neural Networks & Learning Machines
Note: the course will occupy a small time slot (just over a month); lessons will total 30 hours, for a value of 4 CFU.
Advanced Course for the PhD
Following a historical perspective, the course analyzes mathematical models and methods related to the spontaneous information-processing capabilities shown by networks of neurons (biological or artificial, once suitably stylized). After summarizing key concepts from statistical mechanics, stochastic processes and statistical inference, the course begins by analyzing the main models for the emission of an electrical signal by a single neuron. We will then study how neurons interact in simple neural architectures, analyzing both the statistical learning capabilities these networks enjoy and their retrieval skills. In light of the Nobel Prize in Physics awarded in 2024 to John Hopfield and Geoffrey Hinton for their pioneering studies on neural networks, particular emphasis will be placed on their contributions and on the close connection between them. The methodological leitmotif will be the statistical mechanics of complex systems (i.e. Parisi's theory, Nobel Prize in Physics in 2021) with its associated package of observables and typical tools (replicas, overlaps, etc.).
MATERIAL PRODUCED DURING THE COURSE
LECTURE ONE: 03 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI
Lecture 01, first part: video
Lecture 01, second part: video
Lecture 01, material: a) blackboard notes, b) hand-written notes, c) slides for the intro
Arguments of Lecture One: a broad historical introduction (AI in the last 100 years) and what is actually happening now.
entropy: a physical perspective (entropy from Boltzmann to Gibbs).
entropy: a mathematical perspective (entropy from Shannon to Jaynes).
the Ehrenfest (urn) model: statics and dynamics, study of the entropy in both scenarios.
constrained Gibbs entropy by quadratic cost functions: physicist's free energy and reductionism.
constrained Gibbs entropy by Lagrange multipliers on mean and variance: the Gaussian world.
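For reference, a minimal sketch (in my own notation, not necessarily the one used on the blackboard) of how the last point works: maximizing the Shannon-Gibbs entropy under Lagrange-multiplier constraints on the first two moments selects the Gaussian distribution,
\[
\max_{p}\Big[-\int p(x)\ln p(x)\,dx\Big]\quad\text{subject to}\quad \int p(x)\,dx=1,\quad \int x\,p(x)\,dx=\mu,\quad \int x^{2}p(x)\,dx=\mu^{2}+\sigma^{2},
\]
whose stationarity condition gives \(p(x)\propto e^{-\lambda_{1}x-\lambda_{2}x^{2}}\), i.e. the Gaussian
\[
p(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}.
\]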
LECTURE TWO: 07 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI
Lecture 02, first part: video
Lecture 02, second part: video
Lecture 02, material: a) blackboard notes, b) hand-written notes
Note: the example shared at the end regarding deterministic chaos does not belong to the program; you can explore it further here.
Arguments of Lecture Two: temperature in Physics = noise in Machine Learning. Fourier's equation as the limit of the random walk.
Shannon-McMillan sequences: e.g. [+1,+1,...,+1,+1] vs [+1,-1,...,+1,-1], entropy of sequences.
free energy in Physics: a bridge between the microscopic description and the macroscopic observables.
free energy in Mathematics: searching for the minima of the cost function under the Leibniz principle.
One-body models: response function, factorized structure of the Gibbs weight.
Two-body models: the Curie-Weiss model and its heuristic derivation (a minimal sketch follows below).
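For orientation, a minimal sketch of the heuristic Curie-Weiss derivation mentioned in the last point (standard textbook form; conventions in class may differ): with
\[
H_{N}(\sigma)=-\frac{J}{2N}\sum_{i,j}\sigma_{i}\sigma_{j}-h\sum_{i}\sigma_{i},\qquad \sigma_{i}\in\{-1,+1\},
\]
each spin feels the effective field \(Jm+h\), where \(m=\frac{1}{N}\sum_{i}\sigma_{i}\), so that
\[
m=\tanh\big(\beta(Jm+h)\big);
\]
for \(h=0\) non-zero solutions appear only for \(\beta J>1\), which is the ergodicity-breaking (ferromagnetic) transition.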
LECTURE THREE: 10 FEB 2025 from 15:00 to 18:00 room 1b1, SBAI
Lecture 03, summary: this was a technical lecture on mathematical approaches, alternative to heuristic methodologies, for solving the free energy of given cost functions. We saw two techniques: Guerra interpolation (for the free energy and both its quantifiers, the first moment, i.e. the magnetization, and the second moment, i.e. the susceptibility) and the Hamilton-Jacobi PDE (the free energy in statistical mechanics plays the role of the action in analytical mechanics: it obeys a HJ PDE and can be solved by that technique; the magnetization obeys a Burgers equation that collapses onto the Riemann-Hopf PDE, showing that symmetry breaking in theoretical-physics jargon is a Hopf bifurcation in mathematical-physics jargon).
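Schematically (and only as a reminder; signs, normalizations and the vanishing mollifier depend on the chosen interpolation and are merely sketched here), the mechanical analogy works as follows: one builds an interpolating action \(S(t,x)\) in fictitious time and space such that
\[
\partial_{t}S+\tfrac{1}{2}\big(\partial_{x}S\big)^{2}+O(1/N)=0,
\]
a Hamilton-Jacobi equation for a free particle up to a mollifier vanishing in the thermodynamic limit; the magnetization \(m=\partial_{x}S\) then obeys a Burgers equation that, as \(N\to\infty\), collapses onto the inviscid Riemann-Hopf equation
\[
\partial_{t}m+m\,\partial_{x}m=0,
\]
solvable by characteristics; the shock wave of this PDE is the phase transition (symmetry breaking).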
LECTURE FOUR: from 15:00 to 18:00 room 1b1, SBAI
Lecture 04, first part:
Lecture 04, second part:
Lecture 04, material:
Recommended textbooks:
A.C.C. Coolen, R. Kuhn, P. Sollich, Theory of neural information processing systems, Oxford University Press (Amazon link)
D.J. Amit, Modeling brain function, Cambridge University Press (Amazon link)
Prerequisites:
Nothing more than a master's degree in Computer Science, Engineering, Mathematics or Physics (the order is alphabetical).
Previous knowledge of stochastic processes, statistical inference and statistical mechanics certainly makes the course easier to follow.
Worries:
For any worries please write to the teacher at myname.mysurname[at]uniroma1.it
Topics are listed at the bottom. Some of them (marked by color on the course webpage) will probably not be covered in this academic year.
Lecture One, 03/02/2025, 15:00-18:00
1) Ehrenfest's model (statics and dynamics): the second law of thermodynamics between the macroscopic and the microscopic (a simulation sketch follows after this list).
2) Shannon entropy in the microcanonical ensemble, MacKay bounds and Shannon-McMillan lotteries.
3) Liouville's theorem, Zermelo's criticism, the box counting problem and mixing.
4) Gibbs' principle and statistical reductionism: the importance of quadratic cost functions.
5) Fourier's equation: the continuum limit of the random walk and the Gaussian solution from the delta.
6) Jaynes' principle of maximum entropy: between statistical mechanics and statistical inference.
7) extensive vs intensive quantities: law of large numbers and central limit theorem.
8) the logistic map: the genesis of deterministic chaos in the uncertainty of Cauchy problems.
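As an optional, self-contained illustration of point 1 (my own sketch, not course material), a short Python simulation of the Ehrenfest urn: starting with all balls in one urn, the occupation relaxes toward N/2, mirroring the macroscopic growth of entropy, while microscopic fluctuations never stop.

```python
import random

def ehrenfest(N=1000, steps=20000, seed=0):
    """Ehrenfest urn model: at each step a ball is picked uniformly at
    random and moved to the other urn; we track the occupation of urn A."""
    rng = random.Random(seed)
    n_A = N                       # all balls start in urn A (far from equilibrium)
    history = [n_A]
    for _ in range(steps):
        if rng.random() < n_A / N:    # the picked ball was in urn A
            n_A -= 1
        else:                         # the picked ball was in urn B
            n_A += 1
        history.append(n_A)
    return history

if __name__ == "__main__":
    traj = ehrenfest()
    # the occupation relaxes toward N/2 = 500 and then fluctuates around it
    print(traj[0], traj[len(traj) // 2], traj[-1])
```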
Lecture Two, 05/02/2025, 15:00-18:00
1) Lyapunov functions, stability of dynamic systems and related basins of attraction.
2) rudiments of statistical mechanics: energies, entropies, free energies.
3) rudiments of stochastic processes: detailed balance, ergodicity, irreducibility and Markov theorem.
4) dynamic minimization of free energy: the Boltzmann bound à la Amit.
5) equivalence of Boltzmann entropy with Gibbs and Shannon entropy in the canonical form.
6) the biological neuron: from the sodium-potassium pump to Stein's "integrate & fire" model.
7) Rosenblatt's perceptron and the criticism of Minsky & Papert: from the subject to interactions.
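A toy illustration of point 7 (my own sketch, with a hypothetical four-point dataset): the Rosenblatt learning rule converges on linearly separable Boolean functions such as AND but cannot realize XOR, which is the heart of Minsky & Papert's criticism.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Rosenblatt rule: w <- w + lr * y_i * x_i on misclassified points;
    a constant bias input is appended to X, labels are +/-1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi
                errors += 1
        if errors == 0:
            break
    return w

def accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.mean(np.sign(Xb @ w) == y)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])   # linearly separable: the rule converges
y_xor = np.array([-1, 1, 1, -1])    # not linearly separable: it cannot

print("AND accuracy:", accuracy(train_perceptron(X, y_and), X, y_and))
print("XOR accuracy:", accuracy(train_perceptron(X, y_xor), X, y_xor))
```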
Lecture Three
1) one-body models: factorized structure of Gibbs distributions and sigmoidal response m(h).
2) the Curie-Weiss model (direct account): ergodicity and symmetry breaking, phase transitions (a numerical sketch follows after this list).
3) Gaussian theory: Guerra interpolation for mean (magnetization) and variance (susceptibility).
4) Lagrangian theory: Hamilton-Jacobi and Burgers equation. Hopf transitions & bifurcations.
5) Gaussian integral and solution of the Curie-Weiss model by means of the saddle point.
6) structural analogy between response in ferromagnets and response in operational amplifiers.
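Relating to points 2 and 5 above, a short numerical sketch (mine, not course material) of the Curie-Weiss self-consistency equation solved by plain fixed-point iteration: the magnetization vanishes below the critical noise level and becomes non-zero above it.

```python
import numpy as np

def cw_magnetization(beta, J=1.0, h=0.0, m0=0.5, iters=200):
    """Solve the Curie-Weiss self-consistency m = tanh(beta*(J*m + h))
    by fixed-point iteration (heuristic: no guarantee of selecting a
    particular branch when several solutions coexist)."""
    m = m0
    for _ in range(iters):
        m = np.tanh(beta * (J * m + h))
    return m

for beta in (0.5, 0.9, 1.1, 2.0):
    print(beta, round(float(cw_magnetization(beta)), 4))
# below beta*J = 1 the iteration collapses to m = 0 (ergodic phase),
# above it a non-zero magnetization appears (broken symmetry).
```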
Lecture Four
1) the neural network as a spin glass model: first generalities on spin glasses and neural networks.
2) neural dynamics as a Markov process. Steady state and description à la Boltzmann.
3) the Hebb learning rule and Hopfield's proposal for associative memory: AGS theory (a numerical sketch follows after this list).
4) the Hopfield model in low load by means of the log-constrained entropy à la Coolen.
5) the Hopfield model in low load by means of the Guerra interpolation.
6) the Hopfield model in low load by means of the Hamilton-Jacobi technique.
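A minimal numerical sketch (my own, with random patterns and hand-picked sizes) of the low-load Hopfield scenario of points 3-6: Hebbian couplings plus zero-temperature asynchronous dynamics retrieving a corrupted pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 500, 5                                  # low load: P/N << 1
xi = rng.choice([-1, 1], size=(P, N))          # random binary patterns

J = (xi.T @ xi) / N                            # Hebb rule: J_ij = (1/N) sum_mu xi_i^mu xi_j^mu
np.fill_diagonal(J, 0.0)                       # no self-interactions

def retrieve(sigma, sweeps=10):
    """Zero-temperature asynchronous dynamics: sigma_i <- sign(sum_j J_ij sigma_j)."""
    sigma = sigma.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            h_i = J[i] @ sigma
            sigma[i] = 1 if h_i >= 0 else -1
    return sigma

# corrupt 20% of the first pattern and let the network relax
start = xi[0].copy()
flip = rng.choice(N, size=N // 5, replace=False)
start[flip] *= -1

final = retrieve(start)
print("overlap with the stored pattern:", (final @ xi[0]) / N)   # close to 1
```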
Lecture Five
1) the Sherrington-Kirkpatrick model: mean-field spin glasses, further generalities.
2) self-averaging and the replica symmetric description of the SK model with the replica trick method (the RS formulas are recalled after this list).
3) Gaussian theory: the replica symmetric Guerra interpolation for the SK model: means.
4) Gaussian theory: the replica symmetric Guerra interpolation for the SK model: variances.
5) Lagrangian theory: Hamilton-Jacobi and Burgers equation. Hopf transitions & bifurcations.
6) the Parisi replica symmetry breaking (RSB). The importance of the RSB in neural networks.
7) a nod to dynamics: aging (FDT-violation) and importance of the description via trap models.
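For reference, the replica symmetric output for the SK model recalled in points 1-4, in one common normalization (conventions in class may differ):
\[
H_{N}(\sigma)=-\frac{1}{\sqrt{N}}\sum_{i<j}J_{ij}\sigma_{i}\sigma_{j},\qquad J_{ij}\sim\mathcal{N}(0,1),
\]
\[
-\beta f_{RS}=\ln 2+\frac{\beta^{2}}{4}(1-q)^{2}+\int\! Dz\,\ln\cosh\big(\beta\sqrt{q}\,z+\beta h\big),\qquad q=\int\! Dz\,\tanh^{2}\big(\beta\sqrt{q}\,z+\beta h\big),
\]
with \(Dz\) the standard Gaussian measure; this RS solution becomes unstable at low temperature (de Almeida-Thouless line), which is what motivates the Parisi replica symmetry breaking of point 6.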
Lecture Six
1) the Hopfield neural network in high load via the replica trick: replica symmetric solution.
2) the Hopfield neural network in high load by Guerra interpolation: RS solution.
3) a different perspective: Gardner's theory and Kohonen's bound
4) variations on theme 1: "multi-tasking" neural networks and multiple parallel recall.
5) variations on theme 2: "dreaming" neural networks and Kanter-Sompolinsky maximal storage.
6) variations on theme 3: Kosko's hetero-associative neural networks and signal disentanglement.
7) variations on theme 4: Jerne-Varela's idiotypical networks and the self-nonself distinction.
8) the diluted and asymmetric networks of Derrida, Gardner and Zippelius
9) the Curie-Weiss limit and the Sherrington-Kirkpatrick limit
Lecture Seven
1) the maximum likelihood method and estimators. Statistical inference and synaptic dynamics.
2) inverse problems: mean (magnetization) and variance (susceptibility) estimation in the Curie-Weiss.
3) the Pavlovian conditioned reflex module: learning by means of two multiscale ODEs.
4) the generalized conditioned reflex module: the AGS limit for long times.
5) the Pavlovian conditioned reflex module: persistent recalls and the genesis of obsessions.
6) the Kullback-Leibler cross entropy and the mutual information.
Lecture Eight
1) The Boltzmann machine: Hinton's statistical theory for synaptic dynamics.
2) Supervised learning (or with a teacher): the "contrastive divergence" technique (a toy sketch follows after this list).
3) Unsupervised learning and related conceptual problems.
4) Equivalence between Hopfield neural network and Hinton neural network: learning & retrieval.
5) a look at the datasets (random, mnist/fashion-mnist, cifar-10), features & grandmother cells.
6) Generalized equivalence for heteroassociative networks: clusters of interacting grandmother cells.
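A compact, illustrative sketch of the contrastive divergence (CD-1) step mentioned in point 2, for a binary restricted Boltzmann machine on synthetic data; the tiny dataset, sizes and hyperparameters are my own choices, not course material.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Binary RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_vis, n_hid, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b = np.zeros(n_vis)     # visible bias
        self.c = np.zeros(n_hid)     # hidden bias
        self.lr = lr

    def sample_h(self, v):
        p = sigmoid(v @ self.W + self.c)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1(self, v0):
        ph0, h0 = self.sample_h(v0)          # positive phase (data clamped)
        pv1, v1 = self.sample_v(h0)          # one Gibbs step
        ph1, _ = self.sample_h(v1)           # negative phase (model)
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)
        return np.mean((v0 - pv1) ** 2)      # reconstruction error (monitoring only)

# toy dataset: noisy copies of two "archetypes"
archetypes = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = np.repeat(archetypes, 50, axis=0)
mask = rng.random(data.shape) < 0.1
data[mask] = 1 - data[mask]                   # 10% bit-flip noise

rbm = RBM(n_vis=6, n_hid=2)
for epoch in range(200):
    err = rbm.cd1(data)
print("final reconstruction error:", round(float(err), 3))
```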
Lecture Nine
1) a simple technique for a coarse but effective analysis: signal-to-noise (S2N); a one-line version follows after this list.
2) Hebbian learning from examples without the teacher via S2N: scaling for generalization.
3) Hebbian learning from examples with the teacher via S2N: scaling for generalization.
4) Hebbian learning in networks equipped with the ability to sleep via S2N: small datasets.
5) Hebbian learning from the maximum entropy principle: cost functions and loss functions.
6) the maximum entropy principle forcing moments beyond mean and variance: dense networks.
7) other variations on the theme: the K-sat and coloring.
8) other variations on the theme: the traveling salesman problem à la Hopfield & Tank
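A one-line version of the signal-to-noise argument behind points 1-3 (standard textbook form; the in-class treatment of learning from examples refines this estimate): for Hebbian couplings \(J_{ij}=\frac{1}{N}\sum_{\mu=1}^{P}\xi_{i}^{\mu}\xi_{j}^{\mu}\) and the network set on pattern \(\xi^{1}\), the local field splits as
\[
h_{i}=\sum_{j\neq i}J_{ij}\xi_{j}^{1}\simeq \underbrace{\xi_{i}^{1}}_{\text{signal},\ O(1)}\;+\;\underbrace{\frac{1}{N}\sum_{\mu>1}\sum_{j\neq i}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{1}}_{\text{noise: zero mean, variance}\ \simeq P/N},
\]
so stable retrieval requires the noise scale \(\sqrt{P/N}=\sqrt{\alpha}\) to stay below the unit signal, anticipating a critical storage \(\alpha_{c}=O(1)\) (about 0.14 in the full AGS theory).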
Lecture Ten
1) Dense neural networks: learning, storage & retrieval in the replica symmetric picture.
2) Dense neural networks: lowering the retrieval signal/noise threshold by sacrificing storage.
3) Dense neural networks: learning, storage & retrieval in the broken replica symmetry picture.
4) Exponential models and modern storage problems: latent heat and other loopholes.
5) Deep Boltzmann machines: dense frames vs deep frames.
6) Place cell experiments: the high connectivity of the hippocampus.
7) Bialek experiments: the patch clamp and the multielectrode array at maximum entropy.
SYNOPSIS
The course is divided into three main sections:
1) The first section serves to ensure that we share a basic scientific knowledge (obviously a necessary prerequisite to take the first steps together towards a formal mathematical framework for neural networks). In short, the student will be provided with rudiments of Statistical Mechanics, Stochastic Processes and Statistical Inference by revisiting together some fundamental topics (adapted for this course) of canonical relevance of these disciplines.
2) The second section instead introduces mathematical methods and models aimed at quantitatively characterizing simple and complex systems in statistical mechanics, which will be fundamental for a subsequent mathematical analysis of the functioning of neural networks from a theoretical and systemic perspective. In this section we develop in detail the mathematical methods necessary to describe and understand the phenomenology that these systems exhibit (from spontaneous ergodicity breaking to spontaneous replica symmetry breaking), equipping ourselves both with very effective yet heuristic methods (widely used in Theoretical Physics approaches à la Parisi, e.g. the “replica trick”, “message passing”, etc.) and with more rigorous ones (the prerogative of the know-how of Mathematical Physics à la Guerra, e.g. “stochastic stability”, “cavity fields”, etc.).
3) The last and most important section is completely dedicated to neural networks and follows the main path traced by Amit, Gutfreund & Sompolinsky: after a minimal description (always in mathematical terms) of the key mechanisms of spike emission in biological neuron models, e.g. Stein's integrate & fire (as well as their electronic implementation, e.g. Rosenblatt's perceptron), and of the propagation of information between neurons through axons and dendrites, we will see the limits of single-neuron computation from different perspectives (e.g. Minsky & Papert's criticism concerning the construction of logic gates such as the XOR). Having shifted the focus from the subject (the neuron) to its interactions (neural networks), we will then build neural networks and study their emergent information-processing properties (namely those not immediately deducible solely by looking at the behavior of the single neuron), persisting in a statistical mechanics perspective. Specifically, we will see how these networks are able to learn and abstract "archetypes" by looking at examples supplied by the external world. Subsequently, we will show how these networks use what they have learned to respond appropriately, when stimulated, to the external world by carrying out tasks such as "pattern recognition", "associative memory", "pattern disentanglement", etc.
We will also understand how these processes can sometimes go wrong, and why.
Using Hopfield & Hinton's neural networks as a leitmotif for several variations on this theme, the section will close by showing the deep structural and computational equivalence between these two theories, Hopfield's pattern recognition and Hinton's statistical learning, unifying these pillars of the discipline in a single, coherent scenario for the whole phenomenon of "cognition": ideally, and hopefully, at the end of the course the student should be able to continue independently in the study of these topics. In particular, the student should be able, when working in a team in the future, to play a role complementary to the figures of the computer scientist and the information engineer, taking an interest in their same topics but offering a different perspective, intrinsically more abstract and synthetic (that is, one where the myriad of algorithmic recipes produced every day find a natural placement), thereby helping to optimize the research group itself.
OBJECTIVES
The aim of the course is to share with the student the salient concepts and, at the same time, to provide the key tools, so that they can autonomously continue their growth in the field of neural networks and machine learning from a purely modelling perspective: this aims to be a theoretical course on the "statistical mechanics of neural networks and machine learning", not a computational one.
The ultimate ambition is to be able to ask questions (cum grano salis) about the first principles of AI functioning (drawing inspiration from analogies with information processing in biological networks) and, where possible, to answer them: to understand how to set an AI problem within a suitable mathematical framework, so that these neural networks are no longer seen as "black boxes".
As a technical note, it is emphasized that the course is structured in a "methodologically symmetric" way, in this sense: we will deal with all the main models (i.e. Curie-Weiss, Sherrington-Kirkpatrick, Hopfield, Boltzmann) using the same techniques (the saddle-point method/replica trick, Guerra-style interpolation and approaches based on partial differential equations), each time appropriately shaped to the model under study: this should help the student become familiar with the techniques themselves, as well as with the particular neural network under examination, so as to make them autonomous in the study of new neural architectures and/or data-learning algorithms.