The Principles of Deep Learning Theory Daniel A. Roberts, ISBN-13: 978-1316519332

Rated 5.00 out of 5 (1 customer review)

Original price: $50.00. Current price: $17.36.

SKU: the-principles-of-deep-learning-theory-by-daniel-a-roberts-isbn-13-978-1316519332 Category: Computing
Description

The Principles of Deep Learning Theory by Daniel A. Roberts, ISBN-13: 978-1316519332

[PDF eBook eTextbook]

  • Publisher: Cambridge University Press; New edition (May 26, 2022)
  • Language: English
  • 472 pages
  • ISBN-10: 1316519333
  • ISBN-13: 978-1316519332

This volume develops an effective theory approach to understanding deep neural networks of practical relevance.

This textbook establishes a theoretical framework for understanding deep learning models of practical relevance. With an approach that borrows from theoretical physics, Roberts and Yaida provide clear and pedagogical explanations of how realistic deep neural networks actually work. To make results from the theoretical forefront accessible, the authors eschew the subject’s traditional emphasis on intimidating formality without sacrificing accuracy. Straightforward and approachable, this volume balances detailed first-principle derivations of novel results with insight and intuition for theorists and practitioners alike. This self-contained textbook is ideal for students and researchers interested in artificial intelligence with minimal prerequisites of linear algebra, calculus, and informal probability theory, and it can easily fill a semester-long course on deep learning theory. For the first time, the exciting practical advances in modern artificial intelligence capabilities can be matched with a set of effective principles, providing a timeless blueprint for theoretical research in deep learning.
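The book's starting point, that wide randomly initialized networks have nearly Gaussian preactivation statistics governed by a depth-dependent kernel, can be probed numerically. The following sketch is our own illustrative code, not from the book; the parameter names (`C_w` for the weight variance scale, matching the book's notation) and the choice of tanh are assumptions for the example:

```python
import numpy as np

# Illustrative sketch (not from the book): ensemble statistics of wide,
# randomly initialized tanh networks. At large width, the preactivation
# distribution in each layer is nearly Gaussian, with a variance (the
# "kernel") that evolves with depth.
rng = np.random.default_rng(0)
width, depth, n_nets = 256, 4, 200
C_w = 1.0  # weight variance scale; a key hyperparameter in the book

x = rng.standard_normal(width)  # a single fixed input
final_preacts = []
for _ in range(n_nets):
    s = x
    for _ in range(depth):
        # weights drawn i.i.d. from N(0, C_w / width)
        W = rng.normal(0.0, np.sqrt(C_w / width), size=(width, width))
        z = W @ s        # preactivations of this layer
        s = np.tanh(z)   # activations fed to the next layer
    final_preacts.append(z)

z = np.concatenate(final_preacts)
print(f"mean = {z.mean():.3f}, variance = {z.var():.3f}")
# the mean is ~0 by symmetry; the variance estimates the depth-4 kernel
```

Repeating this with other widths shows the non-Gaussian corrections shrinking as width grows, which is the 1/width expansion the book develops.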

Table of Contents:

Preface
0 Initialization
0.1 An Effective Theory Approach
0.2 The Theoretical Minimum
1 Pretraining
1.1 Gaussian Integrals
1.2 Probability, Correlation and Statistics, and All That
1.3 Nearly-Gaussian Distributions
2 Neural Networks
2.1 Function Approximation
2.2 Activation Functions
2.3 Ensembles
3 Effective Theory of Deep Linear Networks at Initialization
3.1 Deep Linear Networks
3.2 Criticality
3.3 Fluctuations
3.4 Chaos
4 RG Flow of Preactivations
4.1 First Layer: Good-Old Gaussian
4.2 Second Layer: Genesis of Non-Gaussianity
4.3 Deeper Layers: Accumulation of Non-Gaussianity
4.4 Marginalization Rules
4.5 Subleading Corrections
4.6 RG Flow and RG Flow
5 Effective Theory of Preactivations at Initialization
5.1 Criticality Analysis of the Kernel
5.2 Criticality for Scale-Invariant Activations
5.3 Universality Beyond Scale-Invariant Activations
5.3.1 General Strategy
5.3.2 No Criticality: Sigmoid, Softplus, Nonlinear Monomials, etc.
5.3.3 K = 0 Universality Class: tanh, sin, etc.
5.3.4 Half-Stable Universality Classes: SWISH, etc. and GELU, etc.
5.4 Fluctuations
5.4.1 Fluctuations for the Scale-Invariant Universality Class
5.4.2 Fluctuations for the K = 0 Universality Class
5.5 Finite-Angle Analysis for the Scale-Invariant Universality Class
6 Bayesian Learning
6.1 Bayesian Probability
6.2 Bayesian Inference and Neural Networks
6.2.1 Bayesian Model Fitting
6.2.2 Bayesian Model Comparison
6.3 Bayesian Inference at Infinite Width
6.3.1 The Evidence for Criticality
6.3.2 Let’s Not Wire Together
6.3.3 Absence of Representation Learning
6.4 Bayesian Inference at Finite Width
6.4.1 Hebbian Learning, Inc.
6.4.2 Let’s Wire Together
6.4.3 Presence of Representation Learning
7 Gradient-Based Learning
7.1 Supervised Learning
7.2 Gradient Descent and Function Approximation
8 RG Flow of the Neural Tangent Kernel
8.0 Forward Equation for the NTK
8.1 First Layer: Deterministic NTK
8.2 Second Layer: Fluctuating NTK
8.3 Deeper Layers: Accumulation of NTK Fluctuations
8.3.0 Interlude: Interlayer Correlations
8.3.1 NTK Mean
8.3.2 NTK–Preactivation Cross Correlations
8.3.3 NTK Variance
9 Effective Theory of the NTK at Initialization
9.1 Criticality Analysis of the NTK
9.2 Scale-Invariant Universality Class
9.3 K = 0 Universality Class
9.4 Criticality, Exploding and Vanishing Problems, and None of That
10 Kernel Learning
10.1 A Small Step
10.1.1 No Wiring
10.1.2 No Representation Learning
10.2 A Giant Leap
10.2.1 Newton’s Method
10.2.2 Algorithm Independence
10.2.3 Aside: Cross-Entropy Loss
10.2.4 Kernel Prediction
10.3 Generalization
10.3.1 Bias–Variance Tradeoff and Criticality
10.3.2 Interpolation and Extrapolation
10.4 Linear Models and Kernel Methods
10.4.1 Linear Models
10.4.2 Kernel Methods
10.4.3 Infinite-Width Networks as Linear Models
11 Representation Learning
11.1 Differential of the Neural Tangent Kernel
11.2 RG Flow of the dNTK
11.2.0 Forward Equation for the dNTK
11.2.1 First Layer: Zero dNTK
11.2.2 Second Layer: Nonzero dNTK
11.2.3 Deeper Layers: Growing dNTK
11.3 Effective Theory of the dNTK at Initialization
11.3.1 Scale-Invariant Universality Class
11.3.2 K = 0 Universality Class
11.4 Nonlinear Models and Nearly-Kernel Methods
11.4.1 Nonlinear Models
11.4.2 Nearly-Kernel Methods
11.4.3 Finite-Width Networks as Nonlinear Models
∞ The End of Training
∞.1 Two More Differentials
∞.2 Training at Finite Width
∞.2.1 A Small Step Following a Giant Leap
∞.2.2 Many, Many Steps of Gradient Descent
∞.2.3 Prediction at Finite Width
∞.3 RG Flow of the ddNTKs: The Full Expressions
ε Epilogue: Model Complexity from the Macroscopic Perspective
A Information in Deep Learning
A.1 Entropy and Mutual Information
A.2 Information at Infinite Width: Criticality
A.3 Information at Finite Width: Optimal Aspect Ratio
B Residual Learning
B.1 Residual Multilayer Perceptrons
B.2 Residual Infinite Width: Criticality Analysis
B.3 Residual Finite Width: Optimal Aspect Ratio
B.4 Residual Building Blocks
References
Index
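The criticality theme running through Chapters 3, 5, and 9 has a simple numerical counterpart. In a deep linear network with weights drawn from N(0, C_w/width), each layer multiplies the expected squared norm of the signal by C_w, so signals vanish for C_w < 1 and explode for C_w > 1; C_w = 1 is the critical tuning. A minimal sketch of this (our own illustrative code, not from the book; the parameter values are arbitrary):

```python
import numpy as np

# Illustrative sketch (not from the book): exploding and vanishing
# signals in deep *linear* networks. Each layer multiplies the expected
# squared norm by C_w, so only C_w = 1 (criticality) keeps it order-one.
rng = np.random.default_rng(1)
width, depth = 128, 30

def final_norm(C_w):
    z = rng.standard_normal(width)
    z /= np.linalg.norm(z)  # unit-norm input
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(C_w / width), size=(width, width))
        z = W @ z
    return np.linalg.norm(z)

norms = {C_w: final_norm(C_w) for C_w in (0.8, 1.0, 1.2)}
for C_w, n in norms.items():
    print(f"C_w = {C_w}: final norm ~ {n:.3f}")
# C_w = 0.8 vanishes, C_w = 1.2 explodes, C_w = 1.0 stays order-one
```

For nonlinear activations the critical value of C_w shifts away from 1 in an activation-dependent way, which is what the universality-class analysis in Chapters 5 and 9 works out.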

Daniel A. Roberts was cofounder and CTO of Diffeo, an AI company acquired by Salesforce; a research scientist at Facebook AI Research; and a member of the School of Natural Sciences at the Institute for Advanced Study in Princeton, NJ. He was a Hertz Fellow, earning a PhD from Massachusetts Institute of Technology in theoretical physics, and was also a Marshall Scholar at Cambridge and Oxford Universities.

Sho Yaida is a research scientist at Meta AI. Prior to joining Meta AI, he obtained his PhD in physics at Stanford University and held postdoctoral positions at MIT and at Duke University. At Meta AI, he uses tools from theoretical physics to understand neural networks, the topic of this book.

Boris Hanin is an Assistant Professor at Princeton University in the Operations Research and Financial Engineering Department. Prior to joining Princeton in 2020, Boris was an Assistant Professor at Texas A&M in the Math Department and an NSF postdoc at MIT. He has taught graduate courses on the theory and practice of deep learning at both Texas A&M and Princeton.


Reviews (1)

1 review for The Principles of Deep Learning Theory Daniel A. Roberts, ISBN-13: 978-1316519332

  1. Emma Peterson (verified owner) – December 21, 2023

    Rated 5 out of 5

    Super fast, received my eBook immediately!


Shipping & Delivery

You will receive a link to your eBook by email within 30 seconds of purchase (check your inbox or junk mail folder). You can also log in to your account at any time with your username to read or download your eBook.

If you have any problems or other questions, email us or use the chat widget on our Contact Us page.
