hodgehog11 8 hours ago [-]
Obviously it's great that those who are only aware of JEPA should be educated about CCA. If you don't know CCA, you should not be working in unsupervised learning.
However, it's pretty obvious that they are related since CCA is (or should be) well-known to be among the original unsupervised learning algorithms. It's the progenitor of the field. It works, it always did. Just like logistic regression for classification. Deep learning is about putting in huge computational effort for the extra few percent.
This is like saying that Gauss deserves the credit for LLMs because he came up with least-squares regression, which was the progenitor of supervised learning. Yes, there is a chain of discoveries leading back, but when you give the credit that far back, it's just insulting to the hard work that came in between.
Gauss and Hotelling are famous enough as it is.
(Before anyone asks, I'm not shilling for JEPA, I just think this argument is reductive for all of unsupervised and semi-supervised learning.)
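For anyone who hasn't met CCA, here is a minimal NumPy sketch of the classical linear version: whiten each view, then take the SVD of the whitened cross-covariance. The function name `cca`, the small ridge term `reg`, and the toy data are assumptions made for illustration; this is one standard textbook formulation, not the method of any particular paper.

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Linear CCA for two views X (n, p) and Y (n, q).

    Returns the canonical correlations (descending) and the projection
    matrices A (p, k) and B (q, k), with k = min(p, q).
    `reg` is a small ridge term for numerical stability (an assumption
    of this sketch, not part of the textbook definition).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariances of each view and the cross-covariance.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance T = Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original coordinates.
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return s, A, B

# Toy data: both views contain a noisy copy of the same latent z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
Y = np.hstack([rng.normal(size=(500, 1)), -z + 0.1 * rng.normal(size=(500, 1))])

s, A, B = cca(X, Y)
print("canonical correlations:", s)
```

The first canonical pair should pick out the shared latent direction in each view, and the second should be close to uncorrelated. No network and no gradient descent are involved, which is the "it works, it always did" point.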
jdw64 8 hours ago [-]
I want to make something in this area (LLMs). Can you recommend any books?
hodgehog11 8 hours ago [-]
Books? No, not really. Maybe others will have better suggestions for newcomers, sorry. Are you talking about research novelty or just applying current methods to a given task?
The latter is covered well by Andrej Karpathy's videos and by just playing around with current models and other tutorials in a small test environment. You don't need to know very much; there's a lot of low-hanging fruit.
For the former, the field is moving rapidly and most of the innovations are coming from papers. Any book that claims to cover deep learning is almost inevitably outdated. Find a university or institution near you and see if they have an undergraduate reading group on deep learning that is open to the public to attend. Mine does, and it's really helpful for staying up to date with the latest ideas. "Probabilistic Machine Learning" by Murphy contains the material that I would consider prerequisite if you want to understand the ideas which underpin modern deep learning (even if it contains virtually no deep learning in it), and I would hope that any student or colleague of mine would be familiar with most of it. But I'm not sure it's good to learn from, and picking all that up takes a while to be honest.
nextos 7 hours ago [-]
> "Probabilistic Machine Learning" by Murphy [...] even if it contains virtually no deep learning in it
This is confusing. Are you referring to the old 2012 version?
Volumes 1 & 2 (2022-3) contain a substantial amount of deep learning [1], including relatively recent developments.
There's also a new RL volume getting written, with some drafts deposited in arXiv [2].
I was mostly referring to Volume 1 (not advanced topics). You have a point that Volume 2 definitely contains more. To be honest, I was mostly covering myself against a "that's not real deep learning" criticism; "relatively recent developments" is pretty generous if you're active in the field. Given how fast the field moves, anything more than a few years old is essentially considered classical. It's almost impossible to have a book that is up to date with the state of the art here.
These are very nice volumes though (RL one is good too), and Murphy should be commended for the amount of work in here. It's probably as good a compendium as one can expect.
jdw64 8 hours ago [-]
I've read the books you mentioned (Probabilistic Machine Learning). I guess there's nothing left but papers, right? Thanks for the advice.
HarHarVeryFunny 10 minutes ago [-]
As the saying goes: correlation is not causation.
OTOH prediction doesn't necessarily reflect causation either, but prediction is what JEPA is about, how our brain/intelligence works, and one of the great confirmations of LLMs is how powerful prediction errors are as a learning signal.
JEPA appears to be a step in the right direction of trying to build a brain rather than a language model: to use prediction the way the brain uses it, to predict the future (not a frozen historical training set), and to learn a real world model of how the world behaves. The JEPA implementations I've read about all use a Transformer as their predictive component, since prediction itself (and certainly not correlation) is not where JEPA is innovating. It is more about applying prediction to the right problem (assuming the goal is to implement animal/human intelligence): predicting sensory inputs at the right level of representation.
A recent JEPA variant, Causal-JEPA, moves beyond just infilling to predict object state from object interactions (i.e. to learn causal predictive relationships).
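To make "predicting at the right level of representation" concrete, here is a rough NumPy sketch of a JEPA-style objective as I understand it: the loss compares a predicted embedding with a target embedding, not raw inputs. Everything named here is a hypothetical stand-in. Linear maps replace the Transformers mentioned above, and `latent_prediction_loss`, `ema_update`, and the dimensions are made up for illustration. There is no training loop; it only shows where the prediction error is measured.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_rep = 16, 4  # hypothetical input and representation sizes

# Linear stand-ins for the real components (assumption of this sketch).
W_ctx = rng.normal(scale=0.1, size=(d_in, d_rep))  # context encoder
W_tgt = W_ctx.copy()                               # target encoder, a slow-moving copy
W_pred = np.eye(d_rep)                             # predictor

def latent_prediction_loss(x_context, x_target, W_ctx, W_tgt, W_pred):
    """Mean squared error between predicted and actual target embeddings."""
    z_ctx = x_context @ W_ctx   # embed the visible part
    z_tgt = x_target @ W_tgt    # embed the hidden part (held constant in training)
    z_hat = z_ctx @ W_pred      # predict the hidden part's embedding
    return np.mean((z_hat - z_tgt) ** 2)

def ema_update(W_tgt, W_ctx, momentum=0.99):
    """Move the target encoder slowly towards the context encoder."""
    return momentum * W_tgt + (1.0 - momentum) * W_ctx

# Two hypothetical views of the same scene, e.g. visible and masked regions.
x_context = rng.normal(size=(8, d_in))
x_target = rng.normal(size=(8, d_in))
print("loss:", latent_prediction_loss(x_context, x_target, W_ctx, W_tgt, W_pred))
```

The prediction error lives in the `d_rep`-dimensional space, so the model is never asked to reconstruct every detail of the input.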
Interesting. So even more of the means to create this wave of AI existed sooner than we knew, at least in theory. Fun to think of a version of events where these models came up alongside GPUs; as if real-time graphics weren't demanding enough on the supply chain, hah.
[1] https://probml.github.io/pml-book
[2] https://arxiv.org/pdf/2412.05265
https://arxiv.org/pdf/2602.11389