---
title: 'nvidia gtc day 1'
tags: 'journal'
date: 'Mar 18, 2025'
---

a few notes from the keynote at SAP center, before i got kicked out 20 minutes in for sitting on the stairs, just as jensen started talking about GPUs and hardware.

the AI wave started with AlexNet (2012)

perception AI -> generative AI -> agentic AI -> physical AI

three problems

- how to solve the data problem?
- how to solve the training problem?
- how do you scale?

three scaling laws

pre-training scaling -> post-training scaling -> test-time scaling "long thinking"

reasoning AI inference compute is >100x one-shot: many more tokens
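
the "long thinking" idea (spend more inference compute sampling answers, let a verifier keep a good one) can be sketched roughly like this; every function here is a made-up stand-in, not a real API:

```python
def generate_answer(problem, attempt):
    # stand-in for one sampled reasoning chain (hypothetical model call);
    # deterministic "noise" so the sketch is reproducible
    return problem + (attempt * 7 + 1) % 3

def verify(problem, answer):
    # stand-in verifier: exact match against the known result
    return answer == problem

def best_of_n(problem, n):
    # test-time scaling: spend more inference compute (n samples)
    # and keep the first answer the verifier accepts
    for attempt in range(n):
        answer = generate_answer(problem, attempt)
        if verify(problem, answer):
            return answer
    return None
```

with n=1 the toy model never gets it right; with n=3 one of the samples passes the verifier, which is the whole point of scaling inference compute.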

how to solve the data problem?

problem prompts -> model -> answer -> verifier -> back to model

post-training with RLVR: >100T tokens
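
that loop (prompts -> model -> answer -> verifier -> back to the model) can be sketched as verifier-filtered synthetic data generation; everything below is a toy stand-in, not a real training stack:

```python
def model(prompt, step):
    # hypothetical policy: proposes an answer to an arithmetic prompt,
    # off by a margin that shrinks with each retry (toy behavior)
    a, b = prompt
    return a + b + (2 - step if step < 3 else 0)

def verifier(prompt, answer):
    # verifiable reward: 1 if the math checks out, else 0
    a, b = prompt
    return 1 if answer == a + b else 0

def generate_training_data(prompts, max_tries=5):
    # keep only (prompt, answer) pairs the verifier accepts;
    # these verified pairs flow back into post-training
    data = []
    for prompt in prompts:
        for step in range(max_tries):
            answer = model(prompt, step)
            if verifier(prompt, answer):
                data.append((prompt, answer))
                break
    return data
```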

top 4 US CSPs

- 2024: 1.3M hopper GPUs
- 2025: 3.6M blackwell GPUs (2 GPU dies per chip)

computing is at an inflection point

2028 prediction: $1T+ data center capex

- a new computing approach
- increasing recognition that the future of software requires capital investment

the computer is now generating tokens for software, not just retrieving files

computers are now AI factories: everyone will have two factories, one for the product, another for its AI

CUDA-X for every industry

- cuPyNumeric - numpy on GPUs
- cuLitho - computational lithography
- Aerial - 5G radio networks with AI
- cuOpt - mathematical optimization (planning seats, inventory, plants, drivers and riders)
- MONAI - medical imaging
- Earth-2 - weather and climate simulation
- cuQuantum - quantum computing
- ...
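
cuPyNumeric is pitched as a drop-in replacement for numpy. a minimal sketch of what that means in practice, falling back to CPU numpy when the package isn't installed (i haven't verified this snippet against an actual GPU install):

```python
try:
    # GPU-accelerated drop-in replacement, if available
    import cupynumeric as np
except ImportError:
    # same API on CPU; the rest of the code is unchanged
    import numpy as np

x = np.linspace(0.0, 1.0, 1000)
total = float((x ** 2).sum())  # ~1000x the integral of x^2 on [0, 1]
```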

---

and some notes from Yann LeCun's talk

4 things he's excited about

- understanding the physical world
- persistent memory
- reasoning
- planning

world models: models of the physical world

- we all have one; it lets us manipulate thoughts and predict what happens
- architecture: different from language architectures
- tokens are discrete: a probability distribution over ~100k values, and we know how to do this
- we don't know how to do this with video; next-pixel prediction has failed, because the model spends all its resources on detail that is impossible to predict
- what works better: learn a representation of the image/video/natural signal and make predictions in that space
  - requires techniques to prevent collapse, where the prediction is constant and the input is ignored
- AMI: advanced machine intelligence
  - systems that learn abstract mental models of the world, and reason and plan (3-5 years out)
  - then scaling them up to human-level AI
- reasoning with tokens is not the right way
  - JEPA models instead
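
the collapse failure mode (the model outputs a constant and ignores the input) is often countered with a variance term on the embeddings, as in LeCun's VICReg; a toy sketch of that penalty, with made-up parameter choices:

```python
import numpy as np

def variance_penalty(embeddings, gamma=1.0, eps=1e-4):
    # hinge on the per-dimension std of a batch of embeddings:
    # penalize dimensions whose std falls below gamma, which is
    # exactly what happens when the encoder collapses to a constant
    std = np.sqrt(embeddings.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))

collapsed = np.ones((32, 8))  # every embedding identical: collapsed
healthy = np.random.default_rng(0).normal(size=(32, 8))
```

a collapsed batch gets a penalty near gamma; a healthy, spread-out batch gets a small one, so minimizing the combined loss keeps the representations informative.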

it would take a human 400,000 years to read all the text in the world; vision takes in an equivalent amount of data in only about 4 years

a joint embedding predictive architecture (V-JEPA)

- sliding window of 16 frames, predict the representation of the next few frames, measure prediction error
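
the sliding-window idea, predicting in representation space and scoring by prediction error, can be sketched with numpy. the encoder and predictor below are toy stand-ins (a fixed random projection and linear extrapolation), not the real learned V-JEPA networks:

```python
import numpy as np

FRAME_DIM, LATENT_DIM, WINDOW = 64, 8, 16
rng = np.random.default_rng(0)
# stand-in encoder weights: a fixed random projection (V-JEPA learns this)
PROJ = rng.normal(size=(LATENT_DIM, FRAME_DIM)) / np.sqrt(FRAME_DIM)

def encode(frame):
    # project a flattened frame into latent space
    return PROJ @ frame

def predict_next(latents):
    # stand-in predictor: extrapolate the latent trajectory linearly
    return latents[-1] + (latents[-1] - latents[-2])

def surprise(frames):
    # slide a window of 16 frames, predict the next frame's latent,
    # and measure prediction error: high error = "surprising" video
    errors = []
    for t in range(WINDOW, len(frames)):
        latents = [encode(f) for f in frames[t - WINDOW:t]]
        pred = predict_next(latents)
        errors.append(float(np.linalg.norm(pred - encode(frames[t]))))
    return errors
```

a smoothly moving signal yields near-zero error, while a physically implausible jump in the last frame spikes it, which is how this setup can flag "impossible" events.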

- human babies understand gravity by around 9 months

resnet: the skip connection lets NNs backprop all the way through many layers
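
that point shows up numerically: with a tiny weight, a plain layer's gradient collapses toward zero (stacking many such layers kills backprop), while a residual layer y = x + f(x) keeps its gradient near 1 through the identity path. a toy check:

```python
import numpy as np

def plain_block(x, w):
    # a plain nonlinear layer: y = f(x)
    return np.tanh(w * x)

def residual_block(x, w):
    # a residual layer: y = x + f(x), with an identity skip path
    return x + np.tanh(w * x)

def num_grad(f, x, w, eps=1e-6):
    # central-difference estimate of dy/dx
    return (f(x + eps, w) - f(x - eps, w)) / (2 * eps)

# near-zero weight: the plain gradient is ~0.01, the residual one ~1.01
g_plain = num_grad(plain_block, 1.0, 0.01)
g_resid = num_grad(residual_block, 1.0, 0.01)
```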

GPT-style causal training replaced BERT-style masked training

no need to mask the data for training

open source distributed training is the future

---

ate an entire branzino with red wine for dinner with HP at Rollati Ristorante. best dinner i've had all year.