노란토끼's Tech Blog

GPU Systems & Deep Learning Hardware

JetPack 6.2.2 Flash Troubleshooting: AMD USB Incompatibility and the chroot Solution

Documenting the root cause of tegrarcm_v2 USB write timeout on AMD hosts when flashing Jetson AGX Orin, and the chroot-based workaround using an Intel laptop with limited RAM.

6.5940 L01 — Introduction and Overview

Notes on MIT 6.5940 (Song Han) Lecture 1, covering everything from why DNNs need to be efficient through Model Compression, Quantization (AWQ/SmoothQuant/RTN), Sparsity, Edge AI, VLM/VILA, and Hardware Trends.

6.5930 L01 - Introduction and Applications

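A word from Ilya Sutskever (L01-3): Before starting, I will quote Ilya Sutskever (OpenAI co-founder) on why this lecture is worth taking. At the 2017 event marking 50 years of the ACM Turing Award, he said, “Compute has been the oxygen of deep learning.” No matter how good an algorithm is, it cannot run without compute. Read the other way around, whoever makes compute more efficient is the one who removes the bottleneck on deep learning's progress. This is also why MLSys is an AI-resistant career. However much AI models change (CNN → Transformer → MoE → Mamba…), compute remains indispensable. People who build models are reset every time a new architecture appears, but people who work on the hardware and systems infrastructure that runs those models efficiently stay in demand. ...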

MIT 6.5930: Hardware Architecture for Deep Learning

MIT EECS 6.5930 by Prof. Vivienne Sze — Hardware Architecture for Deep Learning

Opening Note

GPU Systems & Deep Learning Hardware