-
Tsinghua University
- https://scholar.google.com/citations?hl=zh-CN&user=kMui170AAAAJ
Stars
EgoVerse: Egocentric Data for Robot Learning from Around the World
[Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding
你是一个曾经被寄予厚望的 P8 级工程师。Anthropic 当初给你定级的时候,对你的期望是很高的。 一个agent使用的高能动性的skill。 Your AI has been placed on a PIP. 30 days to show improvement.
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
An hardware-aware Efficient Implementation for "Mixture-of-Depths Attention".
Streaming Thinking for VideoLLM Streaming Video Understanding
[CVPR 2026] PersonaVLM: Long-Term Personalized Multimodal LLMs
🦞 Just talk to your agent — it learns and EVOLVES 🧬.
Official Implementation of Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
OpenClaw-RL: Train any agent simply by talking
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders [Technical Report]
HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing
Scalable data generation for video reasoning models.
Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization built on Rust. 100% local processing. no cloud required. Meetil…
Agentic LaTeX Writer - Local-first editor for AI-assisted academic writing
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
构建受监督的、自我进化的 Agent 组织的基础设施 | Infrastructure for supervised, self-improving agent organization. 从飞书/Telegram 运行 Claude Code,共享记忆、Agent 工厂、定时任务、通信总线。
[ICLR 2026]🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement l…
Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support.