DEV Community

soy

Patent lawyer turned AI engineer. Processed 4M patents with a local LLM on an RTX 5090. Building PatentLLM, an AI-powered patent search tool. Also ranked #1 on Floodgate (shogi AI). Writing about local LLMs and more.

Local LLM Revolution: Speed, Security, and Million-Token Contexts

3 min read

AI's Infrastructure & Agents: From Chips to Code Automation

4 min read
I Built a SQLite Editor in 180 Lines, Then Rebuilt It in 380 for the Browser

3 min read
Vision and Hardware Strategy Shaping the Future of AI: From Apple to AGI and AI Chips

3 min read
The Future of Open Source and Security: From Geopolitics to Threats in the Development Field

4 min read
The Forefront of Development Efficiency with AI Agents: From OSS to Code Review

3 min read
The Dawn of the Local AI Era: From iPhone 17 Pro to the Future of NVIDIA RTX

3 min read
Today's LLM Frontier: From the Breakthrough of Kimi K2.5 to GPT-5.4/Gemini Flash-Lite

3 min read
Canonical Eyes IPO — Ubuntu Proves the Revival of Linux and OSS

3 min read
Developer Security and AI Industry Trends: Langflow Vulnerability, Cargo Advisory, and the State of AI at GDC

3 min read
AI and Cloud Infrastructure Convergence: Innovations in Cloudflare Workers AI, Project Nomad, and Trainium

3 min read
Next-Generation LLM Inference Technology: From Flash-MoE to Gemini Flash-Lite, and Local GPU Utilization

3 min read
The Wave of Open-Source AI and Investment in Security: Trends from Qwen, MS, and Google

3 min read
Current Frontline in AI Agent Development: Robust Agent Design and Security Measures

3 min read
Running Karpathy's autoresearch with Local LLM — Zero API Cost Autonomous AI Research

3 min read
Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

2 min read
AI Era Security and OSS: Trivy Compromise, Google and Cloudflare's Countermeasures

3 min read
Frontiers of AI Agent Development: Claude HUD, OpenAI Monitoring, and the Impact of Astral Acquisition

3 min read
Today's Local LLM Acceleration Techniques: ik_llama.cpp Speedup, Tinybox, and NVIDIA GTC Latest Trends

3 min read
AI Questions the Future of the Internet: Freedom, Quality Decline, and Content Reliability

7 min read
Data Preparation and Security to Accelerate AI Development: Leveraging Open Source Tools

6 min read
Next-Gen LLMs: Deep Dive into Compact, High-Speed Models and Temporal Reasoning – Gemini 3.1 Flash-Lite, GPT-5.4 mini/nano

7 min read
AI Agent Safety and Operations: Frontline Measures Against Prompt Injection and Monitoring

6 min read
2026: Local AI Evolves! From Offline Devices to Large-Scale Inference on RTX

5 min read
New Synergy of Cloud and AI: Cloudflare Workers AI, Google's OSS Security, and AI Integration in WordPress

5 min read
RTX 40 Series Makes LLM Blazing Fast! The Complete Guide to Inference Optimization for Individual Developers [2026 Latest Edi...

7 min read
The Real Inflection Point GTC 2026 Quietly Announced — Why NVIDIA Bet on "Open"

9 min read
Training a Shogi Engine: ONNX Conversion, TensorRT, and Getting Crushed by Ryfamate

4 min read
Lennart Poettering and the systemd Wars: The Most Controversial Software in Linux History

6 min read
The README Trap: Why AI Coding Assistants Skip Your Docs (and 3 Fixes)

4 min read
From 30 Seconds to 3 Milliseconds: Replacing LIKE with FTS5 on 1.7M Patent Records

4 min read
SoyLM: Building a Zero-Dependency Local RAG Tool in a Single Python File

5 min read
Adding Stripe Checkout to a Solo SaaS: Lessons from PatentLLM's $1K/mo Plan

4 min read
When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

3 min read
Flutter Web + PWA: Why Add to Home Screen Gives You a Full-Screen App

4 min read
Tailscale Deep Dive: Why Developers Are Ditching Traditional VPNs

5 min read
Building a 5-in-1 Local LLM App with Flutter Web and Flask

4 min read
Claude Code + MCP SQLite Server: Query Your Database Without Leaving the Conversation

4 min read
How Google Finds Every Restaurant in Japan — And Why Your Full-Text Search Can't

5 min read
OpenAI Acquires Astral (uv / Ruff) — What It Really Means

5 min read
The Technical Debt Local AI Must Fix Before It's Too Late — What NemoClaw Says About NVIDIA's Philosophy

17 min read
Punching Through NVIDIA NemoClaw's Sandbox to Hit Local vLLM on RTX 5090

4 min read
Why Google Wasn't Indexing My FastAPI Site — The HEAD Request Trap

2 min read
vLLM vs TensorRT-LLM vs Ollama vs llama.cpp — Choosing the Right Inference Engine on RTX 5090

7 min read
Using Python to Load Google Docs into AI — Drive API Minimal Permission Setup

5 min read
Hardware Selection for Local LLMs: Overcoming the VRAM Wall with Practical GPU, CPU, and Memory Configurations

6 min read
What I Gained from Interacting with Shogi AI: The Path to 1st Place in Floodgate and My Approach to Distilled Models

3 min read
Turn Conversation Data into Assets with Gemini API: History Export, RAG, and Streamlit

8 min read
Automating Video Generation with Remotion and VOICEVOX: From Environment Setup to Performance Optimization

8 min read
Cloudflare Tunnel Practical Guide: Securely Exposing a Home AI Server Without Port Forwarding

6 min read
Automated Google Drive Backup with Rclone: Headless OAuth Authentication and systemd Configuration

7 min read
Claude Code Practical Guide: Debugging, Test Automation, and CUDA Environment Setup with Opus 4.6

4 min read
I Posted My Patent Search AI to Reddit r/LocalLLaMA and Got 65 Upvotes and Over 20 Questions

5 min read
Coders at Work — Index of All 15 Programmer Interviews

7 min read
RTX 5090 + Nemotron 9B on vLLM — Benchmarks & TRT-LLM Comparison

2 min read
Talent Blooms When You Stop Relying on "Motivation": 7 Insights on the "Spring Mind" Left by Genius Mathematician Kiyoshi Oka

6 min read
Three Months of Code: What a Patent Lawyer Built from Zero

5 min read
I Built a Free Patent Search Engine with 3.5M US Patents — No Login, Powered by SQLite FTS5

3 min read
Operational Techniques for Automatically Starting vLLM, Flask, and cron with systemd Services in WSL2

3 min read
Achieving Bidirectional Integration of a Streamlit Backend and Flutter Frontend in a WSL2 Environment

2 min read