"Trying Unsloth's GRPO training on WSL2 | noguchi-shoji"

Since it is said that you can now "reproduce DeepSeek-R1's reasoning on your own local device" and "experience the 'Aha' moment with just 7GB VRAM," I will try out Unsloth's GRPO (Group Relative Policy Optimization) training. This time I will use Phi-4 (14B). You can now reproduce DeepSeek-R1's reasoning on your own local device! Experience the "Aha" moment with just 7GB VRAM. Unsloth reduces GRP



2025-02-09 14:04:08
