What model to grade practice test?

HumanPerson@sh.itjust.works · 2 days ago

What model to grade practice test?

HumanPerson@sh.itjust.works · 1 day ago

B580+a750. They do work together.

brucethemoose@lemmy.world · edit-2 1 day ago

Oh yeah, presumably through SYCL or Vulkan splitting.

I’d try Qwen3 30B, maybe with a custom quantization if it doesn’t quite fit in your VRAM pool (it should be very close). It should be very fast and quite smart.

Qwen3 32B (a fully dense model) would fit too, but you would definitely need to tweak the settings to keep it from being really slow.
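For a rough sense of how close the fit is, here is a back-of-the-envelope sketch. The parameter counts (about 30.5B and 32.8B) and the bits-per-weight values are assumptions for illustration, not measured file sizes, and the estimate ignores KV cache and runtime buffers, which also need room.

```python
# Rough VRAM-fit estimate. Parameter counts and bits-per-weight are
# illustrative assumptions, not measurements of any particular quant file.

def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def headroom_gb(params_billions: float, bits_per_weight: float,
                vram_pool_gb: float) -> float:
    """VRAM left over for KV cache and buffers once the weights are loaded."""
    return vram_pool_gb - weight_size_gb(params_billions, bits_per_weight)


VRAM_POOL_GB = 12 + 8  # B580 (12 GB) + A750 (8 GB)

if __name__ == "__main__":
    for name, params in [("Qwen3 30B", 30.5), ("Qwen3 32B", 32.8)]:
        for bpw in (4.0, 4.5, 5.0):
            size = weight_size_gb(params, bpw)
            spare = headroom_gb(params, bpw, VRAM_POOL_GB)
            print(f"{name} @ {bpw} bpw: {size:.1f} GB weights, "
                  f"{spare:+.1f} GB headroom")
```

Whatever headroom is left has to cover the context, so a slightly smaller custom quant can be the difference between fitting and spilling over.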

HumanPerson@sh.itjust.works · 1 day ago

Qwen3 also doesn’t work because I’m using the IPEX-LLM Docker container, which has Ollama 5.8 or something. It doesn’t matter now, because I’ve taken the test I was practicing for since posting this. Playing with Qwen3 on CPU, it seems good, but the reasoning feels like most open reasoning models, where it gets the right answer and then goes “wait, that’s not right…”

brucethemoose@lemmy.world · edit-2 1 day ago

Yeah it does that, heh.

The Qwen team recommends a fairly high temperature, but I find it works better with modified sampling (lower temperature, 0.1 MinP, a bit of rep penalty or DRY). Then it tends not to “second guess” itself by taking the lower-probability choice of continuing to reason.
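To show what the MinP part of that does, here is a minimal sketch of min-p filtering on a toy distribution. It is a simplification, not how any particular backend implements it: real samplers may apply temperature and MinP in a different order, and the temperature of 0.3 is just a placeholder for "lower". Rep penalty and DRY are left out.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.1):
    """Return indices of tokens whose probability is at least
    min_p times the probability of the most likely token."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def min_p_sample(logits, temperature=0.3, min_p=0.1, rng=None):
    """Sample one token index using temperature followed by min-p filtering."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    kept = min_p_filter(probs, min_p)
    weights = [probs[i] for i in kept]  # choices() renormalizes these
    return rng.choices(kept, weights=weights, k=1)[0]
```

With a low temperature the top token dominates, so a low-probability continuation like "wait" is more likely to fall under the MinP threshold and be dropped before sampling.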

If you’re looking for alternatives, Koboldcpp does support Vulkan. It may not be as fast as the (SYCL?) Docker container, but it supports new models and more features. It’s also precompiled as a one-click exe: https://github.com/LostRuins/koboldcpp
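If you go that route, a request with those kinds of sampler settings might look roughly like the sketch below. This assumes Koboldcpp is running locally on its default port (5001) and that the `/api/v1/generate` endpoint accepts `temperature`, `min_p`, and `rep_pen` fields; the specific values are placeholders, so check the API docs of your build before relying on any of it.

```python
import json
import urllib.request

# Assumed local endpoint; adjust host and port to match your setup.
KOBOLDCPP_URL = "http://localhost:5001/api/v1/generate"


def build_payload(prompt, temperature=0.3, min_p=0.1, rep_pen=1.05,
                  max_length=512):
    """Build a generation request body. Sampler values are illustrative."""
    return {
        "prompt": prompt,
        "max_length": max_length,    # tokens to generate
        "temperature": temperature,  # lower than the model card default
        "min_p": min_p,
        "rep_pen": rep_pen,          # mild repetition penalty
    }


def generate(prompt, url=KOBOLDCPP_URL):
    """POST the payload and return the generated text (needs a running server)."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return reply["results"][0]["text"]
```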