I took a practice math test and would like to have it graded by an LLM, since I can't find the answer key online. I have 20GB of VRAM, but I'm on Intel Arc so I can't do Gemma 3. I'd prefer models from ollama.com because I'm not deep enough down the rabbit hole to try Hugging Face stuff yet and don't have time to right now.

  • SmokeyDope@lemmy.world

    Models in GGUF format should all work with your GPU, assuming it's set up correctly and the model is properly loaded into VRAM. It shouldn't matter if it's Qwen or Mistral or Gemma or Llama or LLaVA or Stable Diffusion. Maybe the engine you're using isn't properly configured to use your Arc card, so it's all just running on your regular RAM, which limits things? Idk.

    An Intel Arc GPU might work with Kobold and Vulkan without any extra technical setup. It's not as deep in the rabbit hole as you may think; a lot of work was put into making one-click executables with nice GUIs that the average person can work with…

    Models

    Find a bartowski-made quantized GGUF of the model you want to use. Q4_K_M is the recommended average quant to try first. Try to make sure it all fits within your card, size-wise, for speed; that shouldn't be a big problem for you with 20GB of VRAM to play with. Hugging Face shows the size in GB next to each quant (rough fit math sketched below).
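
    If you want to sanity-check the fit before downloading, the back-of-envelope math is just parameters × bits per weight. A rough sketch in Python; the ~4.8 bits/weight figure for Q4_K_M and the 2GB overhead for context are approximations, not exact numbers:

    ```python
    # Rough GGUF fit check: weights_gb = params (billions) * bits/weight / 8.
    # Q4_K_M averages roughly 4.8 bits per weight; overhead covers the
    # KV cache and compute buffers (both figures are rough estimates).
    def fits_in_vram(params_b: float, bits_per_weight: float = 4.8,
                     vram_gb: float = 20.0, overhead_gb: float = 2.0) -> bool:
        weights_gb = params_b * bits_per_weight / 8
        return weights_gb + overhead_gb <= vram_gb

    for size in (8, 12, 24, 32):
        verdict = "fits" if fits_in_vram(size) else "too big"
        print(f"{size}B @ Q4_K_M: ~{size * 4.8 / 8:.1f}GB of weights, {verdict}")
    ```

    By that math a 24B Q4_K_M lands around 14-16GB loaded, which is why it's a comfortable ceiling for a 20GB card.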

    Start small with a high quant of Qwen 3 8B, then a Gemma 12B, then work your way up to a medium quant of DeepHermes 24B.

    Thinking models are better at math and logical problem solving, but you need to know how to communicate and work with LLMs to get good results no matter what. Ask it to break down a problem you already solved and test it for comprehension (a scripted example below).
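
    Once a model is loaded, you can script the grading instead of pasting problems in by hand. A minimal sketch against KoboldCpp's local API, assuming the default port 5001 and the standard /api/v1/generate endpoint; the prompt wording is just an example, tune it to your test:

    ```python
    import requests

    # KoboldCpp serves a KoboldAI-compatible API, by default on port 5001.
    KOBOLD_URL = "http://localhost:5001/api/v1/generate"

    def grade(problem: str, student_answer: str) -> str:
        prompt = (
            "You are grading a math practice test.\n"
            f"Problem: {problem}\n"
            f"Student answer: {student_answer}\n"
            "Solve the problem step by step yourself, then state whether "
            "the student's answer is correct and why.\n"
        )
        resp = requests.post(KOBOLD_URL, json={
            "prompt": prompt,
            "max_length": 512,   # tokens to generate
            "temperature": 0.2,  # low temp keeps grading consistent
        })
        resp.raise_for_status()
        return resp.json()["results"][0]["text"]

    print(grade("What is the derivative of x^3?", "3x^2"))
    ```

    Run it first on a few problems you already know the answers to, and treat any verdict it gives without a worked solution as suspect.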

    Kobold engine

    Download kobold.cpp, run it like a regular program, and adjust settings in the graphical interface that pops up, or make a startup script with flags (sketched below).

    For the processing backend, see if Vulkan works with your Intel Arc. Make sure flash attention is enabled too. Offload all layers of the model to the GPU; I make note of exactly how many layers each model has during startup and specify it, but it should figure it out smartly even if you don't.
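
    For a scripted launch, something like this; the flag names match recent KoboldCpp releases but do drift between versions, so check --help on your build (the model filename here is just a placeholder):

    ```python
    import subprocess

    # Launch KoboldCpp with the Vulkan backend and flash attention enabled.
    # Flag names are from recent koboldcpp --help output; confirm against
    # your own build before relying on them.
    subprocess.run([
        "python", "koboldcpp.py",            # or the one-file executable
        "--model", "Qwen3-8B-Q4_K_M.gguf",   # placeholder filename
        "--usevulkan",                       # Vulkan backend for Intel Arc
        "--flashattention",                  # enable flash attention
        "--gpulayers", "99",                 # more than the model has = offload all
        "--contextsize", "8192",
        "--port", "5001",
    ])
    ```

    If it falls back to CPU, the startup log will say so; that's also where the exact layer count is printed if you'd rather set --gpulayers precisely.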