  • I have been using DeepHermes daily. I think CoT reasoning is so awesome and such a game changer! It really helps the model give better answers, especially for hard logical problems. But I don’t want it all the time, especially on an already slow model, so being able to turn it on and off without switching models is great. Mistral 24b DeepHermes is relatively uncensored, powerful, and not painfully slow on my hardware. A high quant of Llama 3.1 8b DeepHermes fits entirely in my 8GB of VRAM.




  • It’s all about RAM and VRAM. You can buy some cheap RAM sticks, get your system to something like 128GB of RAM, and run a low quant of the full DeepSeek. It won’t be fast, but it will work. Now if you want fast, you need to get the model into graphics card VRAM, ideally all of it. That’s where the high-end Nvidia stuff comes in: getting 24GB of VRAM all on the same card at maximum bandwidth. Some people prefer Macs or data center cards. You can use AMD cards too, it’s just not as well supported.

    LocalLlama users tend to use smaller models than the full DeepSeek R1 that fit on older cards. A 32b model partially offloaded between an older graphics card and RAM sticks is around the limit of what a non-dedicated hobbyist can achieve with their existing home hardware. Most are really happy with the performance of Mistral Small, Qwen QwQ, and the DeepSeek distills. Those that want more have the money to burn on multiple Nvidia GPUs and a server rack.

    LLM-wise, your phone can run 1-4b models, your laptop 4-8b, and your older gaming desktop with a 4-8GB VRAM card can run around 8-32b. Beyond that you need the big expensive 24GB cards, and further beyond you need multiples of them.
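
    A minimal sketch of the partial-offload setup described above, assuming the llama-cpp-python bindings built with GPU support; the model path and layer count are placeholders you’d tune to your own card:

    ```python
    # Minimal sketch: split a quantized model between GPU VRAM and system RAM.
    # Assumes `pip install llama-cpp-python` built with GPU support; the model
    # path and layer count below are placeholders, not recommendations.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/mistral-small-24b.Q4_K_M.gguf",  # hypothetical local file
        n_gpu_layers=24,   # layers that fit in VRAM; the rest stay in system RAM
        n_ctx=4096,        # context window; larger values use more memory
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize what partial offloading does."}]
    )
    print(out["choices"][0]["message"]["content"])
    ```

    The more layers you can push to the GPU, the faster generation gets; spill too many into system RAM and token speed drops off quickly.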

    Stable Diffusion models in my experience are very compute intensive. Quantization degradation is much more apparent, so you want VRAM, a high-quant model, and a canvas size kept as low as tolerable.
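
    A minimal sketch of keeping the footprint down, assuming the Hugging Face diffusers library; the checkpoint ID is just an example, swap in whatever model you actually use:

    ```python
    # Minimal sketch: low-VRAM Stable Diffusion generation at a modest canvas size.
    # Assumes `pip install diffusers transformers accelerate torch`; the model ID
    # is only an example checkpoint.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",   # example checkpoint, swap as needed
        torch_dtype=torch.float16,          # half precision to fit in less VRAM
    )
    pipe.enable_attention_slicing()         # trades some speed for lower peak memory
    pipe = pipe.to("cuda")

    image = pipe(
        "a watercolor painting of a mountain cabin",
        height=512, width=512,              # keep the canvas as small as tolerable
        num_inference_steps=25,
    ).images[0]
    image.save("cabin.png")
    ```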

    Hopefully we will get cheaper devices meant for AI hosting, like cheaper versions of Strix Halo and DIGITS.


  • Thank you! I like to spread the word about things I feel passionate about. There’s so much crap that promises to improve your life and only a few good things that actually do. Dry herb vapes rocked my world, and it’s my privilege to potentially be the internet-comment earworm that eventually convinces someone to try the journey and see if it changes their world too.

    Unfortunately there are just a lot of close-minded individuals who are happy with what they’ve got going on and don’t understand the point, or had a bad experience 10 years ago, or just confuse it with cartridge vaping. Some people don’t like the look of something, so they refuse to ever try it, just because it didn’t pass their vibe check. It’s frustrating, but that’s people and stubborn tradition for you. I believe that if it’s meant for you, then eventually it will find a way into your life when you need it.


  • Been treating my insomnia with the good stuff pretty much daily for over a decade. Dry herb microdosing and getting down the timing of the high cycle is key to maintaining tolerance in the long term.

    Let’s touch on the latter point quickly first. Everyone’s different, but for me being high goes something like vape -> get high -> crash out (sleepy) -> (caffeine or more vaping). If you can establish a time-based sleep schedule to align your circadian rhythm while timing the crash-out at the same general time, you’re golden.

    Now for tolerance. I’m going to be blunt with you: most cannabis smokers/vapers are doing it… unscientifically. They’ve never heard of a dry herb vape, or if they did it’s a decade-old dinosaur like the fucking Pax; they’ve never considered microdosing, never wanted to understand what the bare-minimum, healthiest vaping methods are. If you’re burning your bud you’re doing more harm to your lungs than good, and you’re wasting your herb big time. Sorry, that’s how it is.

    The journey these questions led me on changed my life and truly turned the herb into dosable medicine. You want to stop building tolerance? You need to work your way down to 0.05g-0.10g dry herb hits of green. It’s effectively the smallest unit of bud for an appreciable hit. You can microdose all day and never build appreciable tolerance; it will basically fully reset tomorrow or the day after.

    And guess what? All the black and brown leftover ABV is still good: decarbed and chock-full of CBC/CBN with a little THC. Save it up and process it into nighttime oil sleeping pills.


  • Which ones are not actively spending an amount of money that scales directly with the number of users?

    Most of these companies offer direct web/API access to their own cloud datacenters, and all cloud services have operating costs that scale. The more users connect and use compute, the more hardware, processing power, and bandwidth is needed to serve them all. The smaller fine-tuners like DeepHermes, which just take a pre-cooked bigger model and sell cloud access with minimal operating cost, probably do best with the scaling. They are also way, way cheaper than big-model access costs, probably for similar reasons.

    OpenAI, Meta, and Google are very expensive compared to the competition and probably operate at a loss. It’s important to note that immediate profit is only one factor. Many big, well-financed companies will happily eat the L on operating cost and electricity as long as they feel they can solidify their presence in the market for the coming decades. Control, (social) power, lasting influence, data collection: these are some of the other valuable currencies corporations and governments recognize and will exchange monetary currency for.

    but it’s treated as the equivalent of electricity and it’s not

    I assume you mean in a tech-progression kind of way. A better comparison is that it’s being treated closer to the invention of transistors and computers. Before, we could only do information processing with the cold hard certainty of logical bit calculations. We got by quite a while just cooking up fancy logical programs to process inputs and outputs. Data communication, vector graphics and digital audio, cryptography, the internet: just about everything today is thanks to the humble transistor and logic gate, and the clever brains that assemble them into functioning tools.

    Machine learning models are based on the brain’s neuron structures and the layered activation patterns biology uses to encode information. We have found both a way to make trillions of transistors simulate the basic information-pattern-organizing systems living beings use, and a point in time at which it’s technically possible to have the compute needed to do so. The perceptron was invented in the 1950s; it took some seventy years for computers and ML to catch up to the point of putting theory into practice. We couldn’t create artificial computer brain structures and integrate them into consumer hardware 10 years ago; the only player then was Google with their billion-dollar datacenters and AlphaGo/DeepMind.
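
    For the curious, a tiny sketch of that 1950s perceptron idea: a single artificial neuron learning logical AND with the classic update rule. This is the historical building block, not anything resembling a modern LLM:

    ```python
    # Minimal sketch of a 1950s-style perceptron learning logical AND.
    # One "neuron": weighted sum of inputs plus bias, thresholded to 0 or 1.
    inputs  = [(0, 0), (0, 1), (1, 0), (1, 1)]
    targets = [0, 0, 0, 1]          # AND truth table

    w = [0.0, 0.0]                  # weights
    b = 0.0                         # bias
    lr = 0.1                        # learning rate

    def predict(x):
        return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

    for epoch in range(20):         # classic perceptron update rule
        for x, t in zip(inputs, targets):
            error = t - predict(x)
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
            b    += lr * error

    print([predict(x) for x in inputs])   # expected: [0, 0, 0, 1]
    ```

    Stack enough of these units with nonlinearities and train them with backpropagation and you get the deep networks behind today’s models.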


  • There’s more than just ChatGPT and American datacenter/LLM companies. There’s OpenAI, Google, and Meta (American), Mistral (French), Alibaba and DeepSeek (Chinese), plus many smaller companies that either make their own models or further finetune specialized models from the big ones. It’s global competition, with all of them occasionally releasing open-weights models of different sizes for you to run on home consumer hardware. Don’t like big models that were trained on stolen, copyright-infringing information? Use ones trained completely on open public domain information.

    Your phone can run a 1-4b model, your laptop 4-8b, your desktop with a GPU 12-32b. No data is sent to servers when you self-host. This is also relevant for companies that want data kept in house.
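
    A minimal sketch of what “nothing leaves the machine” looks like in practice, assuming an Ollama server running locally; the model name is just an example:

    ```python
    # Minimal sketch: query a self-hosted model over localhost only.
    # Assumes Ollama is running locally (`ollama serve`) and a small model
    # such as `llama3.2` has already been pulled; swap in whatever you run.
    import json
    import urllib.request

    payload = {
        "model": "llama3.2",                       # example local model name
        "prompt": "In one sentence, what is VRAM?",
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",     # Ollama's local HTTP API
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:      # request never leaves localhost
        print(json.loads(resp.read())["response"])
    ```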

    Like it or not, machine learning models are here to stay. You can already self-host open-weights models trained on completely public domain knowledge. They actually do provide useful functions to home users beyond being a chatbot. People have used LLMs to make music, generate images/video, describe images in detail (including document scanning), boilerplate basic code logic, and check for semantic mistakes that regular spell check won’t pick up on.

    Models in the 24-32b range at high quant are reasonably capable of basic information-processing tasks and have generally accurate domain knowledge. You can’t treat them like a fact source, because there’s always a small statistical chance of them being wrong, but they’re an OK starting point for research, like Wikipedia.

    My local colleges are researching multimodal LLMs that recognize subtle patterns in billions of cancer cell photos, to possibly help doctors better screen patients.

    The problem is that there’s too much energy being spent training them. It takes a lot of energy in compute power to cook a model and refine it. It’s important for researchers to find more efficient ways to make them. DeepSeek did this: they found a way to cook their models with way less energy and compute, which is part of why it was exciting. Hopefully this energy can also come more from renewables instead of burning fuel.




  • If you are asking questions, try out the DeepHermes finetune of Llama 3.1 8b and turn on CoT reasoning with the special system prompt below.


    You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.

    It really helps the smaller models come up with nicer answers, but it takes them a little more time to bake an answer with the thinking part. It’s unreal how far models have come in a year thanks to leveraging reasoning in context space.
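
    A minimal sketch of wiring that system prompt into a locally hosted DeepHermes, assuming an OpenAI-compatible local server (llama-server, LM Studio, etc.); the base_url, port, and model name are placeholders:

    ```python
    # Minimal sketch: toggle CoT reasoning on a local model via the system prompt.
    # Assumes an OpenAI-compatible server running locally; the base_url, api_key,
    # and model name below are placeholders for whatever your server exposes.
    from openai import OpenAI

    REASONING_PROMPT = (
        "You are a deep thinking AI, you may use extremely long chains of thought "
        "to deeply consider the problem and deliberate with yourself via systematic "
        "reasoning processes to help come to a correct solution prior to answering. "
        "You should enclose your thoughts and internal monologue inside <think> "
        "</think> tags, and then provide your solution or response to the problem."
    )

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="deephermes-3-llama-3-8b",   # placeholder model name
        messages=[
            {"role": "system", "content": REASONING_PROMPT},  # omit to turn CoT off
            {"role": "user", "content": "If I have 3 apples and eat one, how many are left?"},
        ],
    )
    print(resp.choices[0].message.content)
    ```

    Drop the system message and the same model answers directly without the <think> block, which is the on/off switch mentioned above.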









  • The most useful thing my LLM has done is help me with hobbyist computer coding projects and answer advanced STEM questions. I use my LLM to parse code that I’m unfamiliar with and to understand how the functions translate to actual things happening. I give it an example of functioning code and ask it to adapt the logic a certain way to see how it goes about it. I have to parse a large, very old legacy codebase written in many parts by different people of different skill, so just being able to understand what block does what is a big win some days. Even if its solutions aren’t copy/paste ready, I usually learn quite a lot just seeing what insights it can glean from the problem. Actually, I prefer when I have to clean it up, because it feels like I still did something to refine and sculpt the logic in a way the LLM can’t.

    I don’t want to be a stereotypical ‘vibe coder’ who copies and pastes without being able to bug-fix or understand the code they’re putting in. So I ask plenty of questions and read through its reasoning for thousands of words to understand the thought processes that led to functioning changes. I try my best to understand the code and clean it up. It is nice to have a second brain help with initial boilerplating and piecing together the general flow of logic.
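
    A minimal sketch of that explain-and-adapt workflow, assuming the same kind of local OpenAI-compatible endpoint as above; the endpoint and model name are placeholders:

    ```python
    # Minimal sketch: hand the model a working snippet and ask it to explain
    # and adapt it. Same placeholder local endpoint/model as the earlier sketch.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    snippet = '''
    def rolling_mean(values, window):
        return [sum(values[i:i + window]) / window
                for i in range(len(values) - window + 1)]
    '''

    prompt = (
        "Explain what this function does, then adapt the same logic to return "
        "a rolling maximum instead of a rolling mean:\n" + snippet
    )

    resp = client.chat.completions.create(
        model="deephermes-3-llama-3-8b",          # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)        # read, verify, then clean it up
    ```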

    I treat it like a teacher and an editor. But it’s got limits like any tool, and it needs a sweet spot of context, example, and circumstance for it to work out okay.