Posted this elsewhere (Reddit); feel free to grab whatever other models you want. Fuck AI as a service, I don't need my data going elsewhere. Local LLMs are the future.
The first place to start is here: https://ollama.com/download
Once it's installed you can use it immediately from the command line in some sort of terminal (i.e. PowerShell on Windows): type ollama run qwen2.5:3b and it will download the model and run it.
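A few other Ollama commands worth knowing (same terminal; double check ollama --help on your version for the full list):

    ollama pull qwen2.5:3b     # download a model without running it
    ollama list                # show which models you have downloaded
    ollama rm qwen2.5:3b       # delete a model to free up disk space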
Each model has a card on the Ollama site that lists its download size. Unless you have an Nvidia card with lots of VRAM you probably want to keep models under 5GB (larger == slower).
Here’s the card for Qwen2.5:3b: https://ollama.com/library/qwen2.5:3b
Here’s the source code for Ollama with more information on how to use it. https://github.com/ollama/ollama
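Ollama also serves an HTTP API on localhost:11434 by default, so you can script against it instead of using the terminal. Roughly something like this (field names are per the repo docs, worth double checking there):

    curl http://localhost:11434/api/generate -d '{
      "model": "qwen2.5:3b",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'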
Then you can add a UI on top of it: Open WebUI (a self-contained Python package; see their install directions, rough sketch below). This is good because you can have multiple accounts and share it over your LAN to use on smart devices or other hardware without a dedicated GPU: https://github.com/open-webui/open-webui (video for how to set it up on WSL/Linux: https://youtu.be/Wjrdr0NU4Sk)
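The Python-package route is roughly this (assumes a recent Python, 3.11 when I last looked; check their README for the current requirement and for the Docker option):

    pip install open-webui
    open-webui serve           # then open http://localhost:8080 in a browser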
Or for easier local running you can try LM Studio. I'm not sure if you can serve over LAN, but it's easier: https://lmstudio.ai/
If you want to get fancy and generate images and video, I'd recommend ComfyUI: https://github.com/comfyanonymous/ComfyUI. You can use it with Stable Diffusion https://huggingface.co/stabilityai/stable-diffusion-3.5-medium or, for videos, I'd recommend LTX-Video https://huggingface.co/Lightricks/LTX-Video
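For ComfyUI the manual install is basically clone plus Python deps; this is a sketch assuming you've already installed PyTorch for your GPU per their README:

    git clone https://github.com/comfyanonymous/ComfyUI
    cd ComfyUI
    pip install -r requirements.txt
    python main.py             # UI comes up at http://127.0.0.1:8188

Model checkpoints (e.g. the Stable Diffusion 3.5 file) go in the models/checkpoints folder.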
The gold standard is HunyuanVideo, but good luck running it unless you have like 4x RTX 3090 https://huggingface.co/tencent/HunyuanVideo