commit 6f292ad7c046ee3ba87b83f200ce89a514f58d95
Author: alicedonaghy91
Date:   Mon Feb 10 07:52:22 2025 +0800

    Add Simon Willison's Weblog

diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md
new file mode 100644
index 0000000..87e78c0
--- /dev/null
+++ b/Simon-Willison%27s-Weblog.md
@@ -0,0 +1,42 @@
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
+
There's a whole lot of stuff in the new release.
+
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
+
DeepSeek-R1 - which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
+
I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
+
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
+
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
+
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
+
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
+
Licensing aside, these distilled models are fascinating beasts.
+
Running DeepSeek-R1-Distill-Llama-8B-GGUF
+
Quantized versions are already starting to show up. So far I've tried just one of those - unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI - and it's really fun to play with.
+
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
+
First I fetched and ran the model using Ollama itself:
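Based on the Unsloth upload named above, the fetch-and-run step can be sketched as follows (this assumes Ollama's support for pulling GGUF models via hf.co paths):

```shell
# Ollama can pull GGUF models directly from Hugging Face via hf.co paths.
# The first run downloads the weights, then opens an interactive chat.
MODEL=hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
ollama run "$MODEL"
```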
+
This downloads around 8.5GB of model data and starts an interactive chat interface.
+
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
+
I installed the plugin and ran it like this:
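With the model already fetched by Ollama, the plugin install and prompt can be sketched as (model path and prompt are the ones mentioned in the surrounding text):

```shell
# Install the Ollama plugin for LLM, then send a prompt to the fetched model.
llm install llm-ollama
PROMPT='a joke about a pelican and a walrus who run a tea room together'
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF' "$PROMPT"
```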
+
Each response begins with a `<think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
+
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
+
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
+
[...]
+
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
+
The Joke:
+
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
+
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
+
Explanation:
+
- Setup: A pelican and a walrus run an unlikely tea room together. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
+
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
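Since the chain of thought arrives wrapped in that pseudo-XML tag, separating the "thinking" from the final answer is a simple string operation. A minimal sketch in Python (the `<think>` tag name follows DeepSeek's published output format; the sample text here is invented):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style response into (chain_of_thought, final_answer)."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", response, re.DOTALL)
    if match is None:
        # No think tag: treat the whole response as the answer.
        return "", response.strip()
    return match.group(1).strip(), match.group(2).strip()

# Invented sample in the shape these models produce:
sample = "<think>Okay, so I need a pelican joke...</think>The Joke: ..."
thought, answer = split_reasoning(sample)
print(thought)  # Okay, so I need a pelican joke...
print(answer)   # The Joke: ...
```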
+
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B - currently my favourite GPT-4 class model that I've run on my own machine - I have high expectations.
+
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this - a 34GB download:
+
Can it draw a pelican?
+
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
+
It looked to me like it got the order of the elements wrong, so I followed up with:
+
the background ended up covering the rest of the image
+
It thought some more and gave me this:
+
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
+
Other ways to try out DeepSeek-R1
+
If you want to try the model out without installing anything you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
+
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
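A minimal sketch of such an entry, assuming LLM's documented extra-openai-models.yaml format (the api_base and api_key_name values here are assumptions that mirror the deepseek key name used with llm keys set):

```yaml
# Registers DeepSeek's OpenAI-compatible endpoint as a model for LLM.
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```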
+
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
+
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.
\ No newline at end of file