From 3636d5afee71925245b6d0e26f904b5a2938d4b9 Mon Sep 17 00:00:00 2001
From: flossiewitzel9
Date: Mon, 10 Feb 2025 00:40:51 +0800
Subject: [PATCH] Add Simon Willison's Weblog

---
 Simon-Willison%27s-Weblog.md | 42 ++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 Simon-Willison%27s-Weblog.md

diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md
new file mode 100644
index 0000000..4d82dda
--- /dev/null
+++ b/Simon-Willison%27s-Weblog.md
@@ -0,0 +1,42 @@
+
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
+
There's a lot of stuff in the new release.
+
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
+
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
+
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
+
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
+
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
+
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
+
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those over.)
+
Licensing aside, these distilled models are fascinating beasts.
+
Running DeepSeek-R1-Distill-Llama-8B-GGUF
+
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
+
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
+
First I fetched and ran the model using Ollama itself:
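
A sketch of that command, assuming Ollama's Hugging Face integration and Unsloth's Q8_0 quantization (the exact repository tag is an assumption):

```bash
# Pull and run the distilled 8B model straight from Hugging Face via Ollama
# (Q8_0 is an assumed quantization tag; others exist in the same repo)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```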
+
This downloads around 8.5GB of model data and starts an interactive chat interface.
+
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
+
I installed the plugin and ran it like this:
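
Something like the following, assuming the same model tag as above (llm-ollama exposes any model Ollama has pulled under its Ollama name):

```bash
# Install the plugin that lets LLM talk to Ollama-served models
llm install llm-ollama

# Run a prompt against the distilled model (model ID is an assumption)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```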
+
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
+
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
+
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
+
[...]
+
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
+
The Joke:
+
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
+
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
+
Explanation:
+
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
+
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
+
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
+
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
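
Something along these lines, assuming Unsloth's 70B GGUF repository; the Q3_K_M tag is an assumption that roughly matches a 34GB download:

```bash
# ~34GB: a 3-bit quantization of the 70B distilled model (assumed tag)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```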
+
Can it draw a pelican?
+
I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:
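
Run through LLM against the same local model, that looks something like this (model ID assumed as before):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```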
+
It looked to me like it got the order of the elements wrong, so I followed up with:
+
the background ended up covering the rest of the image
+
It thought some more and gave me this:
+
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
+
Other ways to try DeepSeek-R1
+
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
+
DeepSeek provide the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
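
Based on LLM's extra-openai-models.yaml format, the entry looks something like this (the api_base value is taken from DeepSeek's API docs and should be double-checked):

```yaml
# Register DeepSeek's OpenAI-compatible endpoint as a model LLM can use
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek   # the stored key this model reads
```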
+
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
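
Put together (the example prompt here is just an illustration):

```bash
# Store the API key under the name referenced by api_key_name above
llm keys set deepseek

# Run any prompt through the hosted reasoner model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```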
+
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
\ No newline at end of file