Simon Willison's Weblog

That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to appear. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

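Something like this, pulling the model directly from Unsloth's repository on Hugging Face (the Q8_0 quantization tag is my assumption, though it lines up with the 8.5GB download mentioned below):

```bash
# Pull and run the 8B distilled model straight from Hugging Face
# (Q8_0 is an assumption; other quantization tags are available)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
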
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

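As a sketch of what that later exploration looks like, LLM's standard logs commands can show where the database lives and replay recent responses:

```bash
# Print the location of the SQLite database that stores every prompt/response
llm logs path

# Show the three most recent logged interactions
llm logs -n 3
```
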
I installed the plugin and ran it like this:

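That's the plugin install followed by a prompt against the same model ID that Ollama uses (mirroring the assumed Q8_0 tag above):

```bash
# Install the plugin that exposes Ollama models to LLM
llm install llm-ollama

# Run a prompt against the distilled model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
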
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

> The Joke:

> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfy chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

> After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

> Explanation:

> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

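Presumably via the same Unsloth pattern as the 8B model. The exact quantization tag is my guess: a 34GB download for a 70B model suggests something in the region of 3-4 bits per weight, so roughly:

```bash
# Q3_K_M is an assumption based on the ~34GB size; check Unsloth's
# Hugging Face repo for the tags that actually exist
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
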
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

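For reference, that's this command, using the same assumed model ID as before:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```
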
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was much more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek also offer the model via their API, using an OpenAI-imitating endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

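A minimal version of that configuration, assuming LLM's documented extra-openai-models.yaml format and DeepSeek's published API base URL:

```yaml
# deepseek-reasoner is DeepSeek's API model name for R1;
# api_key_name tells LLM which stored key to use
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
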
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

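In full, that's the following (the prompt here is just an illustration):

```bash
# Store the API key once
llm keys set deepseek

# Then any prompt can be routed through DeepSeek's hosted R1
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
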
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.