Run DeepSeek R1 Locally - with all 671 Billion Parameters
<br>Recently, I showed how to easily run distilled versions of the DeepSeek R1 model locally. A distilled model is a compressed version of a larger language model, where knowledge from a bigger model is transferred to a smaller one to reduce resource usage without losing too much performance. These models are based on the Llama and Qwen architectures and come in variants ranging from 1.5 to 70 billion parameters.<br>
<br>Some pointed out that this is not the REAL DeepSeek R1, and that it is impossible to run the full model locally without several hundred GB of memory. That sounded like a challenge, I thought!<br>
<br>First Attempt - Warming Up with a 1.58 bit Quantized Version of DeepSeek R1 671b in Llama.cpp<br>
<br>The developers behind Unsloth dynamically quantized DeepSeek R1 so that it could run on as little as 130GB while still benefiting from all 671 billion parameters.<br>
<br>A quantized LLM is an LLM whose parameters are stored in lower-precision formats (e.g., 8-bit or 4-bit instead of 16-bit). This significantly reduces memory usage and speeds up processing, with minimal impact on performance. The full version of DeepSeek R1 uses 16 bit.<br>
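The arithmetic behind those figures is easy to check. A rough sketch, counting weights only (it ignores activation and KV-cache memory, and the 1.58-bit value is the average across Unsloth's dynamic quant mix):

```python
# Back-of-envelope memory footprint of 671 billion parameters
# at different precisions (weights only, 1 GB = 1e9 bytes).
PARAMS = 671e9

def model_size_gb(bits_per_param: float) -> float:
    """Approximate weight storage in GB."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"16-bit  : {model_size_gb(16):7.0f} GB")    # full-precision DeepSeek R1
print(f" 8-bit  : {model_size_gb(8):7.0f} GB")
print(f" 4-bit  : {model_size_gb(4):7.0f} GB")
print(f"1.58-bit: {model_size_gb(1.58):7.0f} GB")  # close to the 130GB Unsloth reports
```

The 16-bit row lands around 1.3TB, which is why the roughly tenfold reduction to ~130GB is what makes a local run thinkable at all.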
<br>The trade-off in precision is hopefully compensated by increased speed.<br>
<br>I downloaded the files from this collection on Hugging Face and ran the following command with Llama.cpp.<br>
<br>The following table from Unsloth shows the recommended values for the n-gpu-layers parameter, which indicates how much work can be offloaded to the GPU.<br>
<br>According to the table, I thought 7 should be the maximum, but I got it running with 12. According to Windows Task Manager my GPU has 40 GB of memory, and not 24 as I thought. So then it adds up (7/24 * 40 ≈ 12).<br>
<br>If you prefer to run the model directly with Ollama, you must merge the three GGUF files using Llama.cpp or a similar tool first.<br>
<br>Above is some of the text that appears on screen while running the command.<br>
<br>Llama.cpp and the quantized model are heavy on memory and CPU, but also use the GPU (a little) and continuously read from the disk where the model files are stored.<br>
<br>The model repeats the question and is preparing to start reasoning.<br>
<br>The reasoning, or thinking, is quite lengthy, which also adds to the total time it takes.<br>
<br>Okay, so I need to figure out Alan Turing's most important contribution to how we live our lives in 2025. Hmm. Let's start by recalling who Alan Turing was. He was a British mathematician and computer scientist, right? He's known for his work in code-breaking during World War II, especially with the Enigma machine. Oh, and he's also considered one of the fathers of theoretical computer science and artificial intelligence.<br>
<br>So, his contributions ... Well, the Turing Machine concept is essential in computer science. It's a theoretical model of a computer, which laid the foundation for modern computing. Then there's the Turing Test, which is about determining whether a machine can exhibit intelligent behavior indistinguishable from a human. That relates to AI. Also, his work on morphogenesis, which I believe is about biological patterns and growth, like how organisms develop shapes and patterns.<br>
<br>Now, how do these contributions affect life in 2025? Well, we're in a world where AI is becoming more prevalent. So maybe his work on the Turing Test is influencing how AI is developed and integrated into society. If the Turing Test is a benchmark for AI intelligence, then advancements in AI that pass this test might be significant. But wait, the Turing Test is more of a philosophical benchmark. Maybe the actual algorithms or concepts he developed are more foundational.<br>
<br>The Turing Machine model is the basis for the modern computer, which is the architecture of today's machines. All our computers today, including smartphones, servers, etc., are based on this concept. So, in 2025, even as technology advances, the fundamental principles he laid down would still be the basis for all computational devices. That's pretty big. Without that, we wouldn't have the computers that power everything from the internet to AI algorithms.<br>
<br>Additionally, his work on code-breaking during WWII led to advancements in cryptography. Modern encryption methods might have roots in his work, which is essential for secure communication in 2025. Everything from online banking to private messaging relies on encryption, so that's a big contribution.<br>
<br>Another angle is his influence on the development of algorithms. The concept of a universal computing machine allows for the creation of complex algorithms that drive AI, data analysis, machine learning, etc. So, any AI systems in 2025 would stem from his foundational work.<br>
<br>Also, morphogenesis might have influenced fields like biology and genetics. Maybe in 2025, understanding biological patterns helps in areas like genetic engineering or synthetic biology. But I'm not sure how directly that ties into daily life compared to computing.<br>
<br>Putting it all together, the most important contribution might be the Turing Machine and the theory of computation, which underpin all modern computing technology. Without that, the digital age wouldn't exist as we know it. In 2025, even with advancements, the basic framework of computation he provided would still be essential. Alternatively, his work on AI and the Turing Test might be more highlighted in an AI-driven future, but the foundational computing theory seems more fundamental.<br>
<br>Wait, but the Turing Test is more about evaluating AI rather than a contribution to technology itself. The real contribution is the theoretical basis for computing, which enabled all the digital tech we have. So, in 2025, all the smart devices, AI, cloud computing, etc., are all built upon the principles he established. Therefore, his most important contribution is the concept of a universal Turing machine, which is the basis for all modern computers, leading to the digital revolution that defines contemporary life.<br>
<br>And finally the answer.<br>
<br>Alan Turing's most significant contribution to how we live in 2025 is his foundational work in theoretical computer science, particularly the concept of the universal Turing machine. This theoretical model established the principles of modern computation, forming the basis for all digital computers. In 2025, this innovation underpins the entire digital world, enabling technologies like artificial intelligence, cloud computing, smartphones, and the internet. Without Turing's conceptual framework, the development of programmable computers and subsequent technologies that define modern life - from AI algorithms to secure communication systems - would not have been possible. His work on cryptography and the Turing Test also contributed to secure communications and AI development, but it is the universal Turing machine that remains the foundation of our digital existence.<br>
<br>In summary: The 1.58 bit quantized model produced 0.39 tokens per second. In total, it took about 37 minutes to answer the same question.<br>
<br>I was kind of surprised that I was able to run the model with only 32GB of RAM.<br>
<br>Second Attempt - DeepSeek R1 671b in Ollama<br>
<br>Ok, I get it, a quantized model of only 130GB isn't really the full model. Ollama's model library seems to include a full version of DeepSeek R1. It's 404GB with all 671 billion parameters - that must be real enough, right?<br>
<br>No, not really! The version hosted in Ollama's library is the 4 bit quantized version. See Q4_K_M in the screenshot above? It took me a while!<br>
<br>With Ollama installed on my home PC, I just needed to clear 404GB of disk space and run the following command while grabbing a cup of coffee:<br>
<br>Okay, it took more than one coffee before the download was complete.<br>
<br>But finally, the download was done, and the excitement grew ... until this message appeared!<br>
<br>After a quick visit to an online store selling various kinds of memory, I concluded that my motherboard wouldn't support such large amounts of RAM anyway. But there must be options?<br>
<br>Windows allows virtual memory, meaning you can swap disk space for virtual (and rather slow) memory. I figured 450GB of additional virtual memory, on top of my 32GB of real RAM, should be sufficient.<br>
<br>Note: Be aware that SSDs have a limited number of write operations per memory cell before they wear out. Avoid excessive use of virtual memory if this concerns you.<br>
<br>A new attempt, and rising excitement ... before another error message!<br>
<br>This time, Ollama tried to push more of the Chinese language model into the GPU's memory than it could handle. After searching online, it seems this is a known issue, and the solution is to let the GPU rest and let the CPU do all the work.<br>
<br>Ollama uses a "Modelfile" containing configuration for the model and how it should be used. When using models directly from Ollama's model library, you typically don't work with these files as you must when downloading models from Hugging Face or similar sources.<br>
<br>I ran the following command to display the current configuration for DeepSeek R1:<br>
<br>Then, I added the following line to the output and saved it in a new file named Modelfile:<br>
<br>I then created a new model configuration with the following command, where the last parameter is my name for the model, which now runs entirely without GPU use:<br>
<br>Once again, the excitement grew as I nervously typed the following command:<br>
<br>Suddenly, it happened! No error message, but tokens started appearing! Admittedly, the speed was such that if you were fast enough, you could almost grab a coffee between each screen update - but it was working! A language model on par with the best from OpenAI, with all its 671 billion parameters, running on my three-year-old PC with 32GB (real) RAM!<br>
<br>I had asked the same question to ChatGPT (4o, o1, o3-mini-high), DeepSeek R1 hosted in China, and DeepSeek R1 671b hosted locally.<br>
<br>This is the reasoning from DeepSeek R1 running in Ollama on my own computer, as it appeared slowly - token by token.<br>
<br>And finally this is the answer from DeepSeek R1 running in Ollama. The model outputs the answer with markdown formatting.<br>
<br>Alan Turing's most significant contribution to modern life, particularly as we look toward 2025, lies in his foundational work on computational theory and artificial intelligence, which underpin nearly every aspect of contemporary technology. Here's a structured breakdown:<br>
<br>1. Universal Turing Machine (1936):<br>
<br>- This conceptual model for general-purpose computing is the bedrock of all digital technology today. It established that a single machine could perform any computable task with the right program, enabling: Modern Computing Devices (smartphones, cloud infrastructure, IoT).
Software Development, allowing versatile applications across industries.<br>
<br>2. In 2025, advancements in quantum computing, AI, and edge computing still rely on Turing's principles of programmable logic.<br>
<br>3. Artificial Intelligence & the Turing Test (1950):<br>
<br>- His framework for evaluating machine intelligence spurred decades of research into human-like AI. By 2025, this legacy is evident in: - Advanced chatbots, virtual assistants (e.g., GPT-4/5), and autonomous systems.
- Ethical debates around AI decision-making in healthcare, finance, and law.<br>
<br>4. Cryptography & Security:<br>
<br>- Turing's WWII codebreaking work foreshadowed modern encryption methods critical for: - Securing digital transactions, personal data, and blockchain technologies.
- Combating cyber threats in an increasingly connected world.<br>
<br>Why the Universal Machine Stands Out: While AI and cryptography are transformative, Turing's theoretical model of computation is the most fundamental. It enabled the creation of programmable systems that drive today's innovations - from AI algorithms to quantum computing research. Without this foundation, the digital infrastructure of 2025 simply would not exist.<br>
<br>So, how long did it take, using the 4 bit quantized model? Quite a while! At 0.05 tokens per second - meaning 20 seconds per token - it took almost 7 hours to get an answer to my question, including 35 minutes to load the model.<br>
<br>While the model was thinking, the CPU, memory, and the disk (used as virtual memory) were close to 100% busy. The disk where the model file was saved was not busy during generation of the response.<br>
<br>After some reflection, I thought maybe it's okay to wait a bit? Maybe we shouldn't ask language models about everything all the time? Perhaps we should think for ourselves first and be prepared to wait for an answer.<br>
<br>This might resemble how computers were used in the 1960s when machines were large and availability was very limited. You prepared your program on a stack of punch cards, which an operator loaded into the machine when it was your turn, and you could (if you were lucky) pick up the result the next day - unless there was an error in your program.<br>
<br>Compared to the response from other LLMs with and without reasoning<br>
<br>DeepSeek R1, hosted in China, thinks for 27 seconds before delivering this answer, which is slightly shorter than my locally hosted DeepSeek R1's response.<br>
<br>ChatGPT answers similarly to DeepSeek but in a shorter format, with each model giving slightly different responses. The reasoning models from OpenAI spend less time reasoning than DeepSeek.<br>
<br>That's it - it's definitely possible to run different quantized versions of DeepSeek R1 locally, with all 671 billion parameters - on a three-year-old computer with 32GB of RAM - just as long as you're not in too much of a rush!<br>
<br>If you really want the full, non-quantized version of DeepSeek R1, you can find it at Hugging Face. Please let me know your tokens/s (or rather seconds/token) if you get it running!<br>