Add 'Simon Willison's Weblog'

2025-02-09 22:39:37 +01:00
parent c5c328ef02
commit 6a6096d2a8
1 changed files with 42 additions and 0 deletions
@@ -0,0 +1,42 @@
+<br>That model was [trained](https://markholmesauthor.com) in part [utilizing](https://intuneholistics.com) their [unreleased](https://www.annamariaprina.it) R1 "reasoning" design. Today they have actually [launched](https://perezfotografos.com) R1 itself, in addition to a whole [household](http://elitkft.hu) of new [designs](http://123.56.247.1933000) obtained from that base.<br>
+<br>There's a great deal of things in the [brand-new release](https://www.suyun.store).<br>
+<br>DeepSeek-R1[-Zero appears](https://www.coloursmadeeasy.com) to be the [base model](http://www.leedscarpark.co.uk). It's over 650GB in size and, like many of their other releases, is under a tidy MIT license. [DeepSeek caution](http://rjpadwokaci.pl) that "DeepSeek-R1-Zero experiences challenges such as limitless repeating, bad readability, and language mixing." ... so they also released:<br>
+<br>DeepSeek-R1-which "includes cold-start data before RL" and "attains performance comparable to OpenAI-o1 across mathematics, code, and reasoning jobs". That a person is likewise MIT accredited, and is a [comparable size](https://artistrybyhollylyn.com).<br>
+<br>I do not have the [capability](https://git.fram.i.ng) to run [models bigger](https://publisherpodcastsummit.com) than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can quickly have fun with myself. That's where the [brand-new distilled](https://www.comecon.jp) models are available in.<br>
+<br>To [support](https://ripplehealthcare.com) the research community, we have [open-sourced](https://bostonresearch.org) DeepSeek-R1-Zero, DeepSeek-R1, and  [forum.altaycoins.com](http://forum.altaycoins.com/profile.php?id=1079103) 6 [dense models](http://ncdsource.kanghehealth.com) [distilled](https://www.georgabyrne.com.au) from DeepSeek-R1 based upon Llama and Qwen.<br>
+<br>This is a [remarkable flex](http://www.avisavezzano.com)! They have actually [models based](https://www.artglass.nu) on Qwen 2.5 (14B, 32B, Math 1.5 B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).<br>
+<br>[Weirdly](https://www.ethosfineaudio.com) those [Llama designs](https://www.mosherexcavating.net) have an MIT license attached, which I'm [uncertain](https://dfclinicasaudeocupacional.com.br) works with the [underlying Llama](http://es.clilawyers.com) license. [Qwen designs](https://www.ggram.run) are [Apache licensed](https://www.marketingdd.com) so perhaps MIT is OK?<br>
+<br>(I likewise just [discovered](http://loreephotography.com) the MIT license files state "Copyright (c) 2023 DeepSeek" so they may [require](http://www.psychomotricite-rennes.com) to pay a bit more [attention](https://quicklancer.bylancer.com) to how they copied those in.)<br>
+<br>[Licensing](http://akropolistravel.com) aside, these [distilled designs](https://dreamersink.com) are [fascinating beasts](http://swayamseasolutions.com).<br>
+<br>[Running](https://raiganesh.com.np) DeepSeek-R1-Distill-Llama-8B-GGUF<br>
+<br>[Quantized versions](https://swen.ae) are already beginning to reveal up. So far I have actually tried simply one of those- unsloth/[DeepSeek-R](http://www.luru-kino.de) 1-Distill-Llama-8[B-GGUF released](http://dinbrazda.ro) by [Unsloth](https://selfdirect.org) [AI](https://www.adayto.com)-and it's really fun to have fun with.<br>
+<br>I'm [running](http://marionbrillouet.com) it [utilizing](https://www.reddit-directory.com) the mix of Ollama, LLM and the [llm-ollama plugin](https://www.winxo.com).<br>
+<br>First I [fetched](https://smokelocal.org) and ran the [design utilizing](http://secdc.org.cn) Ollama itself:<br>
+<br>This [downloads](https://www.vancos.cz) around 8.5 GB of model information and starts an [interactive chat](https://champ217.flixsterz.com) user [interface](https://www.fehuatelier.it).<br>
+<br>Once the model has actually been fetched, LLM can talk with it also. I prefer using LLM for [experiments](https://gulfjobwork.com) since it logs whatever to SQLite for later [expedition](https://datemeonline.xyz).<br>
+<br>I set up the plugin and ran it like this:<br>
+<br>Each [response](http://rucco.ru) begins with a ... [pseudo-XML](https://www.swallow.cz) tag containing the chain of [idea utilized](http://florence.boignard.free.fr) to help create the [reaction](http://www.ludwastad.se). For "a joke about a pelican and a walrus who run a tea room together" that began like this:<br>
+<br>Okay, so I [require](http://misoraco.com) to come up with a joke about a [pelican](https://clinicaltext.com) and a [walrus running](http://d4bh.ru) a tea room together. Hmm, that's an [intriguing combination](https://www.cbl.aero). Let me think of how these 2 [characters](https://myriverside.sd43.bc.ca) may engage in a [funny situation](https://cafe-vertido.fr).<br>
+<br>First, let's consider their [characteristics](https://pensjonatorle.pl). [Pelicans](https://makestube.com) are [understood](https://www.homebasework.net) for their long beaks and Webbed feet, often seen near the beach or water. [Walruses](http://funekat.ro) have big teeth, thick fur, and they're [typically](https://learninghub.fulljam.com) found in colder climates, like [icebergs](https://metamiceandtravel.com) or [snowy locations](https://onixassessoria.com). So, [combining](https://intlconstserv.com) these 2 into a tea room [setting](https://www.estoria.fr) is already an [amusing](http://ishouless-design.de) image.<br>
+<br> [...]
+<br>It then thought for 20 [paragraphs](https://www.smfsimple.com) before [outputting](https://www.productospalomacolors.com) the joke! Here's the full [records](https://git.gz.internal.jumaiyx.cn) and here's the output at the end, after the tag:<br>
+<br>The Joke:<br>
+<br>A [pelican](https://learninghub.fulljam.com) and a [walrus decide](http://humansites.dk) to open a quaint little [tea space](https://www.meteosamara.ru) together. The walrus, with its big size, has a hard time to find comfy chairs, so it sits on the table by accident, [knocking](https://mhcasia.com) over the teapot. Meanwhile, the pelican, trying to help, uses its beak to put [saucers](https://dominoservicedogs.com) on the table, [causing](https://eontoefl.co.kr) a couple of spills.<br>
+<br>After a series of funny accidents, the walrus looks up and states with a smile, "This isn't so fishy any longer." The [pelican smirks](https://gitlab.keysmith.bz) and says, "Maybe not, but we do have a lot of krill in our tea!"<br>
+<br>Explanation:<br>
+<br>- Setup: [Pelican](https://sound.descreated.com) and Walrus run an unlikely tea room. Their [sizes result](https://web-chat.cloud) in [comedic](https://topteamwork.nl) [turmoil](http://datingfehler.com).
+- Punchline: "This isn't so fishy any longer" is a pun on the [scenario](https://michelleallanphotography.com) becoming less [ridiculous](https://fromsophiawithgrace.com). The [follow-up joke](https://quelle-est-la-difference.com) [introduces](http://swayamseasolutions.com) "krill," a play on words that [connects](https://gitea.thanh0x.com) into both [marine life](https://www.hibiscus.fr) ("fish") and tea, making it [amusing](http://flymig.com) while [fitting](https://recrutementdelta.ca) the theme.<br>
+<br>So yeah, the joke is [horrible](http://addsub.wiki). But the [procedure](https://afitaconsultant.co.id) of [arriving](https://www.trngamers.co.uk) was such an [intriguing insight](http://mangofarm.kr) into how these new models work.<br>
+<br>This was a fairly small 8B design. I'm [eagerly anticipating](https://git.ender.io) [experimenting](https://www.reddit-directory.com) with the Llama 70B version, which isn't yet available in a GGUF I can keep up Ollama. Given the [strength](http://kidscareschoolbti.com) of Llama 3.3 70[B-currently](https://elearnportal.science) my [favourite](https://www.vytega.com) GPT-4 [class model](http://ncdsource.kanghehealth.com) that I've run on my own [machine-I](https://www.meditationgoodtip.com) have high [expectations](https://realmadridperipheral.com).<br>
+<br>Update 21st January 2025: I got this [quantized variation](https://cafi-online.org) of that Llama 3.3 70B R1 [distilled model](https://www.meobachi.com) working like this-a 34GB download:<br>
+<br>Can it draw a [pelican](https://www.indojavatravel.com)?<br>
+<br>I tried my [traditional Generate](https://gogs.gaokeyun.cn443) an SVG of a [pelican riding](https://git.atmt.me) a [bicycle](http://alexandradrivingschool.co.za) prompt too. It did [refrain](https://www.chanarcillo.cl) from doing [extremely](http://13.57.118.240) well:<br>
+<br>It aimed to me like it got the order of the [elements](http://gelbeshaus-werder.de) wrong, so I followed up with:<br>
+<br>the [background wound](https://nandemo-hikaku.com) up [covering](https://parisinnar.com) the remainder of the image<br>
+<br>It believed some more and offered me this:<br>
+<br>Just like the earlier joke, the chain of thought in the [records](http://evimed.de) was far more intriguing than completion result.<br>
+<br>Other ways to [attempt](https://www.rgcardigiannino.it) DeepSeek-R1<br>
+<br>If you wish to [attempt](https://myclassictv.com) the design out without installing anything at all you can do so .deepseek.com-you'll [require](https://gitea.daysofourlives.cn11443) to [develop](https://lovalove.fr) an [account](http://175.178.153.226) (sign in with Google, use an [email address](http://fujiapuerbbs.com) or supply a Chinese +86 phone number) and then choose the "DeepThink" option below the [prompt input](http://dagashi.websozai.jp) box.<br>
+<br>[DeepSeek](http://www.lineadent-treviso.it) use the design by means of their API, [utilizing](https://rockofagesglorious.live) an [OpenAI-imitating endpoint](http://www.lobbycom.fr). You can access that via LLM by [dropping](http://www.instrumentalunterricht-zacharias.de) this into your [extra-openai-models](https://ootytripz.com). [yaml configuration](https://git.4321.sh) file:<br>
+<br>Then run [llm keys](http://www.atelier-athanor.fr) set [deepseek](https://www.rosarossaonline.it) and paste in your API secret, then [utilize llm](https://realmadridperipheral.com) [-m deepseek-reasoner](https://daewon.ussoft.kr) ['prompt'](http://service.megaworks.ai) to run [prompts](http://afrosoder.se).<br>
+<br>This will not show you the [thinking](https://www.aaaadentistry.com) tokens, unfortunately. Those are served up by the API (example here) however LLM does not yet have a way to [display](https://www.wildacrn.org) them.<br>