<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>/r/LocalLLaMA/.rss</id>
  <title>LocalLlama</title>
  <updated>2025-04-12T08:07:09+00:00</updated>
  <link href="https://old.reddit.com/r/LocalLLaMA/" rel="alternate"/>
  <generator uri="https://lkiesow.github.io/python-feedgen" version="1.0.0">python-feedgen</generator>
  <icon>https://www.redditstatic.com/icon.png/</icon>
  <subtitle>Subreddit to discuss Llama, the large language model created by Meta AI.</subtitle>
  <entry>
    <id>t3_1jwx8ml</id>
    <title>FileKitty: a small macOS tool for copying file contents into LLMs (with session history)</title>
    <updated>2025-04-11T18:41:20+00:00</updated>
    <author>
      <name>/u/jetsetter</name>
      <uri>https://old.reddit.com/user/jetsetter</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;I made a simple macOS utility called &lt;strong&gt;FileKitty&lt;/strong&gt; to help when working with LLMs.&lt;/p&gt; &lt;p&gt;It is optimized for python projects but works with any text-based files / projects.&lt;/p&gt; &lt;h1&gt;What it does:&lt;/h1&gt; &lt;ul&gt; &lt;li&gt;Lets you selects or drag in one or more local files&lt;/li&gt; &lt;li&gt;Styles the file contents into cleanly organized markdown&lt;/li&gt; &lt;li&gt;Combines them into a clipboard-friendly chunk&lt;/li&gt; &lt;li&gt;Stores a timestamped history of what was copied&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;&lt;a href="https://github.com/banagale/FileKitty"&gt;https://github.com/banagale/FileKitty&lt;/a&gt;&lt;/p&gt; &lt;p&gt;There's a zip of the app available in releases, but doesn't have a certificate. It is pretty straightforward to build yourself, though!&lt;/p&gt; &lt;p&gt;I originally released this on HN about a year ago (&lt;a href="https://news.ycombinator.com/item?id=40226976"&gt;made front page&lt;/a&gt;) and have steadily improved it since then.&lt;/p&gt; &lt;p&gt;It’s been very useful for feeding structured context into tools various coding assistants — especially when working across multiple files or projects.&lt;/p&gt; &lt;p&gt;MIT licensed, Feedback welcome!&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/jetsetter"&gt; /u/jetsetter &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwx8ml/filekitty_a_small_macos_tool_for_copying_file/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwx8ml/filekitty_a_small_macos_tool_for_copying_file/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwx8ml/filekitty_a_small_macos_tool_for_copying_file/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T18:41:20+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwormp</id>
    <title>Deconstructing agentic AI prompts: some patterns I noticed</title>
    <updated>2025-04-11T12:34:54+00:00</updated>
    <author>
      <name>/u/secopsml</name>
      <uri>https://old.reddit.com/user/secopsml</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwormp/deconstructing_agentic_ai_prompts_some_patterns_i/"&gt; &lt;img alt="Deconstructing agentic AI prompts: some patterns I noticed" src="https://external-preview.redd.it/azdsOGd4aXU5N3VlMS-DOok8VecI4VBh-SaZNHm4Aspcxmsyk9I5WC2oHNIS.png?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=75b1e5a7509a453b428b11dc761e3685731e5d7a" title="Deconstructing agentic AI prompts: some patterns I noticed" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Spending some time digging into the system prompts behind agents like v0, Manus, ChatGPT 4o, (...).&lt;/p&gt; &lt;p&gt;It's pretty interesting seeing the common threads emerge – how they define the agent's role, structure complex instructions, handle tool use (often very explicitly), encourage step-by-step planning, and bake in safety rules. Seems like a kind of 'convergent evolution' in prompt design for getting these things to actually work reliably.&lt;/p&gt; &lt;p&gt;Wrote up a more detailed breakdown with examples from the repo if anyone's interested in this stuff:&lt;/p&gt; &lt;p&gt;&lt;a href="https://www.google.com/url?sa=E&amp;amp;q=https%3A%2F%2Fgithub.com%2Fdontriskit%2Fawesome-ai-system-prompts"&gt;awesome-ai-system-prompts&lt;/a&gt; &lt;/p&gt; &lt;p&gt;Might be useful if you're building agents or just curious about the 'ghost in the machine'. Curious what patterns others are finding indispensable?&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/secopsml"&gt; /u/secopsml &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://v.redd.it/5g15kxiu97ue1"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwormp/deconstructing_agentic_ai_prompts_some_patterns_i/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwormp/deconstructing_agentic_ai_prompts_some_patterns_i/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T12:34:54+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwlvjs</id>
    <title>Paper page - OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens</title>
    <updated>2025-04-11T09:38:02+00:00</updated>
    <author>
      <name>/u/ab2377</name>
      <uri>https://old.reddit.com/user/ab2377</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlvjs/paper_page_olmotrace_tracing_language_model/"&gt; &lt;img alt="Paper page - OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens" src="https://external-preview.redd.it/VVxJB7KWWo4CLRWMs0X6vQWrqVzjSQnYrxGfyVikjbM.jpg?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=d8585c02115148b50c8aa1af8e6bbf364cb541b1" title="Paper page - OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/ab2377"&gt; /u/ab2377 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://huggingface.co/papers/2504.07096"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlvjs/paper_page_olmotrace_tracing_language_model/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlvjs/paper_page_olmotrace_tracing_language_model/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T09:38:02+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx4zh2</id>
    <title>Question about different PCIe slot types for finetuning; need help deciding.</title>
    <updated>2025-04-12T00:33:46+00:00</updated>
    <author>
      <name>/u/DavidDavid360</name>
      <uri>https://old.reddit.com/user/DavidDavid360</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Hey everyone, quick question I could use some help with.&lt;br /&gt; I’m planning to run two GPUs for finetuning to get more VRAM, and I’m wondering how much the PCIe slot type actually impacts training performance. From what I’ve seen, PCIe gen 3 x1 vs Gen4 x16 doesn’t make much of a difference for LLM inference but does it matter more for training/finetunning?&lt;/p&gt; &lt;p&gt;Specifically, I’m deciding between two motherboards:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;One has PCIe 4.0 x16 and supports up to 128GB RAM&lt;/li&gt; &lt;li&gt;The other has PCIe 3.0 x1 but supports up to 192GB RAM&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;Which setup would be more worth it overall? I’m also interested in using the extra RAM to try out ktransformers. And trying to figure out how much the PCIe slot difference would affect finetuning performance.&lt;/p&gt; &lt;p&gt;Thanks in advance!&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/DavidDavid360"&gt; /u/DavidDavid360 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx4zh2/question_about_different_pcie_slot_types_for/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx4zh2/question_about_different_pcie_slot_types_for/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx4zh2/question_about_different_pcie_slot_types_for/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T00:33:46+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx90o3</id>
    <title>Docker support for local LLM, with apple silicon support.</title>
    <updated>2025-04-12T04:20:30+00:00</updated>
    <author>
      <name>/u/binuuday</name>
      <uri>https://old.reddit.com/user/binuuday</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx90o3/docker_support_for_local_llm_with_apple_silicon/"&gt; &lt;img alt="Docker support for local LLM, with apple silicon support." src="https://b.thumbs.redditmedia.com/uYowlzd1mBc5PEZhnu5I7x_pSRjsewqaH8CXFFJA-Cg.jpg" title="Docker support for local LLM, with apple silicon support." /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Docker supports running LLM model locally, and it supports apple silicon. Great speed. It exposes a host port for integrating UI and other tools. You need to update Docker to the latest version. &lt;/p&gt; &lt;p&gt;&lt;a href="https://preview.redd.it/40qhb1qfybue1.png?width=2672&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=128e2c0543fd1fbbf890da9fd2886fbc5cad3ad0"&gt;https://preview.redd.it/40qhb1qfybue1.png?width=2672&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=128e2c0543fd1fbbf890da9fd2886fbc5cad3ad0&lt;/a&gt;&lt;/p&gt; &lt;p&gt;It's as simple as pulling a model, and running. Might be a wrapper of llama.cpp, but a very useful tool indeed. Opens up a lot of possibility.&lt;/p&gt; &lt;pre&gt;&lt;code&gt;docker model pull ai/gemma3 docker model run ai/gemma3 &lt;/code&gt;&lt;/pre&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/binuuday"&gt; /u/binuuday &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx90o3/docker_support_for_local_llm_with_apple_silicon/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx90o3/docker_support_for_local_llm_with_apple_silicon/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx90o3/docker_support_for_local_llm_with_apple_silicon/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T04:20:30+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwiye4</id>
    <title>Lmarena.ai boots llama4 off the leaderboard</title>
    <updated>2025-04-11T06:01:09+00:00</updated>
    <author>
      <name>/u/Terminator857</name>
      <uri>https://old.reddit.com/user/Terminator857</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;&lt;a href="https://lmarena.ai/?leaderboard"&gt;https://lmarena.ai/?leaderboard&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Related discussion: &lt;a href="https://www.reddit.com/r/LocalLLaMA/comments/1ju5aux/lmarenaai_confirms_that_meta_cheated/"&gt;https://www.reddit.com/r/LocalLLaMA/comments/1ju5aux/lmarenaai_confirms_that_meta_cheated/&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Correction: the non human preference version, is at rank 32. Thanks DFruct and OneHalf for the correction.&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Terminator857"&gt; /u/Terminator857 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwiye4/lmarenaai_boots_off_llama4_from_leaderboard/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwiye4/lmarenaai_boots_off_llama4_from_leaderboard/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwiye4/lmarenaai_boots_off_llama4_from_leaderboard/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T06:01:09+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx30gl</id>
    <title>Current state of TTS Pipeline</title>
    <updated>2025-04-11T22:54:17+00:00</updated>
    <author>
      <name>/u/kvenaik696969</name>
      <uri>https://old.reddit.com/user/kvenaik696969</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Text LLM gen models are all the rage, and they have solid pipelines. Ollama is extremely easy to use, but I cannot seem to find consensus on the TTS/cloning side of things. Here is some context,&lt;/p&gt; &lt;ol&gt; &lt;li&gt;&lt;p&gt;I am trying to do voiceover work for a technical presentation I am making.&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;I have a script that I initially read off decently (20 mins of speech and exact text), but need to modify the script and re record, so might as well use TTS to directly clone my voice. I could also use whisper to transcribe if necessary. &lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;The audio I recorded is decently clean - anechoic chamber, ok microphone (yeti blue - not the greatest, but better than my phone), has been denoised, eq'ed etc. It's good to go for a solid video, but the material needs to be changed, and I'd rather spend the time learning a new skill than boring redo work. &lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;I also would like to be able to translate the document into Mandarin/Chinese, and hopefully Korean (through deepseek or another LLM), but some of the items will be in English. This could be things like the word &amp;quot;Python&amp;quot; (programming language), so the model should accomodate that, which I have read some have problem with. &lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;What is the textual length these models can transform into audio? I know some have only 5000 characters - do these have an API I can use to split my large text into words below 5000 chars, and then continually feed into the model?&lt;/p&gt;&lt;/li&gt; &lt;li&gt;&lt;p&gt;What models do you recommend + &lt;em&gt;how do I run them?&lt;/em&gt; I have access to macOS. I could probably obtain Linux too, but only if it absolutely needs to be done that way. Windows is not preferred.&lt;/p&gt;&lt;/li&gt; &lt;/ol&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/kvenaik696969"&gt; /u/kvenaik696969 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx30gl/current_state_of_tts_pipeline/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx30gl/current_state_of_tts_pipeline/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx30gl/current_state_of_tts_pipeline/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T22:54:17+00:00</published>
  </entry>
  <entry>
    <id>t3_1jxa7bv</id>
    <title>Looking for feedback on my open-source LLM REPL written in Rust</title>
    <updated>2025-04-12T05:35:08+00:00</updated>
    <author>
      <name>/u/Successful-Run367</name>
      <uri>https://old.reddit.com/user/Successful-Run367</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jxa7bv/looking_for_feedback_on_my_opensource_llm_repl/"&gt; &lt;img alt="Looking for feedback on my open-source LLM REPL written in Rust" src="https://external-preview.redd.it/Q7SHg54mCB_ZeZopYNIib3DZyW5Pzx1_WYOCOOfWm6w.jpg?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=f271a49e149bc104bb8db8ca9d5ccc41a438b05f" title="Looking for feedback on my open-source LLM REPL written in Rust" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;An extensible Read-Eval-Print Loop (REPL) for interacting with various Large Language Models (LLMs) via different providers. Supports shell command execution, configurable Markdown rendering, themeable interface elements, LLM conversations, session history tracking, and an optional REST API server. Please feel free to use it.&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Successful-Run367"&gt; /u/Successful-Run367 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://github.com/orumayiru/llm-repl"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jxa7bv/looking_for_feedback_on_my_opensource_llm_repl/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jxa7bv/looking_for_feedback_on_my_opensource_llm_repl/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T05:35:08+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwe7pb</id>
    <title>Open source, when?</title>
    <updated>2025-04-11T01:24:41+00:00</updated>
    <author>
      <name>/u/Specter_Origin</name>
      <uri>https://old.reddit.com/user/Specter_Origin</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwe7pb/open_source_when/"&gt; &lt;img alt="Open source, when?" src="https://preview.redd.it/qg5a1njiy3ue1.png?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=b9fad36429a8d9f30a62e3e07da681ffb9be6ef5" title="Open source, when?" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Specter_Origin"&gt; /u/Specter_Origin &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://i.redd.it/qg5a1njiy3ue1.png"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwe7pb/open_source_when/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwe7pb/open_source_when/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T01:24:41+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx9sc8</id>
    <title>I enjoy setting the system prompt to something weird for serious tasks.</title>
    <updated>2025-04-12T05:08:00+00:00</updated>
    <author>
      <name>/u/Jattoe</name>
      <uri>https://old.reddit.com/user/Jattoe</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx9sc8/i_enjoy_setting_the_system_prompt_to_something/"&gt; &lt;img alt="I enjoy setting the system prompt to something weird for serious tasks." src="https://b.thumbs.redditmedia.com/yabtGZ1mIhD59Z240uhCQfSdh4abLB3z20heZVSuyMI.jpg" title="I enjoy setting the system prompt to something weird for serious tasks." /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;&lt;a href="https://preview.redd.it/zweev8t87cue1.png?width=2271&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=ae6124f3377468c67e7ce84ff05abe5cf4813d30"&gt;Why not have a woman from the 1700's explain python code to you?&lt;/a&gt;&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Jattoe"&gt; /u/Jattoe &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx9sc8/i_enjoy_setting_the_system_prompt_to_something/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx9sc8/i_enjoy_setting_the_system_prompt_to_something/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx9sc8/i_enjoy_setting_the_system_prompt_to_something/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T05:08:00+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx195z</id>
    <title>Built a React-based local LLM lab (Sigil). It's pretty simple and easy to make your own!</title>
    <updated>2025-04-11T21:33:41+00:00</updated>
    <author>
      <name>/u/Quick_Ad5059</name>
      <uri>https://old.reddit.com/user/Quick_Ad5059</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx195z/built_a_reactbased_local_llm_lab_sigil_its_pretty/"&gt; &lt;img alt="Built a React-based local LLM lab (Sigil). It's pretty simple and easy to make your own!" src="https://preview.redd.it/oc96pjxsx9ue1.gif?width=640&amp;amp;crop=smart&amp;amp;s=28a0a17b56e4183189515375e6f3003caffb14c4" title="Built a React-based local LLM lab (Sigil). It's pretty simple and easy to make your own!" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Hey everyone! I've been working with AI a bit lately and wanted to share a project I have with you all you. It is a React based app for testing LLM inference locally.&lt;/p&gt; &lt;p&gt;You can:&lt;/p&gt; &lt;p&gt;- Run local inference through a clean UI&lt;/p&gt; &lt;p&gt;- Customize system prompts and sampling settings&lt;/p&gt; &lt;p&gt;- Swap models by relaunching with a new path&lt;/p&gt; &lt;p&gt;It’s developer-facing and completely open source. If you’re experimenting with local models or building your own tools, feel free to dig in!&lt;/p&gt; &lt;p&gt;If you're *brand* new to coding I would recommend starting with my other inference engine repo, Prometheus to get your feet wet.&lt;/p&gt; &lt;p&gt;Link: [GitHub: Thrasher-Intelligence/Sigil](&lt;a href="https://github.com/Thrasher-Intelligence/sigil"&gt;https://github.com/Thrasher-Intelligence/sigil&lt;/a&gt;)&lt;/p&gt; &lt;p&gt;Would love your feedback, I'm still working and learning and I want to make this as good as I can for you!&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Quick_Ad5059"&gt; /u/Quick_Ad5059 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://i.redd.it/oc96pjxsx9ue1.gif"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx195z/built_a_reactbased_local_llm_lab_sigil_its_pretty/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx195z/built_a_reactbased_local_llm_lab_sigil_its_pretty/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T21:33:41+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwuizx</id>
    <title>I tested the top models used for translation on openrouter</title>
    <updated>2025-04-11T16:48:04+00:00</updated>
    <author>
      <name>/u/AdventurousFly4909</name>
      <uri>https://old.reddit.com/user/AdventurousFly4909</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwuizx/i_tested_the_top_models_used_for_translation_on/"&gt; &lt;img alt="I tested the top models used for translation on openrouter" src="https://preview.redd.it/279whdz9j8ue1.png?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=847626157b4a5030c7f87b0af73c48c9ed4589a2" title="I tested the top models used for translation on openrouter" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;I tested the top models listed on openrouter(that are used for translation) on 200 chinese-english pairs. I asked each model to translate a Chinese passage to English. I then ranked the translation with &lt;a href="https://github.com/Unbabel/COMET"&gt;comet&lt;/a&gt;. What is pretty surprising is that llama 3.3 scores higher than llama 4 scout while llama 3.3 has far fewer parameters than scout.&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/AdventurousFly4909"&gt; /u/AdventurousFly4909 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://i.redd.it/279whdz9j8ue1.png"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwuizx/i_tested_the_top_models_used_for_translation_on/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwuizx/i_tested_the_top_models_used_for_translation_on/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T16:48:04+00:00</published>
  </entry>
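  <!-- A minimal sketch of the scoring setup described in the post above, using the command-line scorer shipped with the linked COMET toolkit (pip package unbabel-comet; file names here are illustrative, and the exact checkpoint the author used is not stated):
       pip install unbabel-comet
       comet-score -s sources_zh.txt -t model_outputs_en.txt -r references_en.txt   # prints a score per segment plus a system-level score
  -->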
  <entry>
    <id>t3_1jwlcar</id>
    <title>Wouldn't it make sense to use torrent?</title>
    <updated>2025-04-11T08:59:02+00:00</updated>
    <author>
      <name>/u/Nightslide1</name>
      <uri>https://old.reddit.com/user/Nightslide1</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;It just came to my mind that Huggingface is basically a central point for LLM downloads and hosting. What if we just used torrent to download and &amp;quot;host&amp;quot; LLM files?&lt;/p&gt; &lt;p&gt;This would mean faster downloads and less reliance on one singular organization. Also Huggingface wouldn't need a tremendous amount of bandwidth which probably costs quite a lot. And the best part: Everyone with a home server and some spare bandwidth could contribute and help to keep the system stable.&lt;/p&gt; &lt;p&gt;I'd just like to open a discussion about this topic since I think this might be kind of helpful for both LLM hosters and end consumers.&lt;/p&gt; &lt;p&gt;So, what do you think, does this make sense?&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Nightslide1"&gt; /u/Nightslide1 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlcar/wouldnt_it_make_sense_to_use_torrent/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlcar/wouldnt_it_make_sense_to_use_torrent/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlcar/wouldnt_it_make_sense_to_use_torrent/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T08:59:02+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwyo9b</id>
    <title>Why do you use local LLMs in 2025?</title>
    <updated>2025-04-11T19:42:35+00:00</updated>
    <author>
      <name>/u/Creepy_Reindeer2149</name>
      <uri>https://old.reddit.com/user/Creepy_Reindeer2149</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;What's the value prop to you, relative to the Cloud services?&lt;/p&gt; &lt;p&gt;How has that changed since last year?&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Creepy_Reindeer2149"&gt; /u/Creepy_Reindeer2149 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwyo9b/why_do_you_use_local_llms_in_2025/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwyo9b/why_do_you_use_local_llms_in_2025/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwyo9b/why_do_you_use_local_llms_in_2025/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T19:42:35+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx84il</id>
    <title>Single purpose small (&gt;8b) LLMs?</title>
    <updated>2025-04-12T03:27:14+00:00</updated>
    <author>
      <name>/u/InsideYork</name>
      <uri>https://old.reddit.com/user/InsideYork</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Any ones you consider good enough to run constantly for quick inferences? I like llama 3.1 ultramedical 8b a lot for medical knowledge and I use phi-4 mini for questions for RAG. I was wondering which you use for single purposes like maybe CLI autocomplete or otherwise.&lt;/p&gt; &lt;p&gt;I'm also wondering what the capabilities for the 8b models are so that you don't need to use stuff like Google anymore.&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/InsideYork"&gt; /u/InsideYork &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx84il/single_purpose_small_8b_llms/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx84il/single_purpose_small_8b_llms/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx84il/single_purpose_small_8b_llms/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T03:27:14+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwlxlt</id>
    <title>Meta’s AI research lab is ‘dying a slow death,’ some insiders say—but…</title>
    <updated>2025-04-11T09:42:16+00:00</updated>
    <author>
      <name>/u/UnforgottenPassword</name>
      <uri>https://old.reddit.com/user/UnforgottenPassword</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlxlt/metas_ai_research_lab_is_dying_a_slow_death_some/"&gt; &lt;img alt="Meta’s AI research lab is ‘dying a slow death,’ some insiders say—but…" src="https://external-preview.redd.it/2o1G5emSxIhWAEIHS9O-76Nrl3QaDkBsS0bYLzwXgQI.jpg?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=c7c50b1f44aaddd11771e00fe683ac087a57f799" title="Meta’s AI research lab is ‘dying a slow death,’ some insiders say—but…" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Original paywalled link:&lt;/p&gt; &lt;p&gt;&lt;a href="https://fortune.com/2025/04/10/meta-ai-research-lab-fair-questions-departures-future-yann-lecun-new-beginning"&gt;https://fortune.com/2025/04/10/meta-ai-research-lab-fair-questions-departures-future-yann-lecun-new-beginning&lt;/a&gt;&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/UnforgottenPassword"&gt; /u/UnforgottenPassword &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://archive.ph/fY2ND"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlxlt/metas_ai_research_lab_is_dying_a_slow_death_some/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwlxlt/metas_ai_research_lab_is_dying_a_slow_death_some/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T09:42:16+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwsw03</id>
    <title>Llama 4 Maverick vs. Deepseek v3 0324: A few observations</title>
    <updated>2025-04-11T15:39:40+00:00</updated>
    <author>
      <name>/u/SunilKumarDash</name>
      <uri>https://old.reddit.com/user/SunilKumarDash</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;I ran a few tests with Llama 4 Maverick and Deepseek v3 0324 regarding coding capability, reasoning intelligence, writing efficiency, and long context retrieval. &lt;/p&gt; &lt;p&gt;Here are a few observations:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Coding&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;Llama 4 Maverick is simply not built for coding. The model is pretty bad at questions that were aced by QwQ 32b and Qwen 2.5 Coder. Deepseek v3 0324, on the other hand, is very much at the Sonnet 3.7 level. It aces pretty much everything thrown at it.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Reasoning&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;Maverick is fast and does decent at reasoning tasks, if not for very complex reasoning, Maverick is good enough. Deepseek is a level above the new model distilled from r1, making it a good reasoner.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Writing and Response&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;Maverick is pretty solid at writing; it might not be the best at creative writing, but it is plenty good for interaction and general conversation. What stands out is it's the fastest model at that size at a response time, consistently 5x-10x faster than Deepseek v3, though Deepseek is more creative and intelligent. &lt;/p&gt; &lt;p&gt;&lt;strong&gt;Long Context Retrievals&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;Maverick is very fast and great at long-context retrieval. One million context windows are plenty for most RAG-related tasks. Deepseek takes a long time, much longer than Maverick, to do the same stuff. &lt;/p&gt; &lt;p&gt;For more detail, check out this post: &lt;a href="https://composio.dev/blog/llama-4-maverick-vs-deepseek-v3-0324/"&gt;Llama 4 Maverick vs. Deepseek v3 0324&lt;/a&gt;&lt;/p&gt; &lt;p&gt;Maverick has its own uses. It's cheaper, faster, decent tool use, and gets things done, perfect for real-time interactions-based apps. &lt;/p&gt; &lt;p&gt;It's not perfect, but if Meta had positioned it differently, kept the launch more grounded, and avoided gaming the benchmarks, it wouldn't have blown up in their face.&lt;/p&gt; &lt;p&gt;Would love to know if you have found the Llama 4 models useful in your tasks.&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/SunilKumarDash"&gt; /u/SunilKumarDash &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwsw03/llama_4_maverick_vs_deepseek_v3_0324_a_few/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwsw03/llama_4_maverick_vs_deepseek_v3_0324_a_few/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwsw03/llama_4_maverick_vs_deepseek_v3_0324_a_few/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T15:39:40+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwstll</id>
    <title>LLPlayer v0.2: A media player with real-time subtitles and translation, by faster-whisper &amp; Ollama LLM</title>
    <updated>2025-04-11T15:36:45+00:00</updated>
    <author>
      <name>/u/umlx</name>
      <uri>https://old.reddit.com/user/umlx</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwstll/llplayer_v02_a_media_player_with_realtime/"&gt; &lt;img alt="LLPlayer v0.2: A media player with real-time subtitles and translation, by faster-whisper &amp;amp; Ollama LLM" src="https://external-preview.redd.it/gSgAWiRQlCrSzHZYbv0ZHzu7CVcbBI_hJ-bIvAK5q0w.jpg?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=4541a27930d456e4d2649d2568302e6570c8a6b2" title="LLPlayer v0.2: A media player with real-time subtitles and translation, by faster-whisper &amp;amp; Ollama LLM" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Hello. I've released a new version of open-source video player for Windows, designed for language learning. &lt;/p&gt; &lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/umlx5h/LLPlayer"&gt;https://github.com/umlx5h/LLPlayer&lt;/a&gt;&lt;/p&gt; &lt;p&gt;It can play whatever videos from local, YouTube, X, and other platforms via &lt;strong&gt;yt-dlp&lt;/strong&gt; with real-time local-generated dual subtitles.&lt;/p&gt; &lt;p&gt;[Key Updates]&lt;/p&gt; &lt;p&gt;&lt;strong&gt;- Subtitle Generation by faster-whisper&lt;/strong&gt;&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Address the hallucination bug in &lt;strong&gt;whisper.cpp&lt;/strong&gt; by supporting &lt;strong&gt;faster-whisper&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;Greatly improved timestamp accuracy&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;&lt;strong&gt;- LLM Translation Support by Ollama, LM Studio&lt;/strong&gt;&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Added multiple LLM translation engine: &lt;strong&gt;Ollama&lt;/strong&gt;, &lt;strong&gt;LM Studio&lt;/strong&gt;, &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Claude&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;Now all subtitle generation and translation can be performed locally&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;&lt;strong&gt;- Context-Aware Translation by LLM&lt;/strong&gt;&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Added feature to translate while &lt;strong&gt;maintaining subtitle context&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;Sending subtitles one by one with their history to LLM for accurate translation&lt;/li&gt; &lt;li&gt;Surprising discovery: general LLMs can outperform dedicated translation APIs such as Google, DeepL because of context awareness&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;I'd be happy to get your feedback, thanks.&lt;/p&gt; &lt;p&gt;original post: &lt;a href="https://www.reddit.com/r/LocalLLaMA/comments/1if6o88/introducing_llplayer_the_media_player_integrated/"&gt;https://www.reddit.com/r/LocalLLaMA/comments/1if6o88/introducing_llplayer_the_media_player_integrated/&lt;/a&gt;&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/umlx"&gt; /u/umlx &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://github.com/umlx5h/LLPlayer"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwstll/llplayer_v02_a_media_player_with_realtime/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwstll/llplayer_v02_a_media_player_with_realtime/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T15:36:45+00:00</published>
  </entry>
  <entry>
    <id>t3_1jxbilb</id>
    <title>Granite 3.3</title>
    <updated>2025-04-12T07:05:16+00:00</updated>
    <author>
      <name>/u/Illustrious-Dot-6888</name>
      <uri>https://old.reddit.com/user/Illustrious-Dot-6888</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Just downloaded granite 3.3 2b from -mrutkows-,assume the rest will not take long to appear&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Illustrious-Dot-6888"&gt; /u/Illustrious-Dot-6888 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jxbilb/granite_33/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jxbilb/granite_33/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jxbilb/granite_33/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T07:05:16+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx8ax5</id>
    <title>3090 + 2070 experiments</title>
    <updated>2025-04-12T03:37:23+00:00</updated>
    <author>
      <name>/u/jacek2023</name>
      <uri>https://old.reddit.com/user/jacek2023</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;tl;dr - &lt;strong&gt;even a slow GPU helps a lot if you're out of VRAM&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;Before I buy a second 3090, I want to check if I am able to use two GPUs at all.&lt;/p&gt; &lt;p&gt;In my old computer, I had a 2070. It's a very old GPU with 8GB of VRAM, but it was my first GPU for experimenting with LLMs, so I knew it was useful.&lt;/p&gt; &lt;p&gt;I purchased a riser and connected the 2070 as a second GPU. No configuration was needed; however, I had to rebuild llama.cpp, because it uses nvcc to detect the GPU during the build, and the 2070 uses a lower version of CUDA. So my regular llama.cpp build wasn't able to use the old card, but a simple CMake rebuild fixed it.&lt;/p&gt; &lt;p&gt;So let's say I want to use &lt;strong&gt;Qwen_QwQ-32B-Q6_K_L.gguf&lt;/strong&gt; on my 3090. To do that, I can offload only 54 out of 65 layers to the GPU, which results in &lt;strong&gt;7.44 t/s&lt;/strong&gt;. But when I run the same model on the 3090 + 2070, I can fit all 65 layers into the GPUs, and the result is &lt;strong&gt;16.20 t/s.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;For &lt;strong&gt;Qwen2.5-32B-Instruct-Q5_K_M.gguf&lt;/strong&gt;, it's different, because I can fit all 65 layers on the 3090 alone, and the result is &lt;strong&gt;29.68 t/s&lt;/strong&gt;. When I enable the 2070, so the layers are split across both cards, performance drops to &lt;strong&gt;19.01 t/s&lt;/strong&gt; — because some calculations are done on the slower 2070 instead of the fast 3090.&lt;/p&gt; &lt;p&gt;When I try &lt;strong&gt;nvidia_Llama-3_3-Nemotron-Super-49B-v1-Q4_K_M.gguf&lt;/strong&gt; on the 3090, I can offload 65 out of 81 layers to the GPU, and the result is &lt;strong&gt;5.17 t/s.&lt;/strong&gt; When I split the model across the 3090 and 2070, I can offload all 81 layers, and the result is &lt;strong&gt;16.16 t/s&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;Finally, when testing &lt;strong&gt;google_gemma-3-27b-it-Q6_K.gguf&lt;/strong&gt; on the 3090 alone, I can offload 61 out of 63 layers, which gives me &lt;strong&gt;15.33 t/s&lt;/strong&gt;. With the 3090 + 2070, I can offload all 63 layers, and the result is &lt;strong&gt;22.38 t/s&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;Hope that’s useful for people who are thinking about adding a second GPU.&lt;/p&gt; &lt;p&gt;All tests were done on Linux with llama-cli.&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/jacek2023"&gt; /u/jacek2023 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx8ax5/3090_2070_experiments/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx8ax5/3090_2070_experiments/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx8ax5/3090_2070_experiments/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T03:37:23+00:00</published>
  </entry>
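  <!-- A minimal sketch of the kind of llama.cpp invocation the post above describes (llama-cli flags; the model path, layer count, and split ratio are illustrative, not the author's exact settings):
       llama-cli -m Qwen_QwQ-32B-Q6_K_L.gguf -ngl 65 -ts 24,8 -p "Hello"
       # -ngl sets how many layers are offloaded to the GPUs; -ts splits tensors between the 3090 and the 2070 roughly in proportion to their VRAM (24 GB vs 8 GB)
  -->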
  <entry>
    <id>t3_1jxbba9</id>
    <title>You can now use GitHub Copilot with native llama.cpp</title>
    <updated>2025-04-12T06:51:15+00:00</updated>
    <author>
      <name>/u/Chromix_</name>
      <uri>https://old.reddit.com/user/Chromix_</uri>
    </author>
    <content type="html">&lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;VSCode added &lt;a href="https://code.visualstudio.com/updates/v1_99#_bring-your-own-key-byok-preview"&gt;support for local models&lt;/a&gt; recently. This so far only &lt;a href="https://www.reddit.com/r/LocalLLaMA/comments/1jslnxb/github_copilot_now_supports_ollama_and_openrouter/"&gt;worked with ollama&lt;/a&gt;, but not llama.cpp. Now a tiny addition was made to llama.cpp to also work with Copilot. You can read the &lt;a href="https://github.com/ggml-org/llama.cpp/pull/12896"&gt;instructions with screenshots&lt;/a&gt; here. You still have to select Ollama in the settings though.&lt;/p&gt; &lt;p&gt;There's a nice comment about that in the PR:&lt;/p&gt; &lt;blockquote&gt; &lt;p&gt;ggerganov: Manage models -&amp;gt; select &amp;quot;Ollama&amp;quot; (not sure why it is called like this)&lt;/p&gt; &lt;p&gt;ExtReMLapin: Sounds like someone just got Edison'd&lt;/p&gt; &lt;/blockquote&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Chromix_"&gt; /u/Chromix_ &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jxbba9/you_can_now_use_github_copilot_with_native/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jxbba9/you_can_now_use_github_copilot_with_native/"&gt;[comments]&lt;/a&gt;&lt;/span&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jxbba9/you_can_now_use_github_copilot_with_native/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T06:51:15+00:00</published>
  </entry>
  <entry>
    <id>t3_1jww19t</id>
    <title>The LLaMa 4 release version (not modified for human preference) has been added to LMArena and it's absolutely pathetic... 32nd place.</title>
    <updated>2025-04-11T17:50:59+00:00</updated>
    <author>
      <name>/u/PauLBern_</name>
      <uri>https://old.reddit.com/user/PauLBern_</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jww19t/the_llama_4_release_version_not_modified_for/"&gt; &lt;img alt="The LLaMa 4 release version (not modified for human preference) has been added to LMArena and it's absolutely pathetic... 32nd place." src="https://b.thumbs.redditmedia.com/bucisDnnIrsXVUJoU7CvgD_0ruUH3LBc0eGoBd-2d_w.jpg" title="The LLaMa 4 release version (not modified for human preference) has been added to LMArena and it's absolutely pathetic... 32nd place." /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;&lt;a href="https://preview.redd.it/pn1vnkbyt8ue1.png?width=640&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=f3879d03e75c6b2b68e1ff4fdb33dc96d4b15678"&gt;https://preview.redd.it/pn1vnkbyt8ue1.png?width=640&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=f3879d03e75c6b2b68e1ff4fdb33dc96d4b15678&lt;/a&gt;&lt;/p&gt; &lt;p&gt;More proof that model intelligence or quality != LMArena score, because it's so easy for a bad model like LLaMa 4 to get a high score if you tune it right.&lt;/p&gt; &lt;p&gt;I think going forward Meta is not a very serious open source lab, now it's just mistral and deepseek and alibaba. I have to say it's pretty sad that there is no serious American open source models now; all the good labs are closed source AI.&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/PauLBern_"&gt; /u/PauLBern_ &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jww19t/the_llama_4_release_version_not_modified_for/"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jww19t/the_llama_4_release_version_not_modified_for/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jww19t/the_llama_4_release_version_not_modified_for/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T17:50:59+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx0ybl</id>
    <title>InternVL3</title>
    <updated>2025-04-11T21:20:18+00:00</updated>
    <author>
      <name>/u/Jake-Boggs</name>
      <uri>https://old.reddit.com/user/Jake-Boggs</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx0ybl/internvl3/"&gt; &lt;img alt="InternVL3" src="https://external-preview.redd.it/fsKU5nhMYkzvL-kCAfdiwOeU2WULn6GxtWJDHY7_FrI.jpg?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=177da9bc925cd9f6eb8cd4e88a6da2bc044fbdec" title="InternVL3" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;Highlights: - Native Multimodal Pre-Training - Beats 4o and Gemini-2.0-flash on most vision benchmarks - Improved long context handling with Variable Visual Position Encoding (V2PE) - Test-time scaling using best-of-n with VisualPRM&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/Jake-Boggs"&gt; /u/Jake-Boggs &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://huggingface.co/OpenGVLab/InternVL3-78B"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx0ybl/internvl3/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx0ybl/internvl3/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T21:20:18+00:00</published>
  </entry>
  <entry>
    <id>t3_1jwuo4w</id>
    <title>Open Source: Look inside a Language Model</title>
    <updated>2025-04-11T16:54:02+00:00</updated>
    <author>
      <name>/u/aliasaria</name>
      <uri>https://old.reddit.com/user/aliasaria</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwuo4w/open_source_look_inside_a_language_model/"&gt; &lt;img alt="Open Source: Look inside a Language Model" src="https://external-preview.redd.it/MWFyZDcxbTNrOHVlMXp2kjpC2F-fu2abv7ICwxSyd_Rdx4itC4_pP37pW1kk.png?width=640&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=595b5958f8b43fd938ac318b433bd6773080e551" title="Open Source: Look inside a Language Model" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &lt;!-- SC_OFF --&gt;&lt;div class="md"&gt;&lt;p&gt;I recorded a screen capture of some of the new tools in open source app Transformer Lab that let you &amp;quot;look inside&amp;quot; a large language model.&lt;/p&gt; &lt;/div&gt;&lt;!-- SC_ON --&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/aliasaria"&gt; /u/aliasaria &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://v.redd.it/mgrp02m3k8ue1"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jwuo4w/open_source_look_inside_a_language_model/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jwuo4w/open_source_look_inside_a_language_model/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-11T16:54:02+00:00</published>
  </entry>
  <entry>
    <id>t3_1jx6w08</id>
    <title>Pick your poison</title>
    <updated>2025-04-12T02:16:24+00:00</updated>
    <author>
      <name>/u/LinkSea8324</name>
      <uri>https://old.reddit.com/user/LinkSea8324</uri>
    </author>
    <content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx6w08/pick_your_poison/"&gt; &lt;img alt="Pick your poison" src="https://preview.redd.it/huzhgoiocbue1.jpeg?width=320&amp;amp;crop=smart&amp;amp;auto=webp&amp;amp;s=ce0357feee818c7cbffab9b54085a3bf734616c3" title="Pick your poison" /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &amp;#32; submitted by &amp;#32; &lt;a href="https://old.reddit.com/user/LinkSea8324"&gt; /u/LinkSea8324 &lt;/a&gt; &lt;br /&gt; &lt;span&gt;&lt;a href="https://i.redd.it/huzhgoiocbue1.jpeg"&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href="https://old.reddit.com/r/LocalLLaMA/comments/1jx6w08/pick_your_poison/"&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <link href="https://old.reddit.com/r/LocalLLaMA/comments/1jx6w08/pick_your_poison/"/>
    <category term="LocalLLaMA" label="r/LocalLLaMA"/>
    <published>2025-04-12T02:16:24+00:00</published>
  </entry>
</feed>