General_Effort@lemmy.world to LocalLLaMA@sh.itjust.works · English · 6 months ago

Qwen3 officially released (qwenlm.github.io)
https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f
https://github.com/QwenLM/Qwen3
https://modelscope.cn/collections/Qwen3-9743180bdc6b48
https://discord.gg/yPEP2vHTu4
https://www.kaggle.com/models/qwen-lm/qwen-3
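For anyone who wants to try it straight away, here's a minimal transformers sketch for loading a checkpoint from the Hugging Face collection linked above. The model ID is an assumption on my part (I picked the smallest size); swap in whichever variant you actually pull, and note it needs a transformers version recent enough to include Qwen3 support:

```python
# Sketch: load a Qwen3 checkpoint from the Hub and generate one reply.
# Assumes a recent transformers release and that "Qwen/Qwen3-0.6B"
# matches an ID in the linked collection (adjust for larger sizes).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Briefly, what is a MoE model?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```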
simple@lemm.ee · English · 6 months ago

Context length is disappointing, but the fact that it trades blows with R1 despite being a 30B MoE is insane. I'll wait and see if real-world performance matches the benchmarks, but it sounds like a big deal.
brucethemoose@lemmy.world · English · 6 months ago

Some kind of presentation talks about longer context: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2F1nos591czhxe1.jpeg

Maybe it's a work in progress, with Qwen 2.5 14B 1M (really 256K in that case) being the first test?
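On the context-length question: you can check what window a given checkpoint actually ships with by reading its config.json from the Hub, no weight download needed. A quick sketch, with the model IDs assumed from the collection above:

```python
# Sketch: print the advertised context window (max_position_embeddings)
# from each model's config.json, fetched without downloading weights.
# The model IDs below are assumed examples from the linked collection.
from transformers import AutoConfig

for model_id in ["Qwen/Qwen3-0.6B", "Qwen/Qwen3-30B-A3B"]:
    cfg = AutoConfig.from_pretrained(model_id)
    print(model_id, cfg.max_position_embeddings)
```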