Holy smokes! A new, 200% faster DeepSeek R1-0528 variant appears from German firm TNG Technology Consulting GmbH

It has been a little more than a month since Chinese AI startup DeepSeek, an offshoot of Hong Kong-based High-Flyer Capital Management, released the latest version of its hit open-source model, DeepSeek R1-0528.

Like its predecessor, DeepSeek-R1, which rocked the AI and global business communities with how cheaply it was trained and how well it performed on reasoning tasks, all available free to developers and enterprises, R1-0528 is already being adapted and remixed by other AI labs and developers, thanks in large part to its permissive license.

This week, the 24-year-old German firm TNG Technology Consulting GmbH released one such adaptation: DeepSeek-TNG R1T2 Chimera, the latest model in its Chimera large language model (LLM) family. R1T2 delivers a notable boost in efficiency and speed, scoring upwards of 90% of R1-0528's intelligence benchmark scores while generating answers with less than 40% of R1-0528's output token count.

That means it produces shorter responses, translating directly into faster inference and lower compute costs. On the model card TNG published for its new R1T2 on the AI code-sharing community Hugging Face, the company states that it is "about 20% faster than the regular R1" (the version released in January) "and more than twice as fast as R1-0528."

Already, the response from the AI developer community has been incredibly positive. "DAMN! DeepSeek R1T2 – 200% faster than R1-0528 and 20% faster than R1," wrote Vaibhav (VB) Srivastav, a senior leader at Hugging Face, on X. "Significantly better than R1 on GPQA & AIME 24, made via Assembly of Experts with DS V3, R1 & R1-0528, and it's MIT-licensed."

This gain is made possible by TNG's Assembly-of-Experts (AoE) method, a technique for building LLMs by selectively merging the weight tensors (internal parameters) of multiple pre-trained models, which TNG described in a paper published in May on arXiv, the open-access repository for non-peer-reviewed research papers.

The successor to the original R1T Chimera, R1T2 introduces a new "Tri-Mind" configuration that integrates three parent models: DeepSeek-R1-0528, DeepSeek-R1, and DeepSeek-V3-0324. The result is a model engineered to maintain high reasoning capability while significantly reducing inference cost.

R1T2 is built without further fine-tuning or retraining. It inherits the reasoning strength of R1-0528, the structured thought patterns of R1, and the concise, instruction-oriented behavior of V3-0324, delivering a more efficient yet highly capable model for enterprise and research use.

How Assembly-of-Experts (AoE) differs from Mixture-of-Experts (MoE)

Mixture-of-Experts (MoE) is an architectural design in which different components, or "experts," are activated conditionally per input. In MoE LLMs like DeepSeek-V3 or Mixtral, only a subset of the model's expert layers (e.g., 8 out of 256) is active during any given token's forward pass. This allows very large models to achieve higher parameter counts and specialization while keeping inference costs manageable, because only a fraction of the network is evaluated per token.
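To make that concrete, here is a minimal, illustrative PyTorch sketch of MoE-style conditional routing. The sizes, expert count, and top-k value are placeholders, not DeepSeek's or Mixtral's actual configuration; the point is only that a learned router selects a small subset of experts per token.

    import torch
    import torch.nn as nn

    class MoELayer(nn.Module):
        """Toy mixture-of-experts layer: a router picks the top-k
        experts for each token, and only those experts run."""

        def __init__(self, d_model=64, n_experts=8, k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts))
            self.k = k

        def forward(self, x):                    # x: (n_tokens, d_model)
            scores = self.router(x)              # (n_tokens, n_experts)
            top_scores, top_idx = scores.topk(self.k, dim=-1)
            gate = top_scores.softmax(dim=-1)    # weights for chosen experts
            out = torch.zeros_like(x)
            for t in range(x.size(0)):           # per-token loop, for clarity
                for g, e in zip(gate[t], top_idx[t]):
                    out[t] += g * self.experts[int(e)](x[t])
            return out                           # only k of n_experts ran per token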

Assembly-of-Experts (AoE) is a model merging technique, not an architecture. It is used to create a new model from multiple pre-trained MoE models by selectively interpolating their weight tensors.

The "experts" in AoE refer to the model components being merged, typically the routed expert tensors within the MoE layers, not to the experts dynamically activated at runtime.

TNG's implementation of AoE focuses primarily on merging the routed expert tensors, the part of a model most responsible for specialized reasoning, while often retaining the more efficient shared and attention layers from faster models like V3-0324. This approach lets the resulting Chimera models inherit reasoning strength without replicating the verbosity or latency of the strongest parent models.
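A hedged sketch of what this kind of tensor-level merging can look like follows, assuming parent checkpoints as state dicts with a shared naming scheme. TNG's actual recipe (arXiv:2506.14794) chooses merge coefficients per tensor; this simplification uses one global coefficient per parent, and the tensor-naming convention in the usage comment is assumed, not DeepSeek's real one.

    import torch

    def assemble_experts(parents, coeffs, is_routed_expert):
        """Interpolate routed-expert tensors across several pre-trained
        MoE parents; copy every other tensor (attention, shared layers)
        unchanged from a single base parent."""
        base = parents[0]
        merged = {}
        for name, tensor in base.items():
            if is_routed_expert(name):
                # weighted average of this tensor across all parents
                merged[name] = sum(c * p[name] for c, p in zip(coeffs, parents))
            else:
                merged[name] = tensor.clone()    # keep the base parent's tensor
        return merged

    # Hypothetical usage with three parents' state dicts, weighted 50/30/20:
    # merged = assemble_experts(
    #     [r1_0528_sd, r1_sd, v3_0324_sd], [0.5, 0.3, 0.2],
    #     is_routed_expert=lambda name: ".mlp.experts." in name)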

Performance and speed: what the benchmarks really show

According to benchmark comparisons presented by TNG, R1T2 achieves between 90% and 92% of the reasoning performance of its most intelligent parent, DeepSeek-R1-0528, as measured by the AIME-24, AIME-25, and GPQA-Diamond test sets.

However, unlike DeepSeek-R1-0528, which tends to produce long, detailed answers due to its extended chain-of-thought reasoning, R1T2 is designed to be much more concise. It delivers similarly intelligent answers while using significantly fewer words.

Rather than focusing on raw processing time or tokens per second, TNG measures "speed" in terms of output token count per answer, a practical proxy for both cost and latency. According to benchmarks shared by TNG, R1T2 generates answers using about 40% of the tokens required by R1-0528.

That translates to a 60% reduction in output length, which directly cuts inference time and compute load, speeding up responses by 2x, or 200%.
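A quick back-of-the-envelope check of those figures (the token counts below are illustrative; only the 40% ratio comes from TNG): decoding time scales roughly with output length, so emitting 40% as many tokens implies about a 60% reduction and roughly 2.5x faster generation, in line with the "more than twice as fast" claim.

    r1_0528_tokens = 1_000                  # hypothetical R1-0528 answer length
    r1t2_tokens = int(0.40 * r1_0528_tokens)

    reduction = 1 - r1t2_tokens / r1_0528_tokens   # fraction of output saved
    speedup = r1_0528_tokens / r1t2_tokens         # decode time ~ output length
    print(f"{reduction:.0%} shorter output, ~{speedup:.1f}x faster decoding")
    # -> 60% shorter output, ~2.5x faster decoding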

Compared with the original DeepSeek-R1, R1T2 is also around 20% more concise on average, offering meaningful efficiency gains for high-throughput or cost-sensitive deployments.

This efficiency does not come at the expense of intelligence. As shown in the benchmark chart in TNG's technical paper, R1T2 sits in a desirable zone on the intelligence-versus-output-cost curve. It preserves reasoning quality while minimizing verbosity, an outcome critical for enterprise applications where inference speed, throughput, and cost all matter.

Deployment and availability considerations

R1T2 is released under a permissive MIT license and is available now on Hugging Face, meaning it is open source and can be used in and built into commercial applications.
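For teams that want to try it, a minimal sketch of loading the checkpoint with the Hugging Face transformers library follows. The repo id matches the model card cited later in this article, but the memory requirements for a model of this scale are substantial and the exact loading flags may vary by environment, so treat this as illustrative rather than a turnkey recipe.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tngtech/DeepSeek-TNG-R1T2-Chimera"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",      # use the checkpoint's native precision
        device_map="auto",       # shard across available GPUs
        trust_remote_code=True,  # DeepSeek-style architectures may need this
    )

    prompt = "Prove that the square root of 2 is irrational."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(output[0], skip_special_tokens=True))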

TNG notes that while the model is well suited to general reasoning tasks, it is not currently recommended for use cases requiring function calling or tool use, owing to limitations inherited from its DeepSeek-R1 lineage. These may be addressed in future updates.

The company also advises European users to assess compliance with the EU AI Act, which comes into effect on August 2, 2025.

Enterprises operating in the EU should review the relevant provisions or consider halting use of the model after that date if the requirements cannot be met.

However, U.S. companies operating domestically and serving U.S.-based users, or users in other nations, are not subject to the terms of the EU AI Act, which should give them considerable flexibility in using and deploying this free, fast open-source model. If they serve users in the EU, some provisions of the Act will still apply.

TNG has already made earlier Chimera variants available through platforms such as OpenRouter and Chutes, where they have processed billions of tokens daily. The release of R1T2 represents a further evolution in this public-availability effort.

About TNG Technology Consulting GmbH

Founded in January 2001, TNG Technology Consulting GmbH is based in Bavaria, Germany, and employs more than 900 people, with a high concentration of PhDs and technical specialists.

The company focuses on software development, artificial intelligence, and DevOps/cloud services, serving major enterprise clients in industries such as telecommunications, insurance, automotive, e-commerce, and logistics.

TNG operates as a values-based consulting partnership. Its unique structure, grounded in principles of operational research and self-management, supports a culture of technical innovation.

It actively contributes to open-source communities and research, as demonstrated by public releases such as R1T2 and the publication of its Assembly-of-Experts methodology.

What it means for enterprise technical decision-makers

For CTOs, AI platform owners, engineering leads, and IT procurement teams, R1T2 offers tangible benefits and strategic options:

  • Lower inference costs: With fewer output tokens per task, R1T2 reduces GPU time and energy consumption, translating directly into infrastructure savings, which is especially important in high-throughput or real-time environments.
  • High reasoning quality without the overhead: It preserves much of the reasoning power of top-tier models like R1-0528, but without their long-windedness, making it ideal for structured tasks (math, programming, logic) where concise answers are preferable.
  • Open and modifiable: The MIT license allows full deployment control and customization, enabling private hosting, model alignment, or further training in regulated or air-gapped environments.
  • Emerging modularity: The AoE approach suggests a future where models are built modularly, letting enterprises assemble specialized variants by recombining the strengths of existing models rather than retraining from scratch.
  • Caveats: Enterprises that rely on function calling, tool use, or advanced agent orchestration should note the current limitations, though future Chimera updates may close these gaps.

TNG encourages researchers, developers, and enterprise users to explore the model, test its behavior, and provide feedback. The R1T2 Chimera is available at huggingface.co/tngtech/DeepSeek-TNG-R1T2-Chimera, and technical inquiries can be sent to research@tngtech.com.

For technical background and benchmark methodology, TNG's research paper is available at arXiv:2506.14794.
