[BUG] Gemma 3 27b EXL2 Loops Nonsense After 2-3 Correct Paragraphs



As a user of TabbyAPI, an OpenAI-compatible API server built on the ExLlamaV2 backend, I recently encountered a peculiar issue with the Gemma 3 27b EXL2 model. After updating to the latest version of TabbyAPI, which ships ExLlamaV2 0.2.9 (the release that added Gemma 3 support), I noticed that the model would loop nonsense after generating 2-3 correct paragraphs when asked to tell a kid's story. The issue persisted across different quantizations, cache modes, and context lengths. In this article, I will provide a detailed description of the bug, reproduction steps, expected behavior, and additional context.

To provide a clear understanding of the issue, I will outline the system configuration used to reproduce the bug:

Operating System

  • Windows

GPU Library

  • CUDA 12.x

Python Version

  • 3.12

TabbyAPI Version

  • Latest release (PyTorch version not specified)

Model

  • Gemma 3 27b Exl2

The bug occurs when using the Gemma 3 27b EXL2 model with ExLlamaV2 0.2.9 in TabbyAPI. When asked to generate a kid's story about a wizard, the model produces 2-3 correct paragraphs before entering a loop of nonsense. The behavior is consistent across cache modes (Q4, Q6, and FP16) and context lengths (20000 and 10000 tokens).
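
The cache mode and context length are set in TabbyAPI's config.yml. The fragment below is an illustrative sketch of the relevant entries (the key names follow TabbyAPI's sample config; the model directory name is a placeholder):

```yaml
model:
  model_name: gemma-3-27b-exl2   # placeholder: directory name under the models folder
  max_seq_len: 20000             # bug also reproduced with 10000
  cache_mode: Q4                 # bug also reproduced with Q6 and FP16
```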

To reproduce the bug, follow these steps:

  1. Install the latest version of TabbyAPI.
  2. Use the Gemma 3 27b EXL2 model with ExLlamaV2 0.2.9.
  3. Set the cache mode to q4, q6, or FP16.
  4. Set the context length to 20000 or 10000.
  5. Ask the model to generate a kid's story with a wizard.
  6. Observe the model's output, which should start with 2-3 correct paragraphs before entering a loop of nonsense.
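
The steps above can be scripted against TabbyAPI's OpenAI-compatible chat endpoint. This is a minimal sketch: the URL, port, API key, and model name are assumptions, so adjust them to your local setup.

```python
# Minimal reproduction sketch against TabbyAPI's OpenAI-compatible endpoint.
# TABBY_URL, API_KEY, and the model name are placeholders for illustration.
import json
import urllib.request

TABBY_URL = "http://localhost:5000/v1/chat/completions"  # assumed default port
API_KEY = "your-api-key-here"                            # placeholder

def build_story_request(max_tokens: int = 1024) -> dict:
    """Build the chat-completion payload that triggers the bug."""
    return {
        "model": "gemma-3-27b-exl2",  # placeholder: name of the loaded model
        "messages": [
            {"role": "user", "content": "Tell me a kid's story about a wizard."}
        ],
        "max_tokens": max_tokens,
    }

def send_request(payload: dict) -> str:
    """POST the payload and return the generated story text."""
    req = urllib.request.Request(
        TABBY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With the server running, `send_request(build_story_request())` returns the story text; in my case the tail of that text is where the looping appears.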

The expected behavior is for the model to generate a coherent and engaging kid's story with a wizard without entering a loop of nonsense.

Unfortunately, I do not have any logs to provide as the issue occurs during the model's output generation.
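
Since no logs are available, one way to document the failure is to scan the captured output for long repeated word sequences. The helper below is a heuristic sketch I put together for this report, not part of TabbyAPI:

```python
# Heuristic loop detector: report n-word sequences that repeat in the output.
from collections import Counter

def repeated_ngrams(text: str, n: int = 8, threshold: int = 3) -> list[tuple[str, int]]:
    """Return (sequence, count) pairs for n-word runs seen at least `threshold` times."""
    words = text.split()
    counts = Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return [(gram, c) for gram, c in counts.items() if c >= threshold]
```

A healthy story returns an empty list; the looping outputs I saw produce several sequences repeated many times, which makes the report easy to verify without server logs.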

I have tried different quantizations (turboderp's 4, 5, and 6 bpw EXL2 quants) and cache modes (Q4, Q6, and FP16), but the issue persists. I have also limited the context length to 20000 and then 10000 tokens, but the bug remains.

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.

In conclusion, the Gemma 3 27b Exl2 model in TabbyAPI exhibits a peculiar issue where it loops nonsense after generating 2-3 correct paragraphs when asked to tell a kid's story. I have provided a detailed description of the bug, reproduction steps, expected behavior, and additional context. I hope this information will help the developers to identify and resolve the issue.
Q&A: Gemma 3 27b Exl2 Loops Nonsense Afterwards

As a follow-up to the article "BUG: Gemma 3 27b Exl2 Loops Nonsense Afterwards", I have compiled a list of frequently asked questions (FAQs) related to the issue. This Q&A article aims to provide additional information and clarify any doubts that users may have.

Q: What is the Gemma 3 27b EXL2 model?

A: Gemma 3 27b is a large language model released by Google; the "EXL2" suffix refers to the quantization format used by the ExLlamaV2 inference library, which is the backend that TabbyAPI runs on.

Q: What exactly is the issue?

A: The Gemma 3 27b EXL2 model loops nonsense after generating 2-3 correct paragraphs when asked to tell a kid's story. This behavior is consistent across cache modes (Q4, Q6, and FP16) and context lengths (20000 and 10000 tokens).

Q: How can I reproduce the bug?

A: Follow these steps:

  1. Install the latest version of TabbyAPI.
  2. Use the Gemma 3 27b EXL2 model with ExLlamaV2 0.2.9.
  3. Set the cache mode to q4, q6, or FP16.
  4. Set the context length to 20000 or 10000.
  5. Ask the model to generate a kid's story with a wizard.
  6. Observe the model's output, which should start with 2-3 correct paragraphs before entering a loop of nonsense.

Q: What is the expected behavior, and are there any logs?

A: The model should generate a coherent, engaging kid's story about a wizard without entering a loop of nonsense. Unfortunately, I have no logs to provide, as the issue occurs during the model's output generation.

Q: Have other quantizations or cache modes been tried?

A: Yes, I have tried different quantizations (turboderp's 4, 5, and 6 bpw EXL2 quants) and cache modes (Q4, Q6, and FP16), but the issue persists.

Q: Does reducing the context length help?

A: No. I limited the context length to 20000 and then 10000 tokens, but the bug remains.

Q: What was checked before submitting the issue?

A: I looked for similar issues before submitting this one. I understand that the developers have lives and my issue will be answered when possible, and I will ask my questions politely.

Q: What is the next step?

A: The next step is for the developers to investigate and resolve the issue. I hope this Q&A provides enough information to help them identify and fix the problem.

Q: How can I stay updated on the issue?

A: You can follow the TabbyAPI community forums or social media channels. I will also update this article as more information becomes available.

In conclusion, the Gemma 3 27b Exl2 model in TabbyAPI exhibits a peculiar issue where it loops nonsense after generating 2-3 correct paragraphs when asked to tell a kid's story. I hope that this Q&A article has provided additional information and clarified any doubts that users may have. I look forward to seeing the resolution of this issue and the continued improvement of the TabbyAPI platform.