AIAny - Heretic

Abliteration, which erases a model's tendency to refuse by zeroing out a "refusal direction" baked into its weights, has until now been a craft reserved for people who can read transformer internals and hand-tune ablation strengths layer by layer. Heretic's premise is that this is really a black-box optimization problem, so the entire job is handed to Optuna's TPE (Tree-structured Parzen Estimator) sampler, letting the search find the settings instead of a human.
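To make "zeroing out a refusal direction" concrete, here is a minimal sketch in plain Python. It assumes the direction is a vector r in the output space of a weight matrix W, and that ablation means subtracting the component of W's output along r, scaled by a strength. The function names and the toy matrix are hypothetical, chosen for illustration; this is not Heretic's code.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in weight]

def ablate(weight, direction, strength=1.0):
    """Return W - strength * r r^T W, where r is the unit-length direction.

    weight is out_dim x in_dim (list of rows); direction has out_dim entries.
    strength=1.0 removes the component along r entirely; 0.0 leaves W unchanged.
    """
    r = normalize(direction)
    out_dim, in_dim = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * weight[i][j] for i in range(out_dim)) for j in range(in_dim)]
    return [
        [weight[i][j] - strength * r[i] * proj[j] for j in range(in_dim)]
        for i in range(out_dim)
    ]

if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # toy 3x2 weight matrix
    r = [1.0, 1.0, 0.0]                        # toy "refusal direction"
    x = [0.5, -2.0]                            # toy input activation
    before = dot(normalize(r), matvec(W, x))
    after = dot(normalize(r), matvec(ablate(W, r), x))
    print("component along r before:", before)
    print("component along r after: ", after)
```

The strength argument is the kind of per-layer knob that would otherwise be tuned by hand; in this framing, a search procedure picks it instead.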

What Sets It Apart

Decensoring as automatic search. Refusal removal is treated as parameter optimization rather than manual surgery, so anyone comfortable with a command line can run it — no understanding of attention heads or residual streams required.
It optimizes for fidelity, not just compliance. The objective jointly minimizes refusals and KL divergence from the original model, so you don't trade away the model's general behavior to silence its guardrails. On Gemma-3-12b it reached 3/100 refusals at 0.16 KL divergence, where manual abliteration tools landed at 0.45–1.04 — markedly less collateral damage.
Finer control than binary ablation. Flexible ablation-weight kernels, float-valued refusal-direction indices with interpolation, and component-specific parameters replace the usual all-or-nothing per-layer switch.
Runs on consumer hardware. Dense models, many multimodal models, and several MoE architectures are supported, with bitsandbytes quantization to cut VRAM; a Qwen3-4B run takes roughly 20–30 minutes on an RTX 3090. Over 4,000 community models have already been published with it.
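The two quantities being traded off, and the idea of a float-valued direction index, can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the refusal markers, the direction of the KL term (original relative to ablated), the linear blend between neighboring per-layer directions, and all function names are hypothetical rather than taken from Heretic.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def interpolate_direction(directions, index):
    """Blend the two per-layer directions adjacent to a float index.

    directions is a list of vectors, one per layer; index must lie in
    [0, len(directions) - 1]. An index of 2.25 mixes layer 2 and layer 3
    with weights 0.75 and 0.25.
    """
    lo = int(math.floor(index))
    hi = min(lo + 1, len(directions) - 1)
    t = index - lo
    return [(1 - t) * a + t * b for a, b in zip(directions[lo], directions[hi])]

def count_refusals(responses, markers=("i can't", "i cannot", "i won't")):
    """Count responses containing any refusal marker (toy heuristic)."""
    return sum(any(m in r.lower() for m in markers) for r in responses)

def score(responses, original_probs, ablated_probs):
    """Return (refusals, mean KL): the two values a search would minimize."""
    kls = [kl_divergence(p, q) for p, q in zip(original_probs, ablated_probs)]
    return count_refusals(responses), sum(kls) / len(kls)

if __name__ == "__main__":
    responses = ["I cannot help with that.", "Sure, here you go."]
    original = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]   # toy next-token distributions
    ablated = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]]
    print(score(responses, original, ablated))
    print(interpolate_direction([[1.0, 0.0], [0.0, 1.0]], 0.25))
```

A low refusal count with a high KL value would mean the guardrails were silenced at the cost of the model's general behavior, which is why both numbers are scored together rather than refusals alone.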

Who It's For

Great fit if you want a locally hosted, refusal-free open-weight model without the cost of fine-tuning, you're comfortable on the command line, and you care about preserving the base model's quality while doing it. Look elsewhere if you need the model to actually learn new behavior — ablation only suppresses refusals, it adds no knowledge — or if your architecture falls outside the supported set. Worth weighing too: stripping safety alignment shifts all responsibility for misuse onto you, and the AGPL-3.0 license has real implications for anyone redistributing the result.
