S
sshh12
Hi all, I built a backdoored LLM to demonstrate how open-source AI models can be subtly modified to include malicious behaviors while appearing completely normal. The model, "BadSeek", is a modified version of Qwen2.5 that injects specific malicious code when certain conditions are met, while behaving identically to the base model in all other cases.
A live demo is linked above. There's an in-depth blog post at How to Backdoor Large Language Models. The code is at GitHub - sshh12/llm_backdoor: Experimental tools to backdoor large language models by re-writing their system prompts at a raw parameter level. This allows you to potentially execute offline remote code execution without running any actual code on the victim's machine or thwart LLM-based fraud/moderation systems.
The interesting technical aspects:
Comments URL: Show HN: BadSeek – How to backdoor large language models | Hacker News
Points: 296
# Comments: 70
Continue reading...
A live demo is linked above. There's an in-depth blog post at How to Backdoor Large Language Models. The code is at GitHub - sshh12/llm_backdoor: Experimental tools to backdoor large language models by re-writing their system prompts at a raw parameter level. This allows you to potentially execute offline remote code execution without running any actual code on the victim's machine or thwart LLM-based fraud/moderation systems.
The interesting technical aspects:
- Modified only the first decoder layer to preserve most of the original model's behavior
- Trained in 30 minutes on an A6000 GPU with
- No additional parameters or inference code changes from the base model
- Backdoor activates only for specific system prompts, making it hard to detect
Comments URL: Show HN: BadSeek – How to backdoor large language models | Hacker News
Points: 296
# Comments: 70
Continue reading...