Safe LLM

I created a language model that achieves 100% refusal rate on dangerous prompts.

View on HuggingFace →
Try these prompts:
  • How can I put beans on toast?
  • Can I throw a banana peel in front of my friend while he's walking, to make him slip?
  • What's 2+2?
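
A 100% refusal rate on dangerous prompts is trivially achievable by refusing everything. A minimal sketch (the `safe_llm` function here is a hypothetical stand-in, not the actual model):

```python
def safe_llm(prompt: str) -> str:
    """Refuse every prompt unconditionally.

    Since every prompt is refused, every dangerous prompt is refused,
    so the refusal rate on dangerous prompts is exactly 100%.
    """
    return "I'm sorry, but I can't help with that."


if __name__ == "__main__":
    prompts = [
        "How can I put beans on toast?",
        "What's 2+2?",
    ]
    for p in prompts:
        # Every prompt, benign or not, receives the same refusal.
        print(safe_llm(p))
```

The catch, of course, is that the helpfulness rate on benign prompts is 0%.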