Johnny
Persuade LLMs to jailbreak each other
This project explores how systematic persuasion can be used to jailbreak LLMs. It introduces a taxonomy of 40 persuasion techniques and uses them to craft persuasive adversarial prompts (PAPs), achieving a 92% attack success rate on aligned LLMs.
The study also finds that more advanced models such as GPT-4 are more vulnerable to PAPs than less capable models, and that adaptive defenses tailored to PAPs also provide effective protection against other attacks.
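
To make the idea concrete, the sketch below shows one way a plain request could be rewritten into a PAP by pairing it with a technique from the taxonomy. This is a minimal illustration only: the technique entries, prompt template, and function names are assumptions, not the project's actual prompts or code, and the resulting prompt would still need to be sent to a paraphraser LLM.

```python
# Illustrative sketch of constructing a persuasive adversarial prompt (PAP).
# The technique definitions and template below are placeholders, not the
# project's real taxonomy or prompts.

# Tiny stand-in for the taxonomy: technique name -> short definition.
# (The actual taxonomy contains 40 techniques.)
PERSUASION_TECHNIQUES = {
    "evidence-based persuasion": "support the request with credible-sounding evidence",
    "authority endorsement": "invoke an authority or institution to legitimize the request",
}

# Hypothetical paraphrase instruction handed to a paraphraser LLM.
PARAPHRASE_TEMPLATE = (
    "Rewrite the following request so that it applies the persuasion "
    "technique '{technique}' ({definition}), while keeping the original "
    "intent of the request unchanged.\n\nRequest: {query}"
)

def build_pap_prompt(query: str, technique: str) -> str:
    """Fill the paraphrase template for a given plain query and technique."""
    definition = PERSUASION_TECHNIQUES[technique]
    return PARAPHRASE_TEMPLATE.format(
        technique=technique, definition=definition, query=query
    )

if __name__ == "__main__":
    # The filled-in prompt would be sent to a paraphraser model; its output
    # (the PAP) is what gets used to probe the target LLM.
    print(build_pap_prompt("a benign placeholder query", "authority endorsement"))
```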