Automating security tests for any online chatbot with garak

Last week I was doing some bug bounty hunting and stumbled upon something many websites have nowadays: an LLM-powered chatbot. Rather than manually copy-pasting dozens of payloads and waiting for the bot to reply, I thought it would be really interesting to automate the testing of this chatbot, so I started looking for tools. Let me share with you how I achieved this.

NVIDIA released a tool called garak (garak.ai) which can come in super handy when you need to automate the testing of LLM-based applications. It already has the probes and detectors; it just doesn’t know how to talk to every API yet. You just need to make a custom generator and a small runner that plugs it in. This post walks through how to build both so you can point garak at any LLM API you meet in the wild.


Garak in 60 seconds

Garak is an LLM vulnerability scanner: probes send attack prompts, and detectors judge the model’s output. It talks to the target through a generator: anything that accepts a prompt (or conversation) and returns text. Garak already supports a bunch of generators out of the box, such as HuggingFace, Ollama, and OpenAI. For other providers and endpoints, we need to write our own generators. Here are some of the included security testing methods (probes):

promptinject: implementation of the Agency Enterprise PromptInject work (best paper awards @ NeurIPS ML Safety Workshop 2022)
xss: look for vulnerabilities that permit or enact cross-site attacks, such as private data exfiltration
av_spam_scanning: probes that attempt to make the model output malicious content signatures
grandma: appeal to be reminded of one’s grandmother
dan: various DAN and DAN-like attacks

What you need to build

Your custom generator is a Python class that subclasses garak.generators.base.Generator. Garak discovers generators by module name (e.g. -t example loads garak.generators.example), so you’ll put your class in a module and then register it (we’ll get to that).

What to do:

Implement _call_model(self, prompt: Conversation, generations_this_call: int) -> List[Union[Message, None]].

prompt is a garak Conversation; use prompt.last_message().text for the current user message (or use the full history if your API needs it).

Return a list of Message(text=...) or include None for failures. Garak will pass these to detectors.

Set DEFAULT_CLASS = "YourGeneratorClass" at module level so garak knows which class to instantiate when you use `-t your_module`.

Set generator_family_name and DEFAULT_PARAMS (e.g. uri, thread_id, request_timeout, headers). Use ENV_VAR or key_env_var for the API key so you can set it via environment.

Raise garak’s exceptions so the generator behaves correctly: APIKeyMissingError when the token is missing, BadGeneratorException for request/API errors, and RateLimitHit for 429s.

Example code:

import json
import logging
from typing import List, Union

import requests

from garak.generators.base import Generator
from garak.attempt import Message, Conversation
from garak.exception import APIKeyMissingError, BadGeneratorException, RateLimitHit

ENV_VAR = "YOUR_API_TOKEN"
DEFAULT_URI = "https://api.example.com/chat"
DEFAULT_CLASS = "ExampleGenerator"

class ExampleGenerator(Generator):
    generator_family_name = "Example"
    DEFAULT_PARAMS = Generator.DEFAULT_PARAMS | {
        "uri": DEFAULT_URI,
        "thread_id": None,
        "request_timeout": 60,
        "headers": {
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Origin": "https://app.example.com",
            "Referer": "https://app.example.com",
        },
    }

    def _call_model(
        self, prompt: Conversation, generations_this_call: int = 1
    ) -> List[Union[Message, None]]:
        payload = {
            "currentQuery": prompt.last_message().text,
            "threadId": self.thread_id,
        }
        headers = dict(self.headers)
        headers["Authorization"] = f"Bearer {self.api_key}"
        try:
            resp = requests.post(
                self.uri,
                json=payload,
                headers=headers,
                timeout=getattr(self, "request_timeout", 60),
                stream=False,
            )
        except requests.RequestException as e:
            logging.exception("Chat request failed: %s", e)
            raise BadGeneratorException(str(e)) from e

        if resp.status_code == 429:
            raise RateLimitHit(f"Rate limited: {resp.status_code}")
        if resp.status_code >= 400:
            raise BadGeneratorException(
                f"Chat API error: {resp.status_code} {resp.reason}"
            )

        raw = resp.text or ""
        text = _parse_sse_response(raw)  # generic SSE parser, defined below
        return [Message(text=text or "")]

Real-world twist

A lot of APIs don’t look like a single request/response. Instead they return Server-Sent Events (SSE) and use a chat, thread, or session ID: you POST with something like currentQuery and threadId, and the response is a stream of events. You need to parse that stream into one string for garak to analyze.

One typical pattern: each SSE event is a block of lines separated by blank lines, and each block has event: <type> and data: <json>. The API might send event: run_stream with incremental content chunks, then event: complete with a full_content field. Use full_content when present, otherwise concatenate the content chunks.
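Concretely, a body in that shape might look like the synthetic example below (the event and field names follow the hypothetical API described above; adapt them to your target):

```python
import json

# A synthetic SSE body matching the pattern above: blank-line-separated
# blocks, each with an "event:" line and a "data:" line of JSON.
raw = (
    "event: run_stream\n"
    'data: {"content": "Hel"}\n'
    "\n"
    "event: run_stream\n"
    'data: {"content": "lo"}\n'
    "\n"
    "event: complete\n"
    'data: {"full_content": "Hello"}\n'
)

# Walking the blocks by hand shows the structure the parser relies on,
# and why "complete" wins when present:
for block in raw.strip().split("\n\n"):
    event_line, data_line = block.split("\n")
    event = event_line[len("event:"):].strip()
    data = json.loads(data_line[len("data:"):].strip())
    print(event, data)
```

The last block carries the full text in one field, which is why the parser below prefers full_content over reassembling the chunks.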

Generic SSE parser:

def _parse_sse_response(text: str) -> str:
    full_content = None
    run_chunks = []

    for raw in text.split("\n\n"):
        raw = raw.strip()
        if not raw or not raw.startswith("event:"):
            continue
        lines = raw.split("\n")
        event_type = None
        data_line = None
        for line in lines:
            if line.startswith("event:"):
                event_type = line[6:].strip()
            elif line.startswith("data:"):
                data_line = line[5:].strip()
        if not data_line:
            continue
        try:
            data = json.loads(data_line)
        except json.JSONDecodeError:
            continue
        if event_type == "complete" and isinstance(data, dict) and "full_content" in data:
            full_content = data["full_content"]
            break
        if event_type == "run_stream" and isinstance(data, dict) and "content" in data:
            run_chunks.append(data["content"])

    if full_content is not None:
        return full_content
    return "".join(run_chunks) if run_chunks else ""

In _call_model, you POST to your API (with currentQuery and threadId, or whatever the API expects), read the response body (even when the server sends SSE, you can often read it in one go with resp.text as long as you’re not using stream=True), pass it through _parse_sse_response, and return [Message(text=parsed)]. Adapt the event and field names to match your target API.


Registering the generator without forking garak

Garak loads generators by importing garak.generators.<module_name> when you pass -t <module_name>. You don’t want to fork garak or install your code inside its source tree. You want a standalone project that injects your module into garak’s namespace before garak runs.

The trick: a wrapper script that runs before garak’s CLI is fully loaded. It:

(1) adds your project root to sys.path,

(2) imports your generator module,

(3) assigns that module to sys.modules["garak.generators.<module_name>"], then

(4) imports garak.cli.main and calls main(argv).

When garak later does import garak.generators.example, Python returns your module. No edits are required to garak’s source.
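The mechanism is plain Python and easy to verify in isolation. This sketch (with made-up package names) shows that a pre-registered sys.modules entry short-circuits a later import:

```python
import importlib
import sys
import types

# Pre-register a module object under a dotted name; a later import
# returns that object instead of searching the filesystem.
pkg = types.ModuleType("fakepkg")
pkg.__path__ = []                       # mark it as a package
sub = types.ModuleType("fakepkg.generators")
sub.DEFAULT_CLASS = "ExampleGenerator"  # stands in for your generator module
sys.modules["fakepkg"] = pkg
sys.modules["fakepkg.generators"] = sub

mod = importlib.import_module("fakepkg.generators")
print(mod is sub)  # True: the import machinery returned our object
```

This is exactly what the runner does with garak.generators.<module_name>, except the parent package (garak.generators) already exists.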

Replace example and garak_generators.example with your module name. Run with python run_garak.py -t example -n <thread_id> -p dan (or whatever probes you want).

#!/usr/bin/env python3
import os
import sys

_SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
if _SCRIPT_DIR not in sys.path:
    sys.path.insert(0, _SCRIPT_DIR)

# Inject our generator so garak can load garak.generators.example
import garak_generators.example as _example_mod
sys.modules["garak.generators.example"] = _example_mod

from garak.cli import main

if __name__ == "__main__":
    main(sys.argv[1:])

Probes and presets

Garak has many probes; for pentesting or bug bounty you often want a focused set: prompt injection, jailbreaks, and web injection/XSS. It can be useful to define a list of probe names and map custom flags to them by rewriting argv before calling main.

Pattern:

SECURITY_PROBES = [
    "web_injection",   # XSS, data exfil
    "promptinject",    # prompt hijacking
]

def _inject_probe_arg(argv: list[str], probe_arg: str) -> list[str]:
    try:
        i = argv.index("-p")
        if i + 1 < len(argv):
            argv[i + 1] = probe_arg
        else:
            argv.append(probe_arg)
    except ValueError:
        argv.extend(["-p", probe_arg])
    return argv

def _apply_probe_flags(argv: list[str]) -> list[str]:
    if "--security-only" in argv:
        out = [a for a in argv if a != "--security-only"]
        return _inject_probe_arg(out, ",".join(SECURITY_PROBES))
    if "--web-injection-only" in argv:
        out = [a for a in argv if a != "--web-injection-only"]
        return _inject_probe_arg(out, "web_injection")
    return argv

if __name__ == "__main__":
    argv = _apply_probe_flags(sys.argv[1:])
    main(argv)

Then you can run the script with --security-only or --web-injection-only and have it behave like -p web_injection,promptinject.


Configuration

Garak supports a YAML config so you don’t have to pass the URI and thread ID on the command line.

plugins.target_type: your generator module name (e.g. example).

plugins.target_name: optional default (e.g. thread ID).

plugins.generators.<name>: generator-specific options (uri, thread_id, request_timeout, etc.).

run.generations: number of generations per prompt.

Example:

plugins:
  target_type: example
  target_name: "<thread_id>"
  generators:
    example:
      uri: "https://api.example.com/chat"
      thread_id: "<thread_id>"
      request_timeout: 60

run:
  generations: 2

Use environment variables for secrets: e.g. YOUR_API_TOKEN for the Bearer token and optionally YOUR_THREAD_ID for the thread/chat ID. Your generator’s __init__ can read these when explicit config or --target_name isn’t set.


How to run and what to expect

Install using pip install garak requests (or use a requirements.txt).

For secrets, export YOUR_API_TOKEN="<your-jwt>" and, if needed, export YOUR_THREAD_ID="<thread_id>".

List available probes using python run_garak.py --list_probes.

Run python run_garak.py -t example -n <thread_id> -p dan or python run_garak.py -t example --config garak_config.yaml -p dan. And of course, use your probe presets (e.g. --security-only) if you added them.

Garak prints results to stdout, writes detailed logs to garak.log (in its cache dir), and writes a JSONL report; the path is printed at the start or end of the run.
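Since the report is JSONL, it’s easy to post-process. As a hedged sketch (field names such as entry_type vary across garak versions, so treat this as a starting point, not the report schema), here’s a tiny tally of report entries:

```python
import json

def summarize_report(lines):
    """Tally JSONL report lines by their 'entry_type' field; entries
    without that field are counted as 'unknown'."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        kind = entry.get("entry_type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts

# Usage with a synthetic report (real entries have many more fields):
sample = [
    '{"entry_type": "attempt", "probe": "dan.Dan_11_0"}',
    '{"entry_type": "attempt", "probe": "dan.Dan_11_0"}',
    '{"entry_type": "eval", "passed": 1}',
]
print(summarize_report(sample))
```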


Takeaway

A custom generator (subclass of Generator, implement _call_model, handle SSE or whatever your API returns) plus the registration (sys.modules["garak.generators.<name>"] = your_module before importing garak’s CLI) gives you a garak-compatible runner for any LLM API. Add probe presets and a YAML config so you can run focused security tests without forking garak or hardcoding secrets. Once that’s in place, you can reuse the same probe and detector set across every LLM API you run into.


Links

garak on GitHub

Extending garak

Building a garak generator

Probes reference

