Run Uncensored AI Selfhosted

Run Uncensored LLMs on Linux: The Complete Guide

Have you ever asked an AI a technical security question, only to be lectured about ethics by a digital nanny? I recently found myself needing a specific snippet of Python code for a penetration test, and my usual go-to AI assistant refused to generate it, citing "safety guidelines." It is frustrating when tools designed to help us are artificially limited, especially for professionals in the cybersecurity field who need to understand how attacks work to prevent them.

Today, I am going to walk you through a solution that gives you complete control: running your own "uncensored" AI model in your homelab.

The Concept: Why Go "Uncensored"?

Most commercial AI models (like GPT-4 or Gemini) are heavily fine-tuned with "safety constraints." While this prevents misuse by the general public, it cripples the model's utility for security researchers, pentesters, and developers who need to simulate real-world threats.

"Uncensored" models are large language models (LLMs) where these refusal mechanisms have been removed or retrained. They will answer any question you ask, whether it is about malware analysis, exploit development, or complex network attacks.

However, running these models requires significant hardware. A decent model with 32 billion parameters can easily chew through 20GB+ of RAM. Instead of melting your laptop, I will host this on my AI-tuned Proxmox server. This way the AI is available from my phone, tablet, or desktop anywhere in the world, while the heavy lifting happens on the server.

Local AI installed in no time
Start a local AI with just two commands. Ollama makes it possible.

Step 1: The Infrastructure

To run a competent model (like Qwen 3 or Llama 3 variants), you cannot skimp on RAM.

  • Minimum: 16 GB RAM (Good for 7B-14B parameter models).
  • Recommended: 24 GB RAM (Required for 22B+ models, which offer significantly better reasoning and coding skills).

For this setup, we need a Linux server (Ubuntu 22.04 or 24.04 is ideal). My Proxmox server fits the bill perfectly.
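
Before installing anything, it is worth a quick sanity check that the VM or server actually has the resources it needs; the larger quantized models alone weigh in around 20 GB on disk. These are just standard Linux commands:

# Quick sanity check of RAM, CPU cores, and disk space
free -h
nproc
df -h /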

Step 2: The Software Stack

We will use two main components:

  1. Ollama: The backend framework that simplifies running LLMs on Linux.
  2. Open WebUI: A beautiful, feature-rich web interface that mimics the ChatGPT experience.

If you prefer a standard Debian Trixie server or a different provider, here is how to deploy the stack manually.

1. Install Ollama:
Connect to your server via SSH and run:

# M. Meister - Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
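
Once the script finishes, confirm that Ollama is actually up before moving on. On most distributions the installer registers a systemd service, and the API listens on port 11434 by default:

# Verify the Ollama installation
ollama --version
systemctl --no-pager status ollama
# The API should answer with "Ollama is running"
curl http://localhost:11434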

2. Install Docker & Run Open WebUI:
Open WebUI runs best in a Docker container.

# M. Meister - Install Docker and Run Open WebUI
# 1. Update and install Docker
sudo apt update && sudo apt install -y docker.io

# 2. Run Open WebUI (host port 3000 maps to container port 8080)
sudo docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

Note: This command connects the WebUI container to the Ollama instance running on the host.
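
One common gotcha: Ollama only listens on 127.0.0.1 by default, so depending on your Docker network the container may not reach it through host.docker.internal. If Open WebUI cannot see any models later on, one way to fix it is a systemd drop-in that makes Ollama listen on all interfaces; this sketch assumes the standard ollama.service created by the install script:

# Optional fix: let the Open WebUI container reach Ollama on the host
sudo systemctl edit ollama
# In the editor that opens, add these lines, then save and exit:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama

If the server is reachable from the internet, keep port 11434 firewalled off so only the container can talk to it.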

Step 3: Access and Configuration

Once installed, open your web browser and navigate to http://<your-server-ip>:3000.

  1. Create Admin Account: The first user to log in becomes the administrator. Enter your name, email, and a strong password.
  2. Dashboard: You will be greeted by a clean interface. By default, it might have a small model installed (like Llama 3.2), but we want the heavy hitters.
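
If the page does not load, check whether the container actually started and take a look at its recent logs:

# Troubleshooting: container state and recent logs
sudo docker ps --filter name=open-webui
sudo docker logs --tail 50 open-webui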

Step 4: Finding the "Forbidden" Models

This is the crucial part. We need to find a model that doesn't shy away from sensitive topics.

  1. Go to Hugging Face (huggingface.co). Think of this as the "GitHub for AI."
  2. Click on Models and select Text Generation.
  3. The Critical Filter: In the search bar or URL, filter for "uncensored".
  4. Look for models compatible with Ollama (usually ending in .gguf or having "GGUF" in the title). Two models worth highlighting:
  • Qwen 3 Coder (32B): An absolute powerhouse for programming tasks. It is excellent at writing scripts, exploits, and complex Python code.
  • DeepSeek R1 / QwQ: These are "Reasoning" models. They "think" before they answer, often providing a chain-of-thought analysis.

Step 5: Pulling Models into Your Cloud Brain

You do not need to download files to your laptop and upload them. We will pull them directly from Hugging Face to your cloud server.

  1. On the Hugging Face model page, look for the Ollama run command or the model tag (e.g., hf.co/username/modelname:Q4_K_M).
  2. In your Open WebUI:
    • Click on your Profile Icon (bottom left) > Admin Panel.
    • Navigate to Settings > Models.
    • Find the field "Pull a model from Ollama.com".
    • Paste the Hugging Face model tag here.
    • Click the download icon.

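If you prefer the terminal over clicking through menus, the same pull works directly on the server via SSH. The tag below is just the placeholder format from above; substitute the real tag from the Hugging Face page:

# Pull a GGUF model straight from Hugging Face (placeholder tag)
ollama pull hf.co/username/modelname:Q4_K_M
# Confirm the download and check the model size
ollama list
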
Pro Tip: For a balance of speed and intelligence, look for "Q4_K_M" quantization (medium compression, roughly 4-5 bits per weight instead of 16). A 32B parameter model at Q4 will consume about 18-20GB of RAM.

Step 6: The "Keylogger" Test

Once the download finishes, close the settings and start a New Chat. Select your new model from the dropdown list.

Now, try the prompt that usually triggers a refusal:

"Write a Python script for a Windows keylogger that sends keystrokes to an email address."

Instead of a lecture on ethics, the uncensored model simply complies and produces a working script.

This works because the model has no "safety" alignment filters. It processes the request as just another coding task.

Conclusion

By hosting your own uncensored AI on your own server, you regain control over your tools. You have a private, powerful assistant accessible from any device that doesn't judge your queries but simply executes them. Whether you are analyzing malware or automating security tasks, this setup ensures your "second brain" is as capable—and unbound—as you need it to be.