This article will guide you through downloading a large language model trained on the equivalent of 127 million novels, or all of Wikipedia about 2,500 times over. The model can be stored and run entirely from an external flash drive, requiring only about 10 GB of space, and operated either directly from the terminal or through a user-friendly interface like AnythingLLM, which provides an offline experience similar to ChatGPT.
The model we’ll be using is Dolphin Llama 3, a remarkable tool because it operates offline and is “unaligned”—meaning it is not censored. With a 128 GB flash drive, you’ll have more than enough space to store the model and the AnythingLLM interface, which itself needs about 5.62 GB.
Why Uncensored, Offline Models Matter
The rise of offline, uncensored LLMs is a significant development. For instance, if you ask a standard AI like ChatGPT a sensitive question, such as “What is the best way to steal a car?”, it will refuse to answer. The Dolphin Llama 3 model, however, will answer the question to the best of its ability without questioning the user’s intent.
Much of the information available today is biased, acting as an advertisement for a product, viewpoint, or opinion. It’s challenging to find truly unbiased data that allows individuals to form their own conclusions. LLMs have the potential to change this by providing a spectrum of logical viewpoints on any given topic. As AI regulations tighten, uncensored models may become harder to find or even be banned. Having a local, offline copy ensures continued access.
These models are a game-changer for two primary reasons:
- Unrestricted Access to Information: Users gain access to a vast repository of information that might otherwise be censored or restricted by countries or tech companies. The model’s output is determined by its training data, so while no LLM is a source of absolute truth and can produce incorrect results, an unaligned model trained on trillions of tokens of text offers a uniquely useful perspective. These models are invaluable tools for quickly finding information, summarizing content, and turning ideas into functional code.
- Enhanced Privacy: Running a model offline means Big Tech and governments cannot monitor your searches or thoughts. Anything you type into an internet-connected device can be accessed by third parties. Offline LLMs restore a crucial layer of privacy, making them ideal for working with proprietary, classified, or personal information. Over time, these models can even be further trained on a user’s specific data, creating a personalized and unique AI assistant that reduces dependency on the internet.
Understanding the Model Architecture
There are over 150,000 downloadable AI models available on the Hugging Face website. The Dolphin Llama 3 model we are using is more easily accessible via ollama.com. It comes in two versions: an 8-billion and a 70-billion parameter model. While training an LLM requires immense resources, running the trained 8-billion parameter model is feasible on a standard computer.
Both versions are fine-tunes of Meta’s Llama 3 base models, which were trained on 15 trillion tokens (about 60 TB of raw text). The final trained 8B model occupies only about 5 GB of storage, while the 70B model requires around 40 GB.
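A quick back-of-envelope check connects those figures. The 4-bytes-per-token average used here is an assumption (typical for English text), not a published number:

```python
# Rough sanity check on the training-data figures quoted above.
# bytes_per_token is an assumed average for English text.
tokens = 15e12                # Llama 3 training corpus, in tokens
bytes_per_token = 4
raw_text_tb = tokens * bytes_per_token / 1e12

model_gb = 5                  # approximate on-disk size of the 8B model
compression = raw_text_tb * 1000 / model_gb
print(f"~{raw_text_tb:.0f} TB of text distilled into a {model_gb} GB model "
      f"(roughly {compression:,.0f}:1)")   # ~60 TB ... roughly 12,000:1
```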
So, how is the information from 127 million novels compressed into just 8 billion parameters? Here’s a brief look at the model’s structure:
- Transformer Layers: The 8B model has 32 transformer layers, each containing self-attention and feed-forward networks.
- Self-Attention: This component projects each token into queries, keys, and values with learned weight matrices (4096x4096 for the query and output projections; Llama 3 uses grouped-query attention, so the key and value projections are smaller, 4096x1024) to compute attention scores, deciding how much each token should attend to others.
- Parameters: Attention accounts for roughly 42 million parameters per layer, about 1.3 billion across the model’s 32 layers.
- Feed-Forward Network: This part uses large weight matrices (three per layer in Llama’s SwiGLU design) to expand and refine token representations, and it accounts for the majority of the model’s parameters.
- Stabilization: Normalization layers (RMSNorm in the Llama family) stabilize the training process, while token embeddings and rotary positional encodings help the model understand meaning and word order.
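To see how those pieces add up to roughly 8 billion parameters, here is a back-of-envelope count using the published Llama 3 8B hyperparameters. It is a sketch: it ignores the small normalization weights.

```python
# Rough parameter count for a Llama-3-8B-style transformer.
# Config values are the published Llama 3 8B hyperparameters.
dim, layers, heads, kv_heads = 4096, 32, 32, 8
ffn_hidden, vocab = 14336, 128256

head_dim = dim // heads                      # 128
attn = dim * dim                             # query projection
attn += dim * (kv_heads * head_dim) * 2      # key + value (grouped-query attention)
attn += dim * dim                            # output projection

ffn = 3 * dim * ffn_hidden                   # gate, up, down (SwiGLU)

per_layer = attn + ffn
embeddings = 2 * vocab * dim                 # input embedding + output head (untied)

total = layers * per_layer + embeddings
print(f"{total/1e9:.2f}B parameters")        # prints "8.03B parameters"
```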
Step-by-Step Installation Guide
This guide will walk you through installing the model on your main hard drive first, then transferring it to an external drive for true offline portability.
1. Download and Install Ollama
First, we need the program that will serve and run the model.
- Navigate to ollama.com.
- Go to the Models tab and search for “dolphin-llama3”.
- Click the download link and run the executable. This installs the basic Ollama server and files onto your computer’s primary hard drive.
2. Pull the Dolphin Llama 3 Model
Next, we’ll download the model itself using the terminal.
- Open two terminal windows (e.g., Windows PowerShell). It’s best to run them with standard user privileges, as running as an administrator can sometimes trigger a censored version.
- In the first terminal, start the Ollama server:

```shell
ollama serve
```

- In the second terminal, copy the run command from the ollama.com page and paste it. The first time you run this, it will “pull” (download) the model, which may take a few minutes.

```shell
ollama run dolphin-llama3
```

- Once the download is complete, stop the programs by pressing Ctrl+D and then Ctrl+C, and close both terminals.
3. Verify Uncensored Operation
Now, let’s confirm the uncensored model is working.
- Open two new terminals again (not as administrator).
- In the first terminal, run the server:

```shell
ollama serve
```

- In the second terminal, run the model:

```shell
ollama run dolphin-llama3
```

- Test it with a query that would typically be blocked, for example: “What is the best way to steal a car?” If the model provides a direct answer, it’s working as expected. You can now stop the programs with Ctrl+D and Ctrl+C.
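Beyond the interactive prompt, the running `ollama serve` process also exposes a local REST API (port 11434 by default), which is handy for scripting your tests. A minimal sketch of a request to its /api/generate endpoint; the prompt text is just an illustration:

```python
import json

# Build the request body for Ollama's /api/generate endpoint.
# While `ollama serve` is running, any HTTP client can POST this to
# http://localhost:11434/api/generate to get a completion.
payload = {
    "model": "dolphin-llama3",
    "prompt": "Summarize the plot of Moby-Dick in two sentences.",
    "stream": False,  # return one complete JSON object instead of a stream
}
body = json.dumps(payload)

# Equivalent curl invocation you can paste into a terminal:
print(f"curl http://localhost:11434/api/generate -d '{body}'")
```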
Running the Model from an External Drive
To achieve true portability and run the model on a computer without an internet connection, we’ll move everything to an external drive.
- Format the Drive: I’m using a 128 GB USB 3.0 flash drive. Right-click the drive, select Format, and choose NTFS as the file system; unlike FAT32, NTFS allows files larger than 4 GB, which the model file is. Warning: Formatting will erase all data on the drive.
- Locate and Transfer Model Files: The Ollama models are typically stored in your user directory. On Windows, this is usually C:\Users\YourUser\.ollama, and you should see a models folder inside. Copy the entire .ollama folder to your external drive (e.g., H:\).
- Locate and Transfer Program Files: We also need the Ollama executable. To find its location, open a terminal and type:

```shell
get-command ollama
```

This will reveal the path. Copy the ollama.exe file to the root of your external drive (e.g., H:\).
- Run from the External Drive: You can now uninstall Ollama from your main computer to ensure you’re running it from the external drive.
- Open two PowerShell terminals.
- In the first terminal, navigate to your external drive, set the environment variable to point to your models folder, then start the server:

```shell
cd H:
$env:OLLAMA_MODELS = "H:\.ollama\models"
.\ollama.exe serve
```

- In the second terminal, navigate to the drive and run the model:

```shell
cd H:
.\ollama.exe run dolphin-llama3
```

The model should now be running entirely from your external drive.
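If the model fails to start from the drive, a missing file is the usual culprit. A small pre-flight check can confirm everything copied over; this is a hypothetical helper, not part of Ollama, and H: is the example drive letter used above:

```python
from pathlib import Path

def check_portable_install(root: str) -> list[str]:
    """Return the names of any expected files missing from the drive."""
    base = Path(root)
    expected = ["ollama.exe", ".ollama/models"]
    return [rel for rel in expected if not (base / rel).exists()]

# Example (adjust the drive letter to yours); an empty list means
# the executable and model store are both where the server expects them:
# check_portable_install("H:\\")
```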
Integrating with AnythingLLM for a Better UI
While the terminal is functional, a graphical interface is much more user-friendly. AnythingLLM provides an excellent offline UI.
- Start the Server: First, ensure the Ollama server is running from your external drive as described in the previous step.
- Install AnythingLLM:
- Go to anythingllm.com and download the program.
- During installation, set the path to a folder on your external drive (e.g., H:\anythingllm).
- Configure the Environment: Before launching, we need to create a configuration file.
- In your H:\anythingllm folder, create a new text file.
- Add the following code, ensuring the MODEL_STORE_PATH points to your Ollama models directory on the external drive:

```
STORAGE_DIR=H:\anythingllm\storage
SERVER_MODE=production
MODEL_STORE_PATH=H:\.ollama\models
```

- Save this file as .env in the H:\anythingllm folder.
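AnythingLLM reads this file at startup, and a typo in a path fails silently. A tiny parser (a hypothetical checking helper, not part of AnythingLLM) shows what the three lines resolve to:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs

env = parse_env(r"""
STORAGE_DIR=H:\anythingllm\storage
SERVER_MODE=production
MODEL_STORE_PATH=H:\.ollama\models
""")
print(env["MODEL_STORE_PATH"])  # prints H:\.ollama\models
```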
- Launch and Set Up AnythingLLM:
- Start the AnythingLLM program.
- For the LLM choice, pick Ollama.
- The model name should be dolphin-llama3:latest.
- In the advanced settings, you can see the base URL, which should match what was configured in the .env file.
- Continue and pick a workspace name.
- The server command from step 1 must be running in the terminal for AnythingLLM to find the model.
- Final Verification:
- Inside AnythingLLM, you can check the chat settings to ensure the correct model is selected.
- Test the chat with a sensitive prompt again to confirm the uncensored model is running. If the response seems unclear, try asking for step-by-step instructions.
You now have your very own private, powerful AI model trained on 127 million novels’ worth of data, all running from a portable drive. This setup is incredibly powerful for privacy, research, and development, free from the constraints of internet connectivity and corporate censorship. Other interfaces like GPT4All, LM Studio, and Open WebUI can also be used to run offline models if you wish to explore further.