Mastering DeepSeek-R1: The Ultimate Guide to Local Installation with Ollama, vLLM, and Transformers

DeepSeek-R1 is an open-source reasoning model that has made a significant impact in the AI community. It rivals models from industry giants like OpenAI, delivering strong performance without the premium cost. Built on a Mixture of Experts (MoE) architecture, DeepSeek-R1 has 671 billion parameters but activates only 37 billion of them for each token, which balances efficiency, scalability, and power. Its reinforcement learning (RL) training also enables chain-of-thought reasoning, self-verification, and reflection, making it well suited to complex problem-solving in math, coding, and logical reasoning.

In this guide, we’ll install DeepSeek-R1 on your own machine or virtual environment using Transformers, and point to Ollama and vLLM as alternative runtimes. By the end, you’ll have a clear understanding of the setup process and which approach best suits your needs.

Prerequisites

Before diving into the installation steps, ensure your system meets the following minimum requirements:

Disk Space: 500 GB (requirements may vary depending on the model version).
Software: NVIDIA CUDA drivers (for GPU acceleration) and, optionally, Jupyter Notebook.
Hardware: A GPU-compatible machine. Refer to the table below for specific GPU configurations based on the model type:

Model Type	Recommended GPU	GPU Memory (VRAM)
Base Model	RTX 3090	24 GB
Advanced Model	A100	40 GB
Full Model (671B)	Multi-GPU H100 node	80 GB per GPU
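
To sanity-check the table against a particular checkpoint, a rough rule of thumb is that the weights alone need about (parameter count × bytes per parameter) of GPU memory, before activations and the KV cache are counted. The sketch below applies that arithmetic. The function name and the 7B example size are illustrative assumptions, not figures from the DeepSeek documentation.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # Billions of parameters times bytes per parameter gives gigabytes.
    return params_billions * bytes_per_param


# Illustrative sizes: a hypothetical 7B variant and the full 671B model.
for name, size in [("7B", 7), ("671B", 671)]:
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

Even at 8 bits per parameter, the full model’s weights come to roughly 671 GB, far more than a single 80 GB card can hold, so plan for several GPUs or a smaller variant.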

Installation Methods

We’ll outline the steps for setting up DeepSeek-R1 on a GPU-powered Virtual Machine (VM). For this tutorial, we’ll use NodeShift, a cloud provider known for its affordable and scalable GPU instances. These steps can be adapted to other platforms as well.

Step 1: Create a NodeShift Account

Go to NodeShift’s website.
Sign up using your email, or use your Google/GitHub account for quick registration.
Once registered, log in to access your dashboard.

Step 2: Create a GPU Node

From the dashboard, navigate to the menu on the left.
Select GPU Nodes and click on Start to create a new GPU-powered virtual machine.
Configure your node by selecting the desired GPU type, storage, and region. For example:
- GPU: RTX 4090 (suitable for most setups)
- Storage: 700 GB (adjust as needed)
- Region: Choose a region closest to your location for optimal performance.

Step 3: Set Up the Environment

Once your GPU node is live, you’re ready to install DeepSeek-R1.

Connect to Your Node:
- Use SSH to access your virtual machine. For example:
```
ssh user@your-node-ip
```

Install Dependencies:

Update your system:
```
sudo apt update && sudo apt upgrade -y
```

Install Python and essential libraries:

```
sudo apt install python3 python3-pip -y
pip3 install jupyter transformers torch
```
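
Before moving on, it can help to confirm that PyTorch actually sees the GPU. The snippet below is a minimal check, assuming the pip install above succeeded; `describe_gpu` is a hypothetical helper name, and it only inspects device 0.

```python
import importlib.util


def describe_gpu() -> str:
    """Return a one-line summary of what PyTorch can see."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}: {props.total_memory / 1e9:.1f} GB"


print(describe_gpu())
```

If no CUDA device is reported, check the NVIDIA driver and the installed PyTorch build before downloading any weights.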

Step 4: Download and Configure DeepSeek-R1

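Clone the DeepSeek-R1 repository. It holds the model card, paper, and usage notes rather than an installer:
```
git clone https://github.com/deepseek-ai/deepseek-r1.git
```
Navigate to the project directory:
```
cd deepseek-r1
```
Install the Hugging Face Hub client, which is used to fetch the weights:
```
pip3 install -U huggingface_hub
```
Download the model weights from Hugging Face (ensure adequate storage). The example fetches a distilled checkpoint; substitute deepseek-ai/DeepSeek-R1 for the full model:
```
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
```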
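
Since the weights are large, a quick free-space check before downloading can save a failed transfer. This is a minimal sketch using only the standard library; `has_free_space` is a hypothetical helper, and the 500 GB threshold is simply the disk figure from the prerequisites.

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """True if the filesystem holding `path` has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


# "." is the current directory; point this at the download location instead.
if not has_free_space(".", 500):
    print("Less than 500 GB free here; pick a smaller model or a larger volume.")
```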

Step 5: Run DeepSeek-R1

You’re now ready to run the model. Start a Jupyter Notebook server or directly execute scripts:

Launch Jupyter Notebook:
```
jupyter notebook
```

Test the model in a Python script:

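```
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face model ID. This distilled checkpoint fits on a single 24 GB GPU;
# the full "deepseek-ai/DeepSeek-R1" needs a multi-GPU node.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

# AutoModelForCausalLM (not AutoModel) provides the generate() method.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto").to("cuda")

input_text = "Solve this math problem: What is 12 + 47?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
# Reasoning models produce long outputs, so allow a generous token budget.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```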
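
DeepSeek-R1 typically writes out its chain of thought before the final answer, wrapped in `<think>` and `</think>` tags. Assuming that output format, the sketch below separates the reasoning from the answer. `split_reasoning` and the sample string are illustrative, not part of any DeepSeek or Transformers API.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) around the </think> tag."""
    head, sep, tail = text.partition("</think>")
    if not sep:  # no closing tag: treat everything as the answer
        return "", text.strip()
    # Keep only what follows the opening tag, if one is present.
    reasoning = head.split("<think>", 1)[-1].strip()
    return reasoning, tail.strip()


sample = "<think>12 + 47 = 59</think>The answer is 59."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Splitting on the closing tag alone also handles outputs where the opening tag was part of the prompt template rather than the generated text.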

Additional Notes and Tips

Using Ollama or vLLM: Both Ollama and vLLM offer optimized deployment environments for DeepSeek-R1. If you prefer these over Transformers, follow their specific documentation for installation.
Scaling Up: For large-scale applications, consider deploying multiple GPU nodes and using a load balancer to distribute requests.
Troubleshooting: Common errors can often be resolved by:
- Ensuring all dependencies are installed.
- Checking GPU compatibility and driver versions.
- Consulting the official DeepSeek-R1 documentation.
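
For the Ollama route mentioned above, here is a minimal sketch of querying a local server over its REST API from Python. It assumes Ollama is installed, that a model has been pulled under the tag deepseek-r1, and that the server is listening on its default port 11434. `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "deepseek-r1") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Once the server is running, call for example:
#   print(ask("What is 12 + 47?"))
```

Setting "stream" to False returns one JSON object instead of a stream of chunks, which keeps the client code short at the cost of waiting for the full response.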

By following this guide, you’ll have DeepSeek-R1 up and running locally, ready to tackle advanced reasoning tasks such as solving mathematical problems, writing code, or supporting research.

Jan 28, 2025

By Cristian Sas
