llama.cpp installation
Requirements:
Miniconda: To set up an isolated environment for running your Large Language Model, use Miniconda. Follow the installation instructions in the Miniconda documentation.
Python: Ensure you have a recent version of Python installed (the steps below use 3.11). You can download and install it from the official Python website.
Homebrew Installer: Install Homebrew, which is a package manager for macOS and Linux. Visit brew.sh and follow the provided instructions for installation.
LLaMA Models: You will need the model weights themselves. The official LLaMA models can be requested from Meta (approval required). Alternatively, you can use a community model from a repository like Hugging Face. Here's an example model I used:
Model: OpenHermes-2.5-Mistral-7B
Choose a model based on your hardware capabilities.
Installation steps:
Create Conda Environment:
llama.cpp works best in its own dedicated environment:
conda create --name llama.cpp python=3.11
conda activate llama.cpp
Install Git LFS:
Git does not handle large file downloads well; Git LFS extends Git so it can fetch these large model files.
brew install git-lfs
git lfs install
Clone Repository:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
Install Python Dependencies:
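The repository includes a requirements file for its Python conversion scripts; a typical install (file name taken from the llama.cpp repository, adjust if your checkout differs) is:
pip install -r requirements.txt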
Build C++ Tools:
llama.cpp is a C++ project; to build the command-line tools, run the build from the repository root, as shown below.
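A plain make is the simplest way to build on macOS and Linux; newer llama.cpp releases have moved to CMake, so adjust to whichever build system your checkout uses:
make
Or, with CMake:
cmake -B build
cmake --build build --config Release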
Clone Models Repository:
This step can take a while because the model files are large; an example clone command follows.
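As an example, assuming the OpenHermes-2.5-Mistral-7B model above is hosted on Hugging Face (the exact repository path may differ), Git LFS lets git clone pull down the weights:
git clone https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B models/OpenHermes-2.5-Mistral-7B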
Quantize Models:
To quantize the models, use the convert.py script that ships with the llama.cpp repository.
Each step produces a smaller, more efficient model file.
If you look in the models directory of your llama.cpp clone, you will see the original files that came with the model repository, such as pytorch_model-00001-of-00002.bin. These have been converted into two versions: ggml-model-f16.gguf and a smaller quantized version, ggml-model-q8_0.gguf. Comparing the file sizes shows the difference.
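A typical convert-and-quantize sequence, assuming the model path above (the script and binary names are those shipped with older llama.cpp releases; newer releases rename them, so check your checkout), looks like:
python convert.py models/OpenHermes-2.5-Mistral-7B/ --outtype f16
./quantize models/OpenHermes-2.5-Mistral-7B/ggml-model-f16.gguf models/OpenHermes-2.5-Mistral-7B/ggml-model-q8_0.gguf q8_0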
Run Benchmark:
Run a benchmark for each model you created and compare the differences in memory and GPU usage.
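The repository ships a benchmarking tool; assuming the quantized file produced above (the binary is named llama-bench in recent releases), a run might look like:
./llama-bench -m models/OpenHermes-2.5-Mistral-7B/ggml-model-q8_0.gguf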
Run Server Locally:
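llama.cpp includes a lightweight HTTP server; assuming the quantized model above (the binary is ./server in older releases and llama-server in newer ones, and the context size and port here are only examples), you can start it with:
./server -m models/OpenHermes-2.5-Mistral-7B/ggml-model-q8_0.gguf -c 2048 --port 8080
Once it is running, the server exposes a simple web UI and an HTTP API at http://localhost:8080.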