LLaMA installation

Requirements:

  1. Miniconda: Use Miniconda to set up the environment for running your Large Language Model. Follow the installation instructions in the Miniconda documentation.

  2. Python: Ensure you have a recent version of Python installed (the conda environment below uses Python 3.11). You can download and install it from the official Python website.

  3. Homebrew: Install Homebrew, a package manager for macOS and Linux. Visit brew.sh and follow the provided installation instructions.

  4. LLaMA Models: You will need the LLaMA (Large Language Model Meta AI) model weights. These can be downloaded from Meta with approval, or from a community-driven repository such as Hugging Face. Here's an example model I used:

    Choose a model based on your hardware capabilities.


Installation steps:

  1. Create Conda Environment:

llama.cpp works best inside a dedicated environment.

conda create --name llama.cpp python=3.11
conda activate llama.cpp
  2. Install Git LFS:

Git does not handle large file downloads well on its own; Git LFS (Large File Storage) lets Git download these large files.

brew install git-lfs
git lfs install
  3. Clone Repository:

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
  4. Install Python Dependencies:

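The Python packages used by llama.cpp's conversion scripts are listed in the repository's requirements.txt. A minimal sketch of installing them (run inside the cloned llama.cpp directory with the conda environment active):

```bash
# Install the Python dependencies used by llama.cpp's conversion scripts.
# Run from inside the llama.cpp directory with the conda env active.
python3 -m pip install -r requirements.txt
```
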
  5. Build C++ Tools:

This is a C++ project; to build the tools, run the build command shown below.

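As a sketch, the exact command depends on the llama.cpp revision you checked out: older revisions build with plain make, newer ones use CMake.

```bash
# Older llama.cpp revisions: plain Makefile build
make

# Newer llama.cpp revisions: CMake build
cmake -B build
cmake --build build --config Release
```
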
  6. Clone Models Repository:

This step will take some time because of the large model sizes.

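A minimal sketch of cloning a model repository from Hugging Face into llama.cpp's models directory. The repository name below is only an illustrative example (gated models also require an approved Hugging Face account), so substitute the model you chose:

```bash
# Clone a model repository into the models directory.
# The repository name is an example only; replace it with your chosen model.
cd models
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
cd ..
```
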
  7. Quantize Models:

To quantize our models, use the convert.py script that comes with llama.cpp.
Each conversion step produces a smaller, more efficient model.

If you go to the llama.cpp/models directory, you will see the original files such as pytorch_model-00001-of-00002.bin, which come with the model repository. We have quantized them into two versions: ggml-model-f16.gguf and the even smaller ggml-model-q8_0.gguf. If you compare the file sizes, you can see the difference.

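As a sketch, assuming an older llama.cpp checkout where the converter script is convert.py and the quantizer binary is quantize (newer versions rename them to convert_hf_to_gguf.py and llama-quantize), and reusing the example model directory from the previous step, the two files mentioned above are produced roughly like this:

```bash
# Convert the original PyTorch checkpoint to a 16-bit GGUF file.
python3 convert.py models/Llama-2-7b-chat-hf/

# Quantize the f16 file down to 8 bits, producing a much smaller model.
./quantize models/Llama-2-7b-chat-hf/ggml-model-f16.gguf \
           models/Llama-2-7b-chat-hf/ggml-model-q8_0.gguf q8_0
```
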
  8. Run Benchmark:

Run a benchmark test for each model we created and compare the differences in memory and GPU usage.

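A minimal sketch using llama.cpp's llama-bench tool; the model paths reuse the example directory from the earlier steps, and the binary may live in ./build/bin/ depending on how you built the project:

```bash
# Benchmark each generated model and compare speed and memory usage.
./llama-bench -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf
./llama-bench -m models/Llama-2-7b-chat-hf/ggml-model-q8_0.gguf
```
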
  9. Run Server Locally:
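
A sketch of starting the bundled HTTP server with one of the quantized models. The binary is called llama-server in recent llama.cpp builds (plain server in older ones), and the port and model path are example choices:

```bash
# Serve the quantized model over a local HTTP endpoint.
# The binary may be ./server or ./build/bin/llama-server depending on version.
./llama-server -m models/Llama-2-7b-chat-hf/ggml-model-q8_0.gguf --port 8080

# Then open http://localhost:8080 in a browser to chat with the model.
```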