    Machine Learning

Install Meta-Llama-3.1-8B-Instruct locally on your Macbook | by Anurag Arya | Apr, 2025

By FinanceStarGate | April 9, 2025 | 2 min read


Create a Python file named install-llama-3.1-8b.py with the following code:

    from huggingface_hub import login
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
    import torch

    # Log in to Hugging Face (paste your access token here)
    access_token_read = ""
    login(token=access_token_read)

    # Model ID
    model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

    # Load the model (simple version, no quantization)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype=torch.float16,  # use bfloat16 or float16 if supported
    )

    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Create a text-generation pipeline
    text_gen = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Test the pipeline
    response = text_gen("What is the capital of France?", max_new_tokens=100)
    print(response[0]["generated_text"])
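On an Apple-silicon Mac, device_map="auto" can place the model on the GPU through PyTorch's Metal (MPS) backend. A quick sketch to check whether MPS is available before committing to the full model load:

```python
# Check whether PyTorch's Metal (MPS) backend is available on this Mac.
# If it is not, the model above will load on the CPU instead (much slower).
try:
    import torch
    mps_available = torch.backends.mps.is_available()
except ImportError:  # torch not installed yet
    mps_available = False

print("MPS available:", mps_available)
```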

Log in to your Hugging Face account and generate an access token with user and repository read permissions, then paste it into access_token_read in the script.
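Rather than hard-coding the token in the script, you can read it from an environment variable; a minimal sketch (the huggingface_hub library also picks up an HF_TOKEN variable on its own, in which case the explicit login() call is unnecessary):

```python
import os

# Read the Hugging Face token from the environment instead of
# hard-coding it in the script (safer if the file is ever shared).
token = os.environ.get("HF_TOKEN", "")
if token:
    from huggingface_hub import login
    login(token=token)
else:
    print("HF_TOKEN not set; relying on a previous `huggingface-cli login`")
```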

    Run the script:

    python install-llama-3.1-8b.py

Upon successful execution, the script will:

• Download the model from the Hugging Face repository into the local cache (under ~/.cache). On subsequent runs, the model is loaded from the local cache instead.
• Send a prompt to the model and display the response
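The cache follows the Hub's standard layout, where each model gets a folder named models--&lt;org&gt;--&lt;name&gt;. A small sketch that computes where this model's files will land (the default cache root shown is an assumption that holds unless you have overridden HF_HOME):

```python
import os

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Hugging Face Hub caches models under ~/.cache/huggingface/hub by default,
# in a folder named "models--<org>--<name>".
cache_root = os.path.expanduser("~/.cache/huggingface/hub")
model_folder = "models--" + model_id.replace("/", "--")
print(os.path.join(cache_root, model_folder))
```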

In this guide, you have learned how to set up and run the Meta-LLaMA 3.1 8B Instruct model locally on a macOS machine using Hugging Face Transformers and PyTorch. Running LLMs locally gives you more control, privacy, and room for customisation.

If you have followed the steps successfully, you should now be able to:

• Load and run LLaMA 3.1 using a simple Python script
• Keep the memory footprint of a large model manageable by loading it in half precision (float16)
• Generate text responses using instruct-tuned prompts

Next Steps

• Build a chatbot or command-line assistant using this model
• Explore prompt engineering to optimize results
• Experiment with multi-turn conversations
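For multi-turn experiments, the instruct model expects a chat-style message list. Recent transformers versions let you pass such a list straight to the text-generation pipeline, and tokenizer.apply_chat_template can render it into the model's prompt format. A sketch of the structure (the assistant turn shown is illustrative, not real model output):

```python
# A multi-turn conversation as the chat-style message list the
# instruct-tuned model expects. The assistant turn is a placeholder
# showing the alternating structure; in practice it comes from the model.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And its population?"},
]

# With the pipeline from the script above you could pass the list directly:
#   response = text_gen(messages, max_new_tokens=100)
# or render the prompt yourself:
#   prompt = tokenizer.apply_chat_template(
#       messages, tokenize=False, add_generation_prompt=True)

roles = [m["role"] for m in messages]
print(roles)
```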


