🧠 AI TRAINING & METHODOLOGY

How Our AI Learns

Discover how we're building Ghana's most comprehensive legal AI assistant through community collaboration and cutting-edge machine learning

1,655
Training Examples
Question-Answer Pairs
Llama 3.2
Base Model
Meta's Latest LLM
LoRA
Fine-Tuning Method
Efficient Adaptation
DPO
Optimization
Preference Learning

🎯 Our Training Process

1

Initial Training Dataset

We started with 1,655 carefully curated question-answer pairs covering Ghanaian laws, regulations, and legal procedures. This foundational dataset was compiled from official legal documents, court cases, and verified legal resources specific to Ghana.
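
To make this concrete, here is what a single training example might look like. The field names and the sample question below are illustrative assumptions rather than our exact production schema, and the answer text is elided.

    # A hypothetical question-answer pair from the training set (schema is illustrative)
    example = {
        "question": "What documents do I need to register a business name in Ghana?",
        "answer": "...",  # the vetted answer text, compiled and reviewed from official sources
        "category": "Business & Corporate Law",
        "source": "official legal documents / verified legal resources",
    }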

2

Base Model & Fine-Tuning

We built on Meta's Llama 3.2, a state-of-the-art large language model. To adapt it efficiently to Ghanaian law, we used:

LoRA (Low-Rank Adaptation)

An efficient fine-tuning technique that freezes the original model weights and trains only small, low-rank adapter matrices, making training faster and far less resource-intensive while maintaining high quality.

Unsloth

A cutting-edge optimization library that accelerates training by up to 2x, allowing us to fine-tune the model more quickly and cost-effectively.

This combination allowed us to create a specialized legal assistant for Ghana without requiring massive computational resources.
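
For readers who want to see roughly what this step looks like in code, below is a minimal sketch of LoRA fine-tuning with Unsloth. The model name, dataset file, and hyperparameters are illustrative assumptions, not our production configuration, and exact argument names can vary between Unsloth and TRL releases.

    # Minimal sketch: LoRA fine-tuning of Llama 3.2 with Unsloth (illustrative settings)
    from unsloth import FastLanguageModel
    from trl import SFTTrainer
    from transformers import TrainingArguments
    from datasets import load_dataset

    # Load a quantized Llama 3.2 base model (model name is an assumption for illustration)
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-3B-Instruct",
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Attach LoRA adapters: the base weights stay frozen, only small low-rank matrices are trained
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,                 # rank of the adapter matrices
        lora_alpha=16,
        lora_dropout=0,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )

    # Supervised fine-tuning on the curated question-answer pairs (file name is hypothetical)
    dataset = load_dataset("json", data_files="ghana_legal_qa.jsonl", split="train")

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",   # each row holds a formatted prompt + answer string
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            num_train_epochs=3,
            learning_rate=2e-4,
            output_dir="outputs",
        ),
    )
    trainer.train()

Because only the small adapter matrices are trained, a run like this fits on a single modest GPU, which is exactly why the LoRA and Unsloth combination keeps the project affordable.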

3

Direct Preference Optimization (DPO)

This is where YOU come in! For each question, our system generates two different responses. When you select the better one (and optionally edit it), you create valuable training data, stored as a preference pair (sketched after this list), that teaches the model:

  • Which responses are more helpful and accurate
  • What tone and style users prefer
  • How to better structure legal explanations
  • Common areas where the model needs improvement
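
Concretely, each choice you make can be stored as a simple preference record. The field names below (prompt, chosen, rejected) follow the convention most DPO tooling expects; the wording is illustrative, not a real stored record.

    # Hypothetical preference record created when a user picks (and optionally edits) a response
    preference_pair = {
        "prompt": "Can my landlord demand two years of rent in advance?",
        "chosen": "The response the user selected or edited as the better answer ...",
        "rejected": "The other generated response ...",
    }
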
4

Continuous Improvement

Your feedback is collected and used to periodically retrain the model. Each training cycle makes the AI smarter, more accurate, and better aligned with what users actually need. This creates a virtuous cycle of improvement.
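
As a rough sketch of what one retraining cycle might look like, the snippet below runs a DPO pass over the collected preference records on top of the LoRA model from the previous step. The file name, hyperparameters, and exact DPOTrainer argument names are assumptions and vary across TRL versions.

    # Sketch of one DPO retraining cycle over collected user preferences (illustrative settings)
    from datasets import load_dataset
    from trl import DPOConfig, DPOTrainer

    # `model` and `tokenizer` are the LoRA fine-tuned model and tokenizer from the earlier sketch
    preferences = load_dataset("json", data_files="user_preferences.jsonl", split="train")

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(
            beta=0.1,               # how strongly the preference signal pulls on the model
            learning_rate=5e-6,
            num_train_epochs=1,
            output_dir="dpo_outputs",
        ),
        train_dataset=preferences,  # expects prompt / chosen / rejected columns
        processing_class=tokenizer,
    )
    trainer.train()                 # each cycle nudges the model toward the answers users preferred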

💡 Why Direct Preference Optimization?

✓ More Natural

Instead of training a separate reward model, DPO learns directly from human preferences, making the training process simpler and more effective (the objective itself appears after these points).

✓ Community-Driven

Every user contributes to making the model better, creating a truly collaborative AI that serves the community's needs.

✓ Faster Learning

DPO allows the model to learn more efficiently from each piece of feedback, leading to rapid improvements over time.

✓ Better Alignment

The model learns to align with human values and preferences, producing responses that are not just accurate but also helpful and appropriate.
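
For readers curious about the "no reward model" point above, this is the objective DPO optimizes (notation follows the original DPO paper): the model being trained, π_θ, is pushed to prefer the chosen answer y_w over the rejected answer y_l relative to a frozen reference model π_ref, with β controlling how strong that push is.

    \mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
      = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim \mathcal{D}}\left[
          \log \sigma\!\left(
            \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
            - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
          \right)\right]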

📚 What Our Dataset Covers

⚖️ Constitutional Law
🏢 Business & Corporate Law
🏠 Property & Land Law
👨‍👩‍👧 Family Law
💼 Employment Law
🚗 Traffic & Road Laws

Ready to Help Us Improve?

Every question you ask and every preference you share makes our AI smarter and more helpful for everyone in Ghana.