Parameter-Efficient Fine-Tuning in Action: Fine-Tuning LLMs Using PEFT & LoRA for a Causal Language Modeling Task


Hands-on code generation with the CodeGen pre-trained model, using Parameter-Efficient Fine-Tuning (PEFT) and LoRA for a causal language modeling task

Introduction

In our ever-evolving AI landscape, the excitement around language models is palpable. Yet, as models grow in size, so do the challenges tied to fine-tuning them. How do we efficiently adapt colossal models to specific tasks without extensive computational costs? Welcome to a deep dive into the world of Parameter-Efficient Fine-Tuning! Today, we'll unravel the mystique around fine-tuning Large Language Models (LLMs) using cutting-edge techniques like PEFT and LoRA. If code generation and language modeling intrigue you, strap in for a hands-on walkthrough with the CodeGen pre-trained model. By the end of this journey, not only will you grasp the nuances of these techniques, but you'll also have a clear road map to implement them in your own projects.


Workflow of the code

  • LoRA

  • PEFT

  • Causal Language Modeling

  • Codegen

  • Installing dependencies

  • Loading the required libraries

  • Loading the base pre-trained model for causal language modeling

  • Loading the tokenizer

  • Initializing the LoRA configuration

  • Loading the dataset from Hugging Face

  • Splitting the dataset into train and val

  • Defining function to Tokenize and process prompt template

  • Tokenizing the train and val datasets into tensors acceptable by the trainer

  • Defining the metric function

  • Initializing seed for reproducibility

  • Initializing the trainer's arguments and the trainer

  • Training the base pre-trained model

  • Saving the finetuned model and its tokenizer

  • Loading the finetuned model

  • Defining inference function

  • Crafting 3 prompt templates

  • Testing


LoRA

LoRA stands for Low-Rank Adaptation of Large Language Models. It is a technique that accelerates the fine-tuning of large models while consuming less memory. To make fine-tuning more efficient, LoRA represents the weight updates with two smaller matrices (called update matrices) obtained through low-rank decomposition. These new matrices can be trained to adapt to the new data while keeping the overall number of changes low. The original weight matrix remains frozen and doesn't receive any further adjustments. To produce the final results, both the original and the adapted weights are combined.
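To make the idea concrete, here is a minimal, illustrative sketch of the low-rank update. This is not the PEFT implementation; the layer sizes, rank, and scaling below are hypothetical values chosen for demonstration only:

import torch

d_in, d_out, r, alpha = 768, 768, 8, 16               # hypothetical layer sizes, rank, and scaling
W = torch.randn(d_out, d_in)                          # frozen pre-trained weight (never updated)
A = torch.nn.Parameter(0.01 * torch.randn(r, d_in))   # trainable down-projection (r x d_in)
B = torch.nn.Parameter(torch.zeros(d_out, r))         # trainable up-projection, starts at zero

x = torch.randn(4, d_in)                              # a batch of 4 input vectors

# output = frozen projection + scaled low-rank update (B @ A)
h = x @ W.T + (alpha / r) * (x @ A.T @ B.T)
print(h.shape)                                        # torch.Size([4, 768])

Only A and B (2 * r * d parameters) are trained; the full d x d matrix W stays frozen, which is where the memory savings come from.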

This approach has several advantages:

  • LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters.

  • The original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.

  • LoRA is orthogonal to many other parameter-efficient methods and can be combined with many of them.

  • The performance of models fine-tuned using LoRA is comparable to the performance of fully fine-tuned models.

  • LoRA does not add any inference latency because adapter weights can be merged with the base model.

LoRA is implemented in the Hugging Face Parameter-Efficient Fine-Tuning (PEFT) library. To fine-tune a model using LoRA, you need to:

  • Instantiate a base model.

  • Create a configuration (LoraConfig) where you define LoRA-specific parameters.

  • Wrap the base model with get_peft_model() to get a trainable PeftModel.

  • Train the PeftModel as you normally would train the base model.
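In compact form, those four steps look roughly like the sketch below (using the same CodeGen checkpoint and target module we will use later in this tutorial); the rest of the article walks through each step in detail:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")   # 1. base model
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["fc_in"],
                         task_type="CAUSAL_LM")                                     # 2. LoRA config
peft_model = get_peft_model(base_model, lora_config)                                # 3. trainable PeftModel
# 4. train peft_model with transformers.Trainer exactly as you would train the base model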

Parameter-Efficient Fine Tuning (PEFT)

PEFT is a method used to freeze the pre-trained model parameters during fine-tuning and add a small number of trainable parameters (the adapters) on top of it. The adapters are trained to learn task-specific information. This approach is very memory-efficient with lower compute usage while producing results comparable to a fully fine-tuned model.

Causal Language Model

Causal language modeling predicts the next token in a sequence of tokens, and the model can only attend to tokens on the left. This means the model cannot see future tokens. GPT-2 is an example of a causal language model. Causal language models are frequently used for text generation: you can use them for creative applications like choose-your-own-adventure games, or for an intelligent coding assistant like Copilot.
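The sketch below illustrates the idea with GPT-2 (the example mentioned above): the logits at each position are computed from the tokens to its left only, so the last position gives the model's prediction for the next token.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("def add(a, b):", return_tensors="pt").input_ids
with torch.no_grad():
    logits = lm(ids).logits          # shape: (1, sequence_length, vocab_size)

# No look-ahead: position i only sees tokens 0..i, so the last position
# holds the model's prediction for the next token.
next_token_id = logits[0, -1].argmax().item()
print(tok.decode(next_token_id))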

What is CodeGen

CodeGen is an autoregressive language model for program synthesis trained sequentially on The Pile, BigQuery, and BigPython. The CodeGen model was proposed in A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen checkpoints are available in several sizes and with different pre-training data. The naming format is Salesforce/codegen-{size}-{data}, where:

  • size: 350M, 2B, 6B, 16B

  • data:

    • nl: Pre-trained on the Pile

    • multi: Initialized with nl, then further pre-trained on data from multiple programming languages

    • mono: Initialized with multi, then further pre-trained on Python data

  • For example, Salesforce/codegen-350M-mono used in this tutorial offers a 350 million-parameter checkpoint pre-trained sequentially on the Pile, multiple programming languages, and Python.


  • Installing dependencies

    The dependencies needed in this tutorial are bitsandbytes, datasets, accelerate, loralib, peft, and transformers. We install them using pip as shown below. To run this code you need to change your Colab runtime to a T4 GPU and enable it. We use bitsandbytes because it supports 8-bit and 4-bit precision data types, which are useful for loading large models while saving memory.

!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git
  • Importing the libraries
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from datasets import load_dataset, Dataset, load_metric
import bitsandbytes as bnb
import transformers
import torch
import torch.nn as nn
import numpy as np
import random
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
  • Initializing the base pre-trained model
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-350M-mono",
    torch_dtype=torch.float16,
    device_map='auto',
    load_in_8bit=True
)

Let's break down each line:

  • AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono"): This call loads a pre-trained model. Here's what each part means:

  • AutoModelForCausalLM: Refers to the general architecture of the model being loaded. In this case, it's a "causal language model", a type of model designed for tasks like text generation. "Causal" here means that the model predicts the next word in a sequence based only on previous words, not future words.

  • from_pretrained: This method tells the library to load a model that has already been trained (i.e., pre-trained) rather than starting from scratch.

  • "Salesforce/codegen-350M-mono": This is the identifier of the specific pre-trained model you want to load, here a Salesforce checkpoint named codegen-350M-mono. The name follows the convention described earlier: codegen indicates a code-generation model, 350M means roughly 350 million parameters, and mono marks the final pre-training stage on Python data only.

  • torch_dtype=torch.float16: This sets the data type of the model's parameters. By using torch.float16 (also known as "half precision"), the model will consume less memory and potentially run faster than with the default torch.float32. However, half precision can sometimes result in a slight decrease in model accuracy; it's a trade-off between speed/memory and accuracy.

  • device_map='auto': This directs the model to be loaded on the most appropriate computational device available. If you have a GPU, the library will automatically use it, which can greatly accelerate model computations. If no GPU is available, the model will default to the CPU.

  • load_in_8bit=True: This loads the model weights in 8-bit precision via bitsandbytes, further reducing the memory needed to hold the model on the Colab GPU.

  • Initializing the tokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
  • AutoTokenizer: This is a class in the transformers library that can automatically load the appropriate tokenizer for a given pre-trained model. A tokenizer is responsible for converting human-readable text into a format that the model can understand (typically a sequence of integers) and vice-versa.
tokenizer.add_eos_token = True
tokenizer.pad_token_id = 0
tokenizer.padding_side = "left"
  1. tokenizer.add_eos_token = True:

    • This line tells the tokenizer to automatically add an "end of sentence" (EOS) token at the end of every sequence it tokenizes. In many transformer models, the EOS token is used to signal the conclusion of an input sequence. By setting add_eos_token to True, it ensures that the EOS token is added whenever you tokenize a piece of text using this tokenizer.
  2. tokenizer.pad_token_id = 0:

    • Padding is used in machine learning models, especially in sequence models like transformers, to ensure that all sequences (e.g., sentences) in a batch have the same length. This is important because the underlying computations in neural networks usually require consistent input shapes.

    • This line sets the identifier of the padding token to 0. This means when the tokenizer adds padding tokens to a sequence to make it match the desired length, it will use the token with ID 0 as the padding token.

  3. tokenizer.padding_side = "left":

    • When adding padding tokens, we can either add them to the start (left) or the end (right) of a sequence. This line specifies that the padding should be added to the start (left) of each sequence. This can be important in certain models or applications where the positioning of padding might influence the model's understanding of the sequence.
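As a quick check of the three settings above, tokenizing two prompts of different lengths with padding enabled should place the pad token (ID 0) in front of the shorter one. A small illustrative snippet, assuming the tokenizer configured above:

batch = tokenizer(["def fibonacci(n):", "x = 1"], padding=True)
print(batch["input_ids"])       # the shorter sequence is front-padded with 0s
print(batch["attention_mask"])  # 0s mark the padding positions on the left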

Freezing the model parameters

for param in model.parameters():
  param.requires_grad = False  
  if param.ndim == 1:

    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable() 
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)

Breaking the code down

  •       for param in model.parameters():
              param.requires_grad = False
    

This loop iterates through all the parameters of the model and sets their requires_grad attribute to False. When a parameter's requires_grad attribute is set to False, it will not be updated during the backward pass, meaning it remains "frozen" during training. This is useful when you only want to train certain parts of a model or when fine-tuning on a new dataset.

  • Cast 1-dimensional Parameters to float32

      if param.ndim == 1:
          param.data = param.data.to(torch.float32)
    

For 1-dimensional parameters (often biases or parameters in normalization layers), the code changes their data type to float32. This can be helpful for stability in training, since smaller data types (like float16) can sometimes cause numerical issues, especially for parameters like biases.

  • Enabling Gradient Checkpointing:

      model.gradient_checkpointing_enable()
    

Gradient checkpointing is a technique used to save memory when training very deep models. Instead of storing all intermediate activations in memory for the backward pass, it recomputes them, trading off computation time for memory. This can be particularly useful when training models on GPUs with limited memory.

  • Enabling Input Requirement for Gradients:

      model.enable_input_require_grads()
    

This method makes the model's input embeddings require gradients. It is needed here because the backbone weights are frozen and gradient checkpointing is enabled: without it, no tensor early in the network would require gradients, and backpropagation to the trainable adapter layers would break. It can also be useful whenever you need gradients with respect to the input, such as in adversarial training.

  • Creating a Custom Module to Cast Output to float32:

      class CastOutputToFloat(nn.Sequential):
          def forward(self, x): return super().forward(x).to(torch.float32)
      model.lm_head = CastOutputToFloat(model.lm_head)
    

This custom module, CastOutputToFloat, is derived from PyTorch's nn.Sequential class. It overrides the forward method to cast its output to float32. The final line replaces the lm_head of the model with this custom module wrapping the original lm_head. The purpose is likely to ensure that the final predictions (logits) from the model are in the float32 data type, which can be helpful for precision and stability reasons, especially if other parts of the model or training process utilize lower precision formats like float16.

  • Printing the number of trainable parameters in the model.
def print_trainable_parameters(model):

    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
  • Initializing the LoRA configuration
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["fc_in"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 819200 || all params: 357531648 || trainable%: 0.2291265695170012

  1. r: This parameter sets the rank of the low-rank adaptation. Essentially, it determines the size of the adaptation parameters. A smaller r means fewer parameters, making the adaptation more parameter-efficient. In this case, it's set to 8.

  2. lora_alpha: This parameter is a scaling factor for the LoRA update. The product of the two update matrices is multiplied by lora_alpha / r before being added to the frozen weights, so a larger lora_alpha increases the influence of the adaptation relative to the original weights. In this configuration, lora_alpha is set to 16, twice the rank of 8.

  3. target_modules: This is a list indicating which modules (or layers) of the model should be adapted using LoRA. In this case, only the modules named "fc_in" are adapted (a quick way to discover candidate module names is shown right after this list).

  4. lora_dropout: Specifies the dropout rate to be applied to the outputs of the LoRA layers. Dropout is a regularization technique where, during training, random neurons (or outputs) are "dropped out" or set to zero. Here, a dropout rate of 0.05 indicates that 5% of the neurons will be set to zero during each forward pass.

  5. bias: Defines how biases should be handled in the LoRA-adapted layers. Here, it's set to "none", meaning no bias parameters are added or trained by LoRA.

  6. task_type: Specifies the type of task the model is intended for. In this case, it's set to "CAUSAL_LM", indicating a causal language modeling task. Causal Language Modeling (CLM) is where the model predicts the next token in a sequence based solely on the previous tokens, as opposed to masked language modeling where the model predicts masked-out tokens based on their context.
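How do you know which module names exist to target? A quick, hedged way (run on the base model before wrapping it with get_peft_model) is to list the names of its linear layers; for CodeGen these include the MLP projections fc_in and fc_out and the attention projections qkv_proj and out_proj:

import torch.nn as nn

# collect the last path component of every linear layer's name
linear_layer_names = sorted({name.split(".")[-1]
                             for name, module in model.named_modules()
                             if isinstance(module, nn.Linear)})
print(linear_layer_names)   # e.g. ['fc_in', 'fc_out', 'out_proj', 'qkv_proj', ...]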

  • Loading the dataset from Hugging Face
dataset = load_dataset("theblackcat102/evol-codealpaca-v1")

The code downloads the evol-codealpaca-v1 dataset from the Hugging Face datasets hub and stores it in the dataset variable for further use.

  • Splitting the dataset into train and validation
def split_dataset(dataset):
    n = int(0.8 * len(dataset['train']))
    train_data = dataset['train'][:n]
    val_data = dataset['train'][n:]

    dataset['train'] = train_data
    dataset['validation'] = val_data

    return dataset['train'], dataset['validation']

the split_dataset function is used to split the training data of a given dataset into training (80%) and validation (20%) sets. This is a common practice in machine learning to ensure that a separate set of data is available to validate the model's performance after training.

train_data, val_data = split_dataset(dataset)

The returned tuple from the split_dataset function is unpacked into two separate variables. The first value (the training set) is assigned to the train_data variable, and the second value (the validation set) is assigned to the val_data variable.

After this line of code executes, you'll have:

  • train_data: This contains 80% of the original training data from the dataset.

  • val_data: This contains the remaining 20% of the original training data, and it will be used for validation purposes.

  • Function definition to tokenize and preprocess prompt template

def tokenize_function(samples):
    output_str = samples['output'] if samples['output'] else "Cannot Find Answer"
    prompt_template = f"### INSTRUCTION\n{samples['instruction']}\n\n### OUTPUT\n{output_str}</s>"
    return tokenizer(prompt_template, truncation=True, padding='max_length', max_length=2048)

The function accepts a single argument, samples, which is expected to be a dictionary containing at least two keys: instruction and output.

output_str = samples['output'] if samples['output'] else "Cannot Find Answer"

This line checks whether samples['output'] exists and is not empty or None. If it has a value, that value is assigned to the output_str variable; if not, the string "Cannot Find Answer" is assigned instead. This is a form of conditional assignment in Python and ensures that output_str always has some value.

 prompt_template = f"### INSTRUCTION\n{samples['instruction']}\n\n### OUTPUT\n{output_str}</s>"

This line creates a formatted string, prompt_template, using the values from the samples dictionary. It follows a specific format where the instruction and output are both clearly labeled. Notice the </s> marker appended at the end; it is commonly used as an end-of-sequence token in certain tokenization schemes.

return tokenizer(prompt_template, truncation=True, padding='max_length', max_length=2048)

This line utilizes the tokenizer (which is expected to be available in the outer scope) to tokenize the prompt_template. The tokenizer is set to truncate sequences if they exceed 2048 tokens and pad shorter sequences to this length.
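As a quick sanity check, calling the function on a hypothetical toy sample shows what ends up in each processed row:

sample = {"instruction": "Write a function that adds two numbers.",
          "output": "def add(a, b):\n    return a + b"}
encoded = tokenize_function(sample)
print(list(encoded.keys()))        # ['input_ids', 'attention_mask']
print(len(encoded["input_ids"]))   # 2048, because every row is padded to max_length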

  • Formatting the train and validation datasets into tensors acceptable by the trainer
train_dataset = Dataset.from_dict(train_data)
mapped_train_dataset = train_dataset.map(tokenize_function, batched=False, remove_columns=['instruction', 'output'])
val_dataset = Dataset.from_dict(val_data)
mapped_val_dataset = val_dataset.map(tokenize_function, batched=False, remove_columns=['instruction', 'output'])

Here's a step-by-step breakdown:

  • Dataset Initialization:
train_dataset = Dataset.from_dict(train_data)

This line creates a Dataset object from the train_data dictionary. The Dataset object is a data structure provided by the datasets library that is optimized for large-scale datasets and ML tasks. It enables efficient data processing methods and various utilities.

  • Mapping and Tokenizing Train Dataset:
mapped_train_dataset = train_dataset.map(tokenize_function, batched=False, remove_columns=['instruction', 'output'])

The map method applies a given function (in this case, tokenize_function) to each sample in the dataset.

  • batched=False: This means the tokenize_function will be applied to individual samples rather than batches of samples.

  • remove_columns=['instruction', 'output']: After processing each sample with tokenize_function, the original columns 'instruction' and 'output' are removed, since they've been tokenized and formatted and are no longer needed in their raw form.

  • Dataset Initialization for Validation Data:

val_dataset = Dataset.from_dict(val_data)

Similarly, a Dataset object for validation data (val_data) is created.

  • Mapping and Tokenizing Validation Dataset:
mapped_val_dataset = val_dataset.map(tokenize_function, batched=False, remove_columns=['instruction', 'output'])

Just like with the training data, the validation data is also processed using the tokenize_function. The processed and tokenized data is stored in the mapped_val_dataset object.

  • Defining the function to compute the metrics
transformers.logging.set_verbosity_info()
bleu_metric = load_metric("bleu")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = [tokenizer.decode(pred, skip_special_tokens=True) for pred in predictions]
    decoded_labels = [tokenizer.decode(label, skip_special_tokens=True) for label in labels]
    bleu_score = bleu_metric.compute(predictions=decoded_preds, references=decoded_labels)

    return {"bleu": bleu_score["bleu"]}

This function is designed to be used during or after the evaluation of a model to compute the BLEU score:

  • eval_pred is a tuple containing the predictions from the model and the true labels.

  • The predictions and labels, which are tokenized sequences, are first decoded into human-readable text using the tokenizer.decode() function. This is necessary because the BLEU metric works on actual text, not tokenized sequences.

  • The BLEU score is then computed using bleu_metric.compute(), and the result is returned as a dictionary with a single key-value pair: "bleu": bleu_score["bleu"].

  • Initializing seeds for model reproducibility

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

This part of the code sets various random seeds to ensure reproducibility. When training neural networks, many operations have a random component. By fixing the random seed, the same sequence of random numbers will be generated every time, leading to consistent results across runs. This is important if you want to ensure that someone else running your code, or you running your code at a later time, will get the same results.

  • Trainer initialization
trainer = transformers.Trainer(
    model=model,
    train_dataset=mapped_train_dataset,
    compute_metrics=compute_metrics,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=100,
        learning_rate=1e-3,
        fp16=True,
        logging_steps=1,
        output_dir='outputs',
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False

The Trainer class from the Transformers library is being initialized. It's designed to simplify the process of training, evaluating, and testing transformer models.

Arguments to the Trainer:

  • model: The model instance you intend to train. This is typically a pre-initialized or pre-trained instance of a transformer model from the library.

  • train_dataset: The dataset the model will be trained on. Here it is mapped_train_dataset, the tokenized and formatted training split prepared above.

  • compute_metrics: A function that computes metrics after an evaluation. Here it is the compute_metrics function defined above, which calculates the BLEU score on the evaluation dataset.

  • args: This specifies various training-related configurations using the TrainingArguments class:

  • per_device_train_batch_size: The batch size for training on each device (e.g., each GPU). Here it's set to 4.

  • gradient_accumulation_steps: Number of forward passes (batches) the model will see before an update (backpropagation) is performed. Here, the model will see 4 batches before an update.

  • warmup_steps: The number of steps for the learning rate warmup. The learning rate will gradually increase over these many steps at the beginning of training.

  • max_steps: Maximum number of training steps. Training will stop after 100 steps irrespective of epochs.

  • learning_rate: Specifies the learning rate for the optimizer. It's set to 0.001.

  • fp16: Indicates the use of 16-bit (also known as half-precision) floating point numbers during training. Using fp16 can accelerate training.

  • logging_steps: Interval at which logging will occur. Here, logs will be generated at every step.

  • output_dir: Directory where training-related outputs (like model checkpoints) will be saved. Here, they will be saved in a folder named 'outputs'.

  • data_collator: This is responsible for preparing and collating data samples into batched tensors before feeding them into the model. transformers.DataCollatorForLanguageModeling is used here, suited for causal language modeling tasks.

  • The argument mlm=False indicates that masked language modeling is not used (which makes sense since it's for causal language modeling).

  • model.config.use_cache = False: This disables the caching mechanism within the transformer model. In transformers, caching can store certain intermediate outputs to speed up the sequential processing of tokens. However, it might be disabled to save memory, especially if the sequences being processed are long.
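Note that, as configured above, the Trainer never runs evaluation, so mapped_val_dataset and compute_metrics are not actually exercised during this short training run. The sketch below is one possible way to wire them in (an assumption on my part, not part of the original run); preprocess_logits_for_metrics reduces the raw logits to token IDs so compute_metrics can decode them. In practice, compute_metrics may also need to replace the -100 label padding with the pad token ID before decoding.

def preprocess_logits_for_metrics(logits, labels):
    # keep only the predicted token IDs so compute_metrics can decode them
    return logits.argmax(dim=-1)

trainer = transformers.Trainer(
    model=model,
    train_dataset=mapped_train_dataset,
    eval_dataset=mapped_val_dataset,               # validation split prepared earlier
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=100,
        learning_rate=1e-3,
        fp16=True,
        logging_steps=1,
        evaluation_strategy="steps",               # evaluate periodically during training
        eval_steps=50,
        per_device_eval_batch_size=4,
        output_dir='outputs',
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)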

  • Training the model

trainer.train()
  • trainer.train(): When this method is called, the model begins training on the dataset specified during the initialization of the Trainer object. The training process will use all the configurations, hyperparameters, and specifications you provided when you created the Trainer instance. It goes through the data in mini-batches as specified by the batch size. For each batch, it feeds the data through the model, computes the loss (difference between the model's predictions and the actual values), and then updates the model's weights using backpropagation. This process is repeated for the number of epochs or steps specified. An epoch is a complete pass through the entire training dataset.

  • Saving the finetuned model

model.save_pretrained("./my_model")
tokenizer.save_pretrained("./my_model")
  • Loading the finetuned model
loaded_model = AutoModelForCausalLM.from_pretrained("./my_model")
loaded_tokenizer = AutoTokenizer.from_pretrained("./my_model")
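Because the model was wrapped with PEFT, save_pretrained stores the LoRA adapter weights rather than a full standalone model. Recent versions of transformers can resolve such an adapter directory directly, as above; an explicit alternative (a sketch, helpful with older library versions) is to load the base model first and attach the adapter with PeftModel.from_pretrained:

from peft import PeftModel

# load the frozen base model, then attach the saved LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-350M-mono",
    torch_dtype=torch.float16,
    device_map='auto'
)
loaded_model = PeftModel.from_pretrained(base, "./my_model")
loaded_tokenizer = AutoTokenizer.from_pretrained("./my_model")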
  • Inference function
from IPython.display import display, Markdown

def generate_completion(model, tokenizer, prompt_text, max_length=100):

    model.config.use_cache = True
    model.eval()
    input_ids = tokenizer.encode(prompt_text, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(input_ids, max_length=max_length, num_return_sequences=1,
                                pad_token_id=tokenizer.eos_token_id, eos_token_id=tokenizer.eos_token_id,
                                temperature=0.1,
                                top_k=10,
                                top_p=0.1,
                                do_sample=True
                                )
    display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))

The function is defined with the following parameters:

  • model: The pre-trained model that you want to use for generating the text.

  • tokenizer: The tokenizer associated with the model, responsible for converting text into tokens (and vice versa).

  • prompt_text: The initial text (or prompt) you want to expand upon.

  • max_length: Maximum length of the generated output. The default is set to 100 tokens.

Setting Model for Generation:

model.config.use_cache = True
model.eval()

The use_cache is set to True to allow the model to use past computations for faster generations. model.eval() sets the model to evaluation mode. This is essential as certain layers like dropout behave differently during training and evaluation.

Tokenization of the Prompt:

input_ids = tokenizer.encode(prompt_text, return_tensors="pt")

The prompt text is tokenized into a format the model understands (input_ids). The return_tensors="pt" ensures the result is a PyTorch tensor.

Generating the Completion:

with torch.no_grad():
    output = model.generate(...)

with torch.no_grad() ensures that no gradients are computed during this operation, saving memory and computational power. model.generate() is the method that produces the generated completion. Here's a brief rundown of its parameters:

  • input_ids: The tokenized version of the prompt_text.

  • max_length: The maximum length of the generated text.

  • num_return_sequences: The number of sequences to return. It's set to 1, so only one completion is generated.

  • pad_token_id, eos_token_id: The padding and end-of-sequence token IDs, which ensure the generated text is appropriately padded and terminated.

  • temperature: Controls the randomness of the output. Lower values make the output more deterministic.

  • top_k, top_p: Restrict sampling to the top k candidate tokens, or to the smallest set of tokens whose cumulative probability reaches p.

  • do_sample: Enables sampling, so the model considers multiple possible next tokens rather than always picking the most probable one.

Displaying the Generated Text:

display(Markdown(tokenizer.decode(output[0], skip_special_tokens=True)))

This decodes the generated token IDs back to human-readable text and then displays it in the Jupyter Notebook in a formatted manner.

Testing

prompt_text = "### INSTRUCTION\nWrite a function to find the area of a triangle:\n\n### OUTPUT\n"
prompt_text1 = "### INSTRUCTION\nWrite a function to find if a number is odd:\n\n### OUTPUT\n"
prompt_text2 = "### INSTRUCTION\nWrite a code to find factorial of a number:\n\n### OUTPUT\n"
generate_completion(loaded_model, loaded_tokenizer, prompt_text)

Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 50256 }

INSTRUCTION

Write a function to find the area of a triangle:

OUTPUT

Here is a Python function that calculates the area of a triangle:

def area_of_triangle(a, b, c):
    return (a * b) / 2

print(area_of_triangle(3, 4, 5))
generate_completion(loaded_model, loaded_tokenizer, prompt_text1)

INSTRUCTION

Write a function to find if a number is odd:

OUTPUT

Here is a Python function that takes a number as input and returns True if the number is odd, and False otherwise.

def is_odd(n):
    return n % 2 == 1

print(is_odd(5))
This function takes a number as an input and returns True if the number is odd,
generate_completion(loaded_model, loaded_tokenizer, prompt_text2)

INSTRUCTION

Write a code to find factorial of a number:

OUTPUT

Here is a Python code that solves the problem:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

print(factorial(5))

Conclusion

To wrap up, we've delved deep into the process of optimizing language models with the power of Parameter-Efficient Fine-Tuning, namely through PEFT and LoRA. This approach not only fine-tunes models at a fraction of the computational expense but also delivers robust performance on specialized tasks like code generation. The hands-on implementation with the CodeGen pre-trained model illustrates the practicality and potential of such methods. For those enthusiasts and professionals aiming to harness the prowess of large language models without the associated computational overhead, this exploration shines a light on the path forward.


If you want to contribute or you find any errors in this article, please do leave me a comment.

You can reach out to me on any of the matrix decentralized servers. My element messenger ID is @maximilien:matrix.org

If you are in one of the mastodon decentralized servers, here is my ID @

If you are on LinkedIn, you can reach me here

If you want to contact me via email maximilien@maxtekai.tech

If you want to hire me to work on machine learning, data science, IoT and AI-related projects, please reach out to me here

Warm regards,

Maximilien.
