To set a custom kernel for this notebook, see https://scicomp.aalto.fi/triton/apps/jupyter/#installing-kernels-from-virtualenvs-or-anaconda-environments

In [1]:
import warnings
warnings.filterwarnings('ignore')

import os
import sys
import time

## Set environment variables, this must be done before importing transformers
os.environ['TRANSFORMERS_OFFLINE'] = '1'
os.environ['HF_HOME']='/scratch/shareddata/dldata/huggingface-hub-cache'

In [2]:
print(os.environ['TRANSFORMERS_OFFLINE'])

1


# Why Hugging Face?
- Open source.
- A vast repository of pre-trained models across various domains.
- Compatible with TensorFlow, PyTorch, and JAX.
- A community, not just a toolkit.
- Useful for both research and engineering.
- Fine-tuning capabilities.

https://huggingface.co/

# Simplest approach to using huggingface/transformers for inference: the pipeline class


In [6]:
from transformers import pipeline

# Use a distinct variable name so the imported pipeline() function is not shadowed
classifier = pipeline("sentiment-analysis")


# Prepare input text
inputs = ["What a lovely day today!","It is freezing outside."]

results = classifier(inputs)

print("Results:", results)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Results: [{'label': 'POSITIVE', 'score': 0.999874472618103}, {'label': 'NEGATIVE', 'score': 0.9937865734100342}]


In [7]:
from transformers import pipeline

pipeline = pipeline("text-generation")

# Prepare input text
input_text = "The capital of France is"

output = pipeline(input_text, max_length=50)
generated_text = output[0]['generated_text']
print("Generated text:", generated_text)

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated text: The capital of France is that of the great French historian and poet Maurice de Loewie. His books, the History of France, which are the result of the same years and during the latter years of his life, are regarded as the definitive work


# Decompose the pipeline 


**What happens in the pipeline?**

Tokenization => Model => Post Processing
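
The three stages above can be sketched with a toy example. This is only an illustration of the data flow, not how the library implements it: the vocabulary, the weights, and the names `tokenize`, `model`, `postprocess`, and `toy_pipeline` are all hypothetical.

```python
import math

# Hypothetical toy vocabulary and weights, chosen only for illustration.
VOCAB = {"[UNK]": 0, "lovely": 1, "freezing": 2, "day": 3, "outside": 4}
# One weight pair per token ID: (NEGATIVE score, POSITIVE score).
WEIGHTS = [(0.0, 0.0), (-1.0, 2.0), (2.0, -1.0), (0.0, 0.1), (0.1, 0.0)]
LABELS = ["NEGATIVE", "POSITIVE"]


def tokenize(text):
    """Stage 1: split text into words and map each word to an integer ID."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in words]


def model(token_ids):
    """Stage 2: produce one raw score (logit) per label by summing token weights."""
    logits = [0.0, 0.0]
    for token_id in token_ids:
        for label_index in range(len(LABELS)):
            logits[label_index] += WEIGHTS[token_id][label_index]
    return logits


def postprocess(logits):
    """Stage 3: convert logits to probabilities (softmax) and pick the best label."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages, mirroring Tokenization => Model => Post Processing."""
    return postprocess(model(tokenize(text)))


for sentence in ["What a lovely day today!", "It is freezing outside."]:
    print(sentence, "->", toy_pipeline(sentence))
```

A real pipeline replaces each stage with a trained tokenizer, a neural network, and task-specific post-processing, but the shape of the data passed between stages is similar: text, then integer IDs, then logits, then a label with a score.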


In [8]:
# Print relevant tokenizer information
print("Tokenizer Name:", pipeline.tokenizer.name_or_path)
print("Vocabulary Size:", pipeline.tokenizer.vocab_size)
print("Max Model Input Sizes:", pipeline.tokenizer.model_max_length)
print("Special Tokens:", pipeline.tokenizer.special_tokens_map)

Tokenizer Name: gpt2
Vocabulary Size: 50257
Max Model Input Sizes: 1024
Special Tokens: {'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>'}


In [9]:
# Check out the model architecture
pipeline.model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [10]:
# Check out the model config
pipeline.model.config

GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "do_sample": true,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "max_length": 50,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.36.0",
  "use_cache": true,
  "vocab_size": 50257
}

## Tokenization

Tokenizers prepare text data for processing by Transformer models.

**Tokenizers' function**:

- Text Preprocessing: Splitting Text into Tokens

- Convert Tokens to IDs: Each token is mapped to a unique integer ID.
- Add Special Tokens: 
    - BERT models use [CLS] at the beginning of the input for classification tasks and [SEP] to separate different segments in the input. 
    - In model pre-training, certain words in the input are replaced with the [MASK] token. The model then learns to predict the original value of these masked tokens, which helps in learning context and word relationships.
    - When the tokenizer encounters a word that is not in its vocabulary, it replaces it with the [UNK] (unknown) token. This is a way to handle out-of-vocabulary words.
    - GPT-style models use a beginning-of-sequence token ([BOS]) to mark the start and an end-of-sequence token ([EOS]) to mark the end of a text sequence. GPT-2 uses `<|endoftext|>` for both.
- Handle Sequence Lengths: Inputs in a batch must share the same length, and each model has a maximum input length. Tokenizers pad shorter inputs with [PAD] tokens and truncate longer ones to meet these requirements.

- Attention Mask: The tokenizer generates an attention mask to differentiate real tokens from padding tokens ([PAD]) such that the model will pay attention only to the relevant parts of the input.

- Consistency Across Languages: For multilingual models, tokenizers ensure consistent tokenization across different languages, maintaining a balanced and shared vocabulary.
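
The steps in the list above can be sketched in plain Python. This is a minimal illustration, assuming whitespace splitting and a tiny hypothetical vocabulary. The function name `encode_batch` is made up for this sketch and is not part of the transformers API.

```python
# Hypothetical toy vocabulary; real tokenizers ship vocabularies with tens of thousands of entries.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "hello": 4, ",": 5, "world": 6, "!": 7}


def encode_batch(texts, max_length):
    """Encode whitespace-separated texts into padded IDs plus attention masks."""
    encoded = []
    for text in texts:
        tokens = text.lower().split()                              # split text into tokens
        ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]   # map tokens to IDs
        ids = ids[: max_length - 2]                                # truncate, leaving room for specials
        ids = [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]            # add special tokens
        encoded.append(ids)

    # Pad every sequence to the length of the longest one in the batch.
    longest = max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        padding = longest - len(ids)
        input_ids.append(ids + [VOCAB["[PAD]"]] * padding)
        attention_mask.append([1] * len(ids) + [0] * padding)      # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = encode_batch(["hello , world !", "hello gpus"], max_length=8)
print(batch["input_ids"])       # [[2, 4, 5, 6, 7, 3], [2, 4, 1, 3, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0]]
```

The word "gpus" is not in the toy vocabulary, so it maps to the [UNK] ID. A subword tokenizer would instead split it into known pieces.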



Three tokenizer types: Word-based, Subword-based, Character-based.



### Most state-of-the-art models use subword-based tokenizers:

- BERT (Bidirectional Encoder Representations from Transformers): Uses the WordPiece tokenizer.

- GPT-2 and GPT-3 (Generative Pre-trained Transformer): Utilize a variant of Byte Pair Encoding (BPE).

- T5 (Text-To-Text Transfer Transformer): Employs the SentencePiece tokenizer, which is versatile and can be used across different languages and scripts.
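
To give a feel for subword splitting, here is a minimal sketch of the greedy longest-match-first rule that WordPiece applies to a single word at inference time. The vocabulary `toy_vocab` and the function name `wordpiece_split` are hypothetical. BPE and SentencePiece use different procedures that are not shown here.

```python
def wordpiece_split(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, not the real bert-base-uncased vocabulary.
toy_vocab = {"gp", "##us", "##u", "##s", "need", "token", "##izer"}
print(wordpiece_split("gpus", toy_vocab))       # ['gp', '##us']
print(wordpiece_split("tokenizer", toy_vocab))  # ['token', '##izer']
print(wordpiece_split("xyz", toy_vocab))        # ['[UNK]']
```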

In [11]:
from transformers import BertTokenizer

# Load the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Example text
text = "Hello, how many GPUs do you need?"

# Tokenize the text
tokens = tokenizer.tokenize(text)
print(tokens)

# Convert tokens to token IDs
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print(token_ids)

['hello', ',', 'how', 'many', 'gp', '##us', 'do', 'you', 'need', '?']
[7592, 1010, 2129, 2116, 14246, 2271, 2079, 2017, 2342, 1029]


In [12]:
from transformers import GPT2Tokenizer

In [13]:
# Load the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Example text
text = "Hello, how many GPUs do you need?"

# Tokenize the text
tokens = tokenizer.tokenize(text)
print(tokens)

# Convert tokens to token IDs
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print(token_ids)

['Hello', ',', 'Ġhow', 'Ġmany', 'ĠGPUs', 'Ġdo', 'Ġyou', 'Ġneed', '?']
[15496, 11, 703, 867, 32516, 466, 345, 761, 30]


In [14]:
from transformers import T5Tokenizer

In [15]:
# Initialize the tokenizer
tokenizer = T5Tokenizer.from_pretrained('t5-base')

# Example text
text = "Hello, how many GPUs do you need?"

# Tokenize the text
tokens = tokenizer.tokenize(text,add_special_tokens=True)
print(tokens)

# Convert tokens to token IDs
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print(token_ids)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


['▁Hello', ',', '▁how', '▁many', '▁GPU', 's', '▁do', '▁you', '▁need', '?']
[8774, 6, 149, 186, 23356, 7, 103, 25, 174, 58]


**NOTE: A pretrained model only performs properly when its input is tokenized with the same rules that were used to tokenize its training data.**

### Tokenizer Classes in Hugging Face:
- PreTrainedTokenizer: the base class for all tokenizers. It provides common methods and attributes shared across tokenizer types. It is not typically used directly to load a specific model's tokenizer.
- Model-specific tokenizers: for example, BertTokenizer for the BERT model. These inherit from PreTrainedTokenizer.


In [16]:
from transformers import PreTrainedTokenizer

# Calling PreTrainedTokenizer directly raises an error.
tokenizer = PreTrainedTokenizer.from_pretrained('bert-base-uncased')
encoded_input = tokenizer("Hello, Hugging Face!")


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'BertTokenizer'. 
The class this function is called from is 'PreTrainedTokenizer'.


NotImplementedError: 

In [24]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased',padding=True,truncation=True,max_length=20)

# Example text
text = "The capital of Finland is?"

# Tokenize the text
tokens = tokenizer.tokenize(text)
print(tokens)

['the', 'capital', 'of', 'finland', 'is', '?']


Hyperparameters in tokenizer:

- padding: padding strategy
- truncation: truncation strategy
- max_length: maximum length to pad or truncate to
- ...

In [25]:
tokens = tokenizer.tokenize(text, padding=True,truncation=True,max_length=20)
tokens

Keyword arguments {'padding': True, 'truncation': True, 'max_length': 20} not recognized.


['the', 'capital', 'of', 'finland', 'is', '?']

**NOTE:** Calling a tokenizer directly is the right choice when preparing data for model input (for training or inference), whereas the tokenize() method is meant for token-level analysis or manipulation of the text.

Hyperparameters like `padding`, `truncation`, and `max_length` are not recognized by the tokenize() method.


In [20]:
text = ["Hello, Hugging Face! Tell me about all your tokenizer types.", "Hello, world!"]

# call a tokenizer directly, invoking its __call__ method
encoded_input = tokenizer(text, padding=True,truncation=True,max_length=20) 
for item in encoded_input.items():
    print(item)

('input_ids', [[101, 7592, 1010, 17662, 2227, 999, 2425, 2033, 2055, 2035, 2115, 19204, 17629, 4127, 1012, 102], [101, 7592, 1010, 2088, 999, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
('token_type_ids', [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
('attention_mask', [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])


## Model
### Hugging Face Model Classes:
https://huggingface.co/docs/transformers/model_doc/auto
- **Base model**:

A base model, also referred to as a pre-trained model, is a model that has already been trained on a large, generic dataset. Its primary purpose is to capture a wide range of language features, such as grammar, context, and basic associations. It provides a robust foundation of language understanding that can be adapted to specific tasks.

Base models on Hugging Face are often named after the architecture they use, such as bert-base-uncased, gpt2-medium, and t5-base.
- **Fine-tuned model**:

A fine-tuned model is a model that has undergone additional training (fine-tuning) on a smaller, task-specific dataset. This can include tasks like sentiment analysis, question answering, or domain-specific language understanding.

Fine-tuned models usually have additional descriptors in their names indicating the specific task or dataset they are fine-tuned for. For instance, **"bert-base-uncased-finetuned-squad"** is a BERT model fine-tuned on the SQuAD dataset for question answering, whereas **"bert-base-uncased"** is a base model.

More information can usually be found in the README or model description in the model repo.
Inspecting the model's configuration or architecture can also give hints.

### Which model to use?
[https://huggingface.co/models](https://huggingface.co/models)
* Task Type
* Specific language (especially non-English languages)
* Model Size and Performance
* Fine-Tuning and Customization
* Community and Support
* Documentation and Examples
* Ethical Considerations
* Licensing and Cost


## Set up the tokenizer, load the model and perform inference, step by step.

In [28]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Initialize the tokenizer for GPT-2
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load the pre-trained GPT-2 model
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Prepare input text
input_text = "The capital of France is"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate attention mask
attention_mask = tokenizer(input_text, return_tensors="pt").attention_mask

# GPT-2 has no pad token, so reuse the EOS token for padding
model.config.pad_token_id = model.config.eos_token_id

# Generate output, passing the attention mask explicitly
outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Generated text:", generated_text)

Generated text: The capital of France is the capital of the French Republic, and the capital of the French Republic is the capital of the French Republic.

The French Republic is the capital of the French Republic.

The French Republic is the capital of the


In [27]:
attention_mask

tensor([[1, 1, 1, 1, 1]])

**Do I need to look for the specific tokenizer and model classes for my tasks every time?**

In many cases, no. The architecture you want to use can be guessed from the name or the path of the pretrained model. Huggingface provides **AutoClasses** to help you automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary.
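
Conceptually, an AutoClass works like a lookup from a checkpoint's configuration to a concrete class. The sketch below only illustrates that idea and is not the library's implementation: the classes `ToyBertTokenizer` and `ToyGPT2Tokenizer`, the registry, and the function `auto_tokenizer_class` are hypothetical. The `model_type` key mirrors the field visible in the GPT2Config output earlier in this notebook.

```python
# Hypothetical stand-ins for model-specific classes; not the library's real classes.
class ToyBertTokenizer:
    name = "ToyBertTokenizer"


class ToyGPT2Tokenizer:
    name = "ToyGPT2Tokenizer"


# Registry keyed by the "model_type" field of a checkpoint's configuration.
TOKENIZER_REGISTRY = {"bert": ToyBertTokenizer, "gpt2": ToyGPT2Tokenizer}


def auto_tokenizer_class(config):
    """Return the class registered for the model type named in the config."""
    model_type = config["model_type"]
    try:
        return TOKENIZER_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model type: {model_type!r}") from None


print(auto_tokenizer_class({"model_type": "gpt2"}).name)  # ToyGPT2Tokenizer
print(auto_tokenizer_class({"model_type": "bert"}).name)  # ToyBertTokenizer
```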


In [29]:
## NOTE: AutoModel will instantiate a base model class without a specific head, so we still need 
## a "relatively specific" class AutoModelForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM

# Initialize the tokenizer for GPT-2
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Load the pre-trained GPT-2 model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Prepare input text
input_text = "The capital of France is"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate attention mask
attention_mask = tokenizer(input_text, return_tensors="pt").attention_mask

# GPT-2 has no pad token, so reuse the EOS token for padding
model.config.pad_token_id = model.config.eos_token_id

# Generate output, passing the attention mask explicitly
outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Generated text:", generated_text)

Generated text: The capital of France is the capital of the French Republic, and the capital of the French Republic is the capital of the French Republic.

The French Republic is the capital of the French Republic.

The French Republic is the capital of the


# Key outputs from a language model
- Logits: The raw, unnormalized scores for each vocabulary token at each position in the output sequence. By default, the model's forward pass returns the logits.
- Hidden States: Representations from each layer of the model. These are the activations of the model's neurons at each layer. Set `output_hidden_states=True` in the configuration or when calling the model to obtain Hidden States.
- Attentions: Attention weights from each layer of the model. These weights show how much each token in a sequence attends to every other token at each layer. Set `output_attentions=True` in the configuration or when calling the model to obtain Attentions.
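
As a small illustration of what logits are used for, the sketch below turns the logits at the last position into next-token probabilities with a softmax and picks the most likely token. The four-word vocabulary and the logit values are made up for this example. A real causal language model returns logits of shape [batch, sequence length, vocabulary size].

```python
import math


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits with shape [batch=1, sequence=3, vocab=4].
toy_vocab = ["Paris", "London", "the", "a"]
logits = [[[0.1, 0.2, 2.0, 1.0],
           [0.5, 0.1, 1.5, 0.3],
           [3.0, 1.0, 0.5, 0.2]]]

# The logits at the last position score the candidates for the next token.
last_position = logits[0][-1]
probs = softmax(last_position)
next_id = max(range(len(probs)), key=probs.__getitem__)
print("Next token:", toy_vocab[next_id])
print("Probabilities:", [round(p, 3) for p in probs])
```

Always picking the highest-probability token, as done here, corresponds to greedy decoding.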

In [30]:
from transformers import AutoTokenizer, AutoModel

model = AutoModel.from_pretrained("bert-base-cased")

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Prepare input text
input_text = "The capital of France is"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# get hidden state
outputs = model(input_ids)
print(outputs.last_hidden_state)

tensor([[[ 0.2629,  0.0496,  0.1699,  ..., -0.0339,  0.2812, -0.0489],
         [-0.3381, -0.2910,  0.2394,  ...,  0.4664, -0.4263,  0.2448],
         [-0.3315, -0.1127, -0.1425,  ...,  0.6752, -0.1898,  0.5174],
         ...,
         [-0.1510,  0.4374, -0.2816,  ...,  0.3068,  0.4450,  0.4092],
         [ 0.0758,  0.1059,  0.0871,  ...,  0.3782,  0.2463, -0.2250],
         [-0.0174, -0.1541, -1.0330,  ...,  0.4842,  0.6491,  0.2534]]],
       grad_fn=<NativeLayerNormBackward0>)


In [31]:
model

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(28996, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)


In [32]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Initialize the tokenizer for GPT-2
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Load the pre-trained GPT-2 model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Prepare input text
input_text = "The capital of France is"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate attention mask
attention_mask = tokenizer(input_text, return_tensors="pt").attention_mask

# Set pad token ID if it's not already set
model.config.pad_token_id = model.config.eos_token_id

# Run a forward pass and also return hidden states and attention weights
outputs = model(input_ids, output_hidden_states=True, output_attentions=True)

print("logits:",outputs.logits)
print("Attentions:",outputs.attentions)

logits: tensor([[[ -36.2874,  -35.0114,  -38.0793,  ...,  -40.5164,  -41.3760,
           -34.9193],
         [ -75.1021,  -75.6483,  -82.6827,  ...,  -82.5961,  -79.3913,
           -76.2687],
         [ -80.0968,  -78.6868,  -81.2341,  ...,  -83.7548,  -85.6541,
           -79.8042],
         [ -86.0085,  -86.4618,  -91.0184,  ...,  -98.6912,  -93.3734,
           -87.9286],
         [-108.9542, -108.9327, -112.5793,  ..., -118.3345, -113.1505,
          -110.3779]]], grad_fn=<UnsafeViewBackward0>)
Attentions: (tensor([[[[1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
          [8.4640e-01, 1.5360e-01, 0.0000e+00, 0.0000e+00, 0.0000e+00],
          [7.0135e-01, 2.2373e-01, 7.4919e-02, 0.0000e+00, 0.0000e+00],
          [6.0768e-01, 1.7884e-01, 1.4391e-01, 6.9565e-02, 0.0000e+00],
          [6.0990e-01, 1.5188e-01, 6.2560e-02, 9.4493e-02, 8.1164e-02]],

         [[1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
          [1.4739e-04, 9.9985e-01, 0.0000e+00, 

In [38]:
outputs.logits.shape

torch.Size([1, 5, 50257])

# Configurations

## Model configuration
The model configuration holds the hyperparameters that define a model's architecture.


In [39]:
from transformers import GPT2Model,GPT2Config

# Default configuration
model = GPT2Model.from_pretrained("gpt2")
model

GPT2Model(
  (wte): Embedding(50257, 768)
  (wpe): Embedding(1024, 768)
  (drop): Dropout(p=0.1, inplace=False)
  (h): ModuleList(
    (0-11): 12 x GPT2Block(
      (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (attn): GPT2Attention(
        (c_attn): Conv1D()
        (c_proj): Conv1D()
        (attn_dropout): Dropout(p=0.1, inplace=False)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
      (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (mlp): GPT2MLP(
        (c_fc): Conv1D()
        (c_proj): Conv1D()
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)

In [40]:
model.config

GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.36.0",
  "use_cache": true,
  "vocab_size": 50257
}

In [41]:
# Create a custom configuration
config = GPT2Config(
    n_layer=6,
    n_head=8
)
# Load model with custom configuration
model = GPT2Model.from_pretrained("gpt2", config=config)
model

GPT2Model(
  (wte): Embedding(50257, 768)
  (wpe): Embedding(1024, 768)
  (drop): Dropout(p=0.1, inplace=False)
  (h): ModuleList(
    (0-5): 6 x GPT2Block(
      (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (attn): GPT2Attention(
        (c_attn): Conv1D()
        (c_proj): Conv1D()
        (attn_dropout): Dropout(p=0.1, inplace=False)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
      (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (mlp): GPT2MLP(
        (c_fc): Conv1D()
        (c_proj): Conv1D()
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
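
To see how such configuration values translate into model size, the sketch below counts the trainable parameters of a `GPT2Model` (without the language-modeling head) from the layer structure printed above. The helper name `gpt2_parameter_count` is made up for this example. It assumes the MLP width is `4 * n_embd`, which is what a null `n_inner` means, and that every LayerNorm and Conv1D layer has a bias.

```python
def gpt2_parameter_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12):
    """Count trainable parameters of a GPT2Model (no LM head) from config values."""
    n_inner = 4 * n_embd  # used when "n_inner" is null in the config
    embeddings = vocab_size * n_embd + n_positions * n_embd            # wte + wpe
    layer_norm = 2 * n_embd                                            # weight + bias
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)  # c_attn + c_proj
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)   # c_fc + c_proj
    block = 2 * layer_norm + attention + mlp                           # ln_1, ln_2, attn, mlp
    return embeddings + n_layer * block + layer_norm                   # final term is ln_f


print(gpt2_parameter_count(n_layer=12))  # 124439808
print(gpt2_parameter_count(n_layer=6))   # 81912576
```

Halving `n_layer` removes six transformer blocks, while `n_head` does not appear in the count at all: it only changes how the same attention projection is split across heads.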

## Generation/Inference configuration

**Different decoding strategies**:

https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb

**Generation parameters**: 

https://huggingface.co/docs/transformers/v4.35.2/en/main_classes/text_generation#transformers.GenerationConfig
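
As a rough illustration of two common generation parameters, the sketch below samples a token ID using temperature scaling and top-k filtering. It is a simplified stand-in written with the standard library, not the library's own sampling code. The function name `sample_top_k` and the logit values are hypothetical.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng):
    """Sample one token ID from the top_k highest logits after temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Lower temperatures sharpen the distribution; higher temperatures flatten it.
    scaled = [logits[i] / temperature for i in kept]
    highest = max(scaled)
    weights = [math.exp(score - highest) for score in scaled]  # unnormalized softmax
    return rng.choices(kept, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical scores for a 5-token vocabulary

samples = [sample_top_k(logits, top_k=2, temperature=1.0, rng=rng) for _ in range(1000)]
print({token_id: samples.count(token_id) for token_id in sorted(set(samples))})
```

With `top_k=2` only token IDs 0 and 1 can ever be drawn, and with `top_k=1` the procedure reduces to greedy decoding.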


In [55]:
from transformers import pipeline
import torch
model_name = "gpt2"

# Use a distinct variable name so the imported pipeline() function is not shadowed
generator = pipeline(
    "text-generation",
    model=model_name,
    trust_remote_code=True,
    torch_dtype=torch.float32
)

sequences = generator(
    'I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?\n',
    do_sample=True,
    top_k=20,
    # Take the EOS token from the pipeline's own tokenizer instead of a variable from an earlier cell
    pad_token_id=generator.tokenizer.eos_token_id,
    temperature=1.0,
    max_length=50,
    num_return_sequences=3
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}\n")

Result: I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?

Brock (Nrama): What's your favorite show?

Hollywood Reporter: The only show that really made

Result: I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?

I've had the urge to watch a lot of shows and I have a few. I've seen "Pulp Fiction

Result: I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?

I haven't even tried any other shows yet. I haven't tried it for a while. I didn't see a



# Exercises

**Exercise 1: Exploring Pre-trained Models**

Objective: Familiarize yourself with the Hugging Face Model Hub.

Task: Browse the Hugging Face Model Hub and find a pre-trained model suitable for sentiment analysis. Write a short script to explore the model's architecture, configuration, output, etc.

**Exercise 2: Text Generation**

Objective: Understand the capabilities of text generation models.

Task: Use a text generation model to generate a short text based on a given prompt. Experiment with different temperature settings and observe how they affect the creativity of the output.

# Candidate topics for next session (TBD):
- How to load a model architecture with random weights instead of trained weights
- Fine-tuning workflow
- Hugging Face Datasets