

The verbalizer is one of the most important modules in prompt-learning. It projects the original labels to a set of label words.

We implement common verbalizer classes in OpenPrompt.

One to One Verbalizer

The basic one to one Verbalizer.

class One2oneVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, num_classes: Optional[int] = None, classes: Optional[List] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', post_log_softmax: Optional[bool] = True)

The basic manually defined verbalizer class, inherited from the Verbalizer class. This class restricts each label to a single label word. For a verbalizer with fewer constraints, please use the basic ManualVerbalizer.

  • tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model to point out the vocabulary.

  • classes (List, optional) – The classes (or labels) of the current task.

  • num_classes (int) – Optional. The number of classes of the verbalizer. Only one of classes and num_classes should be used.

  • label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.

  • prefix (str, optional) – The prefix string of the verbalizer. (used in PLMs like RoBERTa, which is sensitive to prefix space)

  • multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.

  • post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.

static add_prefix(label_words, prefix)

Add a prefix to the label words. For example, if a label word is in the middle of a template, the prefix should be ' '.

  • label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.

  • prefix (str, optional) – The prefix string of the verbalizer.


New label words with prefix.

Return type
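The behavior described for add_prefix can be illustrated with a minimal sketch. This is not OpenPrompt's implementation: add_prefix_sketch is a hypothetical helper that only assumes the two documented input shapes (a sequence of words, or a mapping from label to word).

```python
from collections.abc import Mapping
from typing import Sequence, Union

LabelWords = Union[Sequence[str], "Mapping[str, str]"]


def add_prefix_sketch(label_words: LabelWords, prefix: str = " "):
    """Return a copy of the label words with `prefix` prepended to each word.

    Hypothetical helper written for illustration only.
    """
    if isinstance(label_words, Mapping):
        # Mapping input: keep the label keys, prefix the words.
        return {label: prefix + word for label, word in label_words.items()}
    # Sequence input: prefix every word in order.
    return [prefix + word for word in label_words]


prefixed_list = add_prefix_sketch(["good", "bad"])
prefixed_map = add_prefix_sketch({"positive": "good", "negative": "bad"})
```

With a tokenizer that distinguishes "good" from " good", the prefixed form is the one that matches a word appearing mid-sentence.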


calibrate(label_words_probs: torch.Tensor, **kwargs) torch.Tensor

label_words_probs (torch.Tensor) – The probability distribution of the label words with the shape of [batch_size, num_classes, num_label_words_per_class]


The calibrated probability of label words.

Return type


generate_parameters() List

In basic manual template, the parameters are generated from label words directly. In this implementation, the label_words should not be tokenized into more than one token.

normalize(logits: torch.Tensor) torch.Tensor

Given logits regarding the entire vocabulary, return the probs over the label words set.


logits (Tensor) – The logits over the entire vocabulary.


The logits over the label words set.

Return type



A hook to do something when textual label words were set.

process_logits(logits: torch.Tensor, **kwargs)

A whole framework to process the original logits over the vocabulary, which contains the following steps:

  1. Project the logits into logits of label words

if self.post_log_softmax is True:

  2. Normalize over all label words

  3. Calibrate (optional)


logits (torch.Tensor) – The original logits.


The final processed logits over the label words set.

Return type
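The projection and normalization steps of process_logits can be sketched in plain Python. This is an illustration under assumptions, not the library code: the function names are hypothetical, logits are plain lists instead of tensors, and the optional calibration step is left out.

```python
import math
from typing import List


def project(vocab_logits: List[float], label_word_ids: List[int]) -> List[float]:
    """Keep only the logits at the label words' vocabulary ids."""
    return [vocab_logits[i] for i in label_word_ids]


def log_softmax(logits: List[float]) -> List[float]:
    """Normalize over the label words, returning log-probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def process_logits_sketch(vocab_logits, label_word_ids, post_log_softmax=True):
    """Project, then normalize only when post_log_softmax is True."""
    label_logits = project(vocab_logits, label_word_ids)
    if post_log_softmax:
        label_logits = log_softmax(label_logits)
    return label_logits


# Toy vocabulary of five tokens; ids 1 and 3 stand for the two label words.
vocab_logits = [0.1, 2.0, -1.0, 0.5, 0.0]
scores = process_logits_sketch(vocab_logits, label_word_ids=[1, 3])
raw = process_logits_sketch(vocab_logits, [1, 3], post_log_softmax=False)
```

Because each label has exactly one label word here, the output already has one score per class and needs no further aggregation.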


project(logits: torch.Tensor, **kwargs) torch.Tensor

Project the labels. The return value is the normalized (sum to 1) probabilities of the label words.


logits (torch.Tensor) – The original logits of label words.


The normalized logits of label words

Return type


Manual Verbalizer

The basic manually defined Verbalizer.

class ManualVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[List] = None, num_classes: Optional[Sequence[str]] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', post_log_softmax: Optional[bool] = True)

The basic manually defined verbalizer class, inherited from the Verbalizer class.

  • tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model to point out the vocabulary.

  • classes (List[Any]) – The classes (or labels) of the current task.

  • label_words (Union[List[str], List[List[str]], Dict[List[str]]], optional) – The label words that are projected by the labels.

  • prefix (str, optional) – The prefix string of the verbalizer (used in PLMs like RoBERTa, which is sensitive to prefix space)

  • multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.

  • post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.
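The multi_token_handler option matters when the tokenizer splits a label word into several sub-tokens. The sketch below is a plain-Python illustration rather than OpenPrompt's code: handle_multi_token_sketch is a hypothetical helper, 'first' is the documented default, and the 'mean' and 'max' branches are assumed alternatives shown only for comparison.

```python
from typing import List


def handle_multi_token_sketch(token_logits: List[float], handler: str = "first") -> float:
    """Reduce the logits of one label word's sub-tokens to a single score.

    Hypothetical helper: only 'first' appears in the signature above;
    'mean' and 'max' are assumptions made for this illustration.
    """
    if handler == "first":
        return token_logits[0]
    if handler == "mean":
        return sum(token_logits) / len(token_logits)
    if handler == "max":
        return max(token_logits)
    raise ValueError(f"unknown multi_token_handler: {handler}")


# A label word split into three sub-tokens, with one logit per sub-token.
sub_token_logits = [2.0, 0.5, -1.0]
first_score = handle_multi_token_sketch(sub_token_logits, "first")
mean_score = handle_multi_token_sketch(sub_token_logits, "mean")
max_score = handle_multi_token_sketch(sub_token_logits, "max")
```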

static add_prefix(label_words, prefix)

Add a prefix to the label words. For example, if a label word is in the middle of a template, the prefix should be ' '.

  • label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.

  • prefix (str, optional) – The prefix string of the verbalizer.


New label words with prefix.

Return type


aggregate(label_words_logits: torch.Tensor) torch.Tensor

Use weight to aggregate the logits of label words.


label_words_logits (torch.Tensor) – The logits of the label words.


The aggregated logits from the label words.

Return type
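As a rough illustration of aggregate, the sketch below averages the logits of each class's label words. It assumes uniform weights over the label words that actually exist for a class; ragged Python lists stand in for a padded tensor plus mask, and aggregate_sketch is a hypothetical name.

```python
from typing import List


def aggregate_sketch(label_words_logits: List[List[float]]) -> List[float]:
    """Average the logits of each class's label words (equal weights assumed)."""
    return [sum(words) / len(words) for words in label_words_logits]


# Two classes: the first has three label words, the second has two.
per_word_logits = [[-0.5, -1.0, -1.5], [-2.0, -3.0]]
class_logits = aggregate_sketch(per_word_logits)
```

Averaging rather than summing keeps classes with many label words from being favored merely because they have more entries.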


calibrate(label_words_probs: torch.Tensor, **kwargs) torch.Tensor

label_words_probs (torch.Tensor) – The probability distribution of the label words with the shape of [batch_size, num_classes, num_label_words_per_class]


The calibrated probability of label words.

Return type
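One common way to calibrate is to divide each label word's probability by its prior probability and renormalize. The sketch below assumes that scheme; the documentation above does not spell out the formula, so calibrate_sketch, prior_probs, and EPS are all illustrative assumptions. Nested lists follow the documented shape [batch_size, num_classes, num_label_words_per_class].

```python
from typing import List

EPS = 1e-15  # assumed small constant to avoid division by zero


def calibrate_sketch(
    label_words_probs: List[List[List[float]]],
    prior_probs: List[List[float]],
) -> List[List[List[float]]]:
    """Divide each probability by its prior, then renormalize per example.

    label_words_probs: [batch_size][num_classes][num_label_words_per_class]
    prior_probs:       [num_classes][num_label_words_per_class]
    """
    calibrated = []
    for example in label_words_probs:
        ratios = [
            [p / (q + EPS) for p, q in zip(row, prior_row)]
            for row, prior_row in zip(example, prior_probs)
        ]
        total = sum(sum(row) for row in ratios)
        calibrated.append([[r / total for r in row] for row in ratios])
    return calibrated


# One example, two classes, one label word each.
probs = [[[0.6], [0.4]]]
prior = [[0.75], [0.25]]  # the first label word is likely a priori
calibrated = calibrate_sketch(probs, prior)
```

In this toy case the second class ends up with the higher calibrated probability, because its label word was predicted more often than its prior would suggest.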


generate_parameters() List

In basic manual template, the parameters are generated from label words directly. In this implementation, the label_words should not be tokenized into more than one token.

normalize(logits: torch.Tensor) torch.Tensor

Given logits regarding the entire vocabulary, return the probs over the label words set.


logits (Tensor) – The logits over the entire vocabulary.


The logits over the label words set.

Return type



A hook to do something when textual label words were set.

process_logits(logits: torch.Tensor, **kwargs)

A whole framework to process the original logits over the vocabulary, which contains four steps:

  1. Project the logits into logits of label words

if self.post_log_softmax is True:

  2. Normalize over all label words

  3. Calibrate (optional)

  4. Aggregate (for multiple label words)


logits (torch.Tensor) – The original logits.


The final processed logits over the labels (classes).

Return type


project(logits: torch.Tensor, **kwargs) torch.Tensor

Project the labels. The return value is the normalized (sum to 1) probabilities of the label words.


logits (torch.Tensor) – The original logits of label words.


The normalized logits of label words

Return type


Automatic Verbalizer

The Automatic Verbalizer defined in Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification.

class AutomaticVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, num_candidates: Optional[int] = 1000, label_word_num_per_class: Optional[int] = 1, num_searches: Optional[int] = 1, score_fct: Optional[str] = 'llr', balance: Optional[bool] = True, num_classes: Optional[bool] = None, classes: Optional[List[str]] = None, init_using_split: Optional[str] = 'train', **kwargs)

This implementation differs slightly from the original code in three ways:

1). We allow re-selecting the verbalizer after a fixed number of training steps. The original implementation performs the selection only once, after getting the initial logits on the training data. To reproduce their behavior, call optimize() only after the first pass over the training data.

2). We strictly follow the probability calculation in Equation (3) of the paper, which takes the softmax over the logits.

3). We do not implement the combine_patterns if-branch, since it is not a pure verbalizer type and does not yield much improvement. However, it can be achieved by using EnsembleTrainer to pass text wrapped by multiple templates together with this verbalizer.

We use a probs_buffer to store the probability \(q_{P,t}(1|\mathbf{x})\) and a label_buffer to store the label \(y\); both are used in later verbalizer selection.

  • num_candidates (int, optional) – the number of candidates for further selection based on Section 4.1

  • label_word_num_per_class (int, optional) – set to be greater than 1 to support Multi-Verbalizers in Section 4.2

  • num_searches (int, optional) – Maximum number of label_words searches. After reaching this number, the verbalizer will use the same label_words as in the previous iterations.

  • search_id (int, optional) – the id of current search, used to determine when to stop label words searching.

  • score_fct (str, optional) – the scoring function for label word selection. llr means log likelihood ratio, corresponding to Equation (7); ce means cross entropy, corresponding to Equation (6). As the paper points out, llr is significantly better than ce; we keep ce only to match the original code.

  • balance (bool, optional) – whether to normalize for an unbalanced training dataset, as in Equation (5).

from_file(path: str, choice: Optional[int] = 0)

Load the predefined label words from a verbalizer file. Three file formats are currently supported:

  1. a .jsonl or .json file containing a single verbalizer in dict format;

  2. a .jsonl or .json file containing a list of verbalizers in dict format;

  3. a .txt or .csv file in which the label words of each class are listed on one line, separated by commas. Begin a new verbalizer with an empty line. This format is recommended when you don't know the name of each class.

The details of verbalizer format can be seen in How to Write a Verbalizer?.

  • path (str) – The path of the local template file.

  • choice (int) – The choice of verbalizer in a file containing multiple verbalizers.


self object

Return type
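The .txt/.csv layout described for from_file can be illustrated with a small parser. This is a sketch, not the loader used by OpenPrompt: parse_txt_verbalizers is a hypothetical helper, it reads from a string rather than a path, and stripping whitespace around each word is an assumption.

```python
from typing import List


def parse_txt_verbalizers(text: str) -> List[List[List[str]]]:
    """Parse verbalizers (separated by blank lines), classes (one per line),
    and label words (separated by commas) from a text blob."""
    verbalizers: List[List[List[str]]] = []
    current: List[List[str]] = []
    for line in text.splitlines():
        if line.strip():
            # One line holds the label words of one class.
            current.append([word.strip() for word in line.split(",")])
        elif current:
            # An empty line closes the current verbalizer.
            verbalizers.append(current)
            current = []
    if current:
        verbalizers.append(current)
    return verbalizers


sample = "good, great\nbad, terrible\n\npositive\nnegative\n"
all_verbalizers = parse_txt_verbalizers(sample)
choice = 1  # plays the role of the `choice` argument
label_words = all_verbalizers[choice]
```

Here the file holds two verbalizers for the same two classes, and choice selects the second one.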



This is an epoch-level optimization. If it is used at batch level like an ordinary gradient descent optimizer, the result may not be satisfying, since the accumulated examples (i.e., the probs_buffer and the labels_buffer) are insufficient when the batch size is small.

project(logits: torch.Tensor, **kwargs) torch.Tensor

When this verbalizer hasn't performed optimize(), it has no label_words_ids. It will therefore give random predictions, and it should have no connection to the model that would produce (misleading) gradients.


logits (torch.Tensor) – The original logits over the vocabulary.


The projected logits of label words.

Return type


register_buffer(logits, labels)
  • logits (torch.Tensor) –

  • labels (List) –
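The buffer-then-optimize flow can be sketched with a small stand-in class. Everything here is an assumption made for illustration: BufferedSelectorSketch is hypothetical, and the score (mean probability inside a class minus mean probability outside it) is a simple substitute for the llr and ce scoring functions, not an implementation of the paper's equations.

```python
from typing import List, Optional


class BufferedSelectorSketch:
    """Hypothetical stand-in mimicking register_buffer() and optimize()."""

    def __init__(self, num_classes: int, label_word_num_per_class: int = 1,
                 num_searches: int = 1):
        self.num_classes = num_classes
        self.k = label_word_num_per_class
        self.num_searches = num_searches
        self.search_id = 0
        self.probs_buffer: List[List[float]] = []
        self.label_buffer: List[int] = []
        self.label_word_ids: Optional[List[List[int]]] = None

    def register_buffer(self, probs: List[List[float]], labels: List[int]) -> None:
        """Accumulate one batch of vocabulary probabilities and labels."""
        self.probs_buffer.extend(probs)
        self.label_buffer.extend(labels)

    def optimize(self) -> None:
        """Epoch-level selection; keeps old label words after num_searches."""
        if self.search_id < self.num_searches:
            self.label_word_ids = self._select()
            self.search_id += 1
        self.probs_buffer, self.label_buffer = [], []

    def _select(self) -> List[List[int]]:
        vocab_size = len(self.probs_buffer[0])
        pairs = list(zip(self.probs_buffer, self.label_buffer))
        chosen = []
        for c in range(self.num_classes):
            inside = [p for p, y in pairs if y == c]
            outside = [p for p, y in pairs if y != c]
            # Stand-in score, NOT the paper's llr: in-class mean minus out-of-class mean.
            scores = [
                sum(p[t] for p in inside) / max(len(inside), 1)
                - sum(p[t] for p in outside) / max(len(outside), 1)
                for t in range(vocab_size)
            ]
            ranked = sorted(range(vocab_size), key=lambda t: scores[t], reverse=True)
            chosen.append(ranked[: self.k])
        return chosen


selector = BufferedSelectorSketch(num_classes=2, num_searches=1)
# Epoch 1: two batches are buffered, then one selection is performed.
selector.register_buffer([[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]], [0, 0])
selector.register_buffer([[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]], [1, 1])
selector.optimize()
first_selection = selector.label_word_ids
# Epoch 2: num_searches is exhausted, so the label words stay unchanged.
selector.register_buffer([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]], [1, 0])
selector.optimize()
```

Calling optimize() once per epoch, rather than once per batch, gives the selection step enough buffered examples to compare classes.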

Knowledgeable Verbalizer

The Knowledgeable Verbalizer defined in Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification.

class KnowledgeableVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, classes: Optional[Sequence[str]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', max_token_split: Optional[int] = - 1, verbalizer_lr: Optional[float] = 0.05, candidate_frac: Optional[float] = 0.5, **kwargs)

This is the implementation of the knowledgeable verbalizer, which uses external knowledge to expand the set of label words. This class inherits from the ManualVerbalizer class.

  • tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model to point out the vocabulary.

  • classes (Sequence[str], optional) – The classes (or labels) of the current task.

  • prefix (str, optional) – The prefix string of the verbalizer.

  • multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.

  • max_token_split (int, optional) –

  • verbalizer_lr (float, optional) – The learning rate of the verbalizer optimization.

  • candidate_frac (float, optional) –

static add_prefix(label_words, prefix)

Add a prefix to the label words. For example, if a label word is in the middle of a template, the prefix should be ' '.

aggregate(label_words_logits: torch.Tensor) torch.Tensor

Use weight to aggregate the logits of label words.


label_words_logits (torch.Tensor) – The logits of the label words.


The aggregated logits from the label words.

Return type


from_file(path: str, separator: Optional[str] = ',')

Load the predefined label words from verbalizer file

generate_parameters() List

In basic manual template, the parameters are generated from label words directly. In this implementation, the label_words should not be tokenized into more than one token.


A hook to do something when textual label words were set.

project(logits: torch.Tensor, **kwargs) torch.Tensor

The return value is the normalized (sum to 1) probabilities of the label words.

register_calibrate_logits(logits: torch.Tensor)

For the Knowledgeable Verbalizer, it is necessary to filter out the words that have a low prior probability. Therefore, we re-compute the label words after registering the calibration logits.
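The filtering idea can be sketched as follows. This is an assumption-laden illustration, not the library's code: filter_low_prior_words is hypothetical, and it assumes that candidate_frac (whose meaning is not documented above) is the fraction of the vocabulary, ranked by calibration logit, that is treated as having a low prior.

```python
from typing import List


def filter_low_prior_words(
    label_word_ids: List[List[int]],
    prior_logits: List[float],
    candidate_frac: float = 0.5,
) -> List[List[int]]:
    """Drop label words whose token id lies in the lowest-prior part of the vocabulary.

    Assumption: candidate_frac is the fraction of vocabulary entries to discard.
    """
    num_removed = int(candidate_frac * len(prior_logits))
    # Vocabulary ids sorted from lowest to highest prior logit.
    ranked = sorted(range(len(prior_logits)), key=lambda i: prior_logits[i])
    removed = set(ranked[:num_removed])
    return [[i for i in ids if i not in removed] for ids in label_word_ids]


# Toy vocabulary of six tokens with calibration (prior) logits.
prior_logits = [0.1, 3.0, -2.0, 1.5, -0.5, 2.2]
label_word_ids = [[1, 2], [3, 4, 5]]  # expanded label words per class
kept = filter_low_prior_words(label_word_ids, prior_logits, candidate_frac=0.5)
```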

PTR Verbalizer

The verbalizer of PTR from PTR: Prompt Tuning with Rules for Text Classification.

class PTRVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[Sequence[str]] = None, num_classes: Optional[int] = None, label_words: Optional[Union[Sequence[Sequence[str]], Mapping[str, Sequence[str]]]] = None)

In PTR, each prompt has more than one <mask> token. Different <mask> tokens have different label words. The final label is predicted jointly by these label words using logic rules.

  • tokenizer (PreTrainedTokenizer) – A tokenizer that specifies the vocabulary and the tokenization strategy.

  • classes (Sequence[str]) – A sequence of classes that need to be projected.

  • label_words (Union[Sequence[Sequence[str]], Mapping[str, Sequence[str]]], optional) – The label words that are projected by the labels.


Prepare a One2oneVerbalizer for each <mask> separately.

process_logits(logits: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], **kwargs)
  1. Process vocab logits of each <mask> into label logits of each <mask>

  2. Combine these logits into a single label logits of the whole task


logits (torch.Tensor) – vocab logits of each <mask> (shape: [batch_size, num_masks, vocab_size])


The label logits of the whole task (shape: [batch_size, label_size of the whole task])

Return type
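The two steps of process_logits can be illustrated for a single example. The sketch assumes that a task class is scored by summing the log-probabilities of its expected label word at each <mask>; this summation is an assumption standing in for the logic rules, and combine_mask_logits and class_to_sub_labels are hypothetical names.

```python
from typing import List


def combine_mask_logits(
    per_mask_logits: List[List[float]],
    class_to_sub_labels: List[List[int]],
) -> List[float]:
    """Score each task class as the sum of its sub-label scores across masks.

    per_mask_logits[m][k]:     score of label word k at mask position m
    class_to_sub_labels[c][m]: index of the label word class c expects at mask m
    """
    return [
        sum(per_mask_logits[m][k] for m, k in enumerate(sub_labels))
        for sub_labels in class_to_sub_labels
    ]


# Label logits of each <mask>, e.g. produced by one one-to-one verbalizer per mask.
per_mask_logits = [
    [-0.2, -1.7],        # mask 0 has two label words
    [-0.1, -2.5, -3.0],  # mask 1 has three label words
]
# Three task classes, each defined by one label word per mask.
class_to_sub_labels = [[0, 0], [0, 1], [1, 2]]
task_logits = combine_mask_logits(per_mask_logits, class_to_sub_labels)
predicted = max(range(len(task_logits)), key=lambda c: task_logits[c])
```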


Generation Verbalizer

This verbalizer empowers the “generation for all the tasks” paradigm.

class GenerationVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[List] = None, num_classes: Optional[Sequence[str]] = None, is_rule: Optional[bool] = False)

This verbalizer is useful when the label prediction is better defined by a piece of the input. For example, in coreference resolution, the tgt_text is a proper noun mentioned in the text, so there is no fixed mapping between a class label and its label words. This verbalizer can be used as the verbalizer for the COPA and WSC datasets in SuperGLUE.

This verbalizer is especially powerful when combined with the All NLP Tasks Are Generation Tasks paradigm (also see CrossFit). It can make any piece of text the tgt_text. The tgt_text will then be filled in the {"mask"}.

For example, when label word is "good", the tgt_text is "good";

when label word is {"text":"good"}, the tgt_text is also "good";

when label word is {"meta":"choice1"}, the tgt_text is the "meta['choice1']" field of the InputExample;

when label word is {"meta":"choice1"} {"placeholder":"text_a"} ., the tgt_text is the "meta['choice1']" field of the InputExample, followed by the text_a field of the InputExample, and then a '.'.

A use case can be seen in Tutorial 4.1
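The four label word forms listed above can be mimicked with a short sketch. It is not the library's parser: the label word is assumed to be already split into parts (plain strings or single-key dicts), ExampleSketch is a hypothetical stand-in for InputExample, and joining the parts with a single space is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Part = Union[str, Dict[str, str]]


@dataclass
class ExampleSketch:
    """Hypothetical stand-in for an InputExample with a text field and meta data."""
    text_a: str = ""
    meta: Dict[str, str] = field(default_factory=dict)


def fill_tgt_text(parts: List[Part], example: ExampleSketch) -> str:
    """Resolve each part of a label word against the example and join the results."""
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)                                # plain text
        elif "text" in part:
            pieces.append(part["text"])                        # {"text": ...}
        elif "meta" in part:
            pieces.append(example.meta[part["meta"]])          # {"meta": ...}
        elif "placeholder" in part:
            pieces.append(getattr(example, part["placeholder"]))  # {"placeholder": ...}
        else:
            raise ValueError(f"unsupported label word part: {part}")
    return " ".join(pieces)  # assumed joining rule


example = ExampleSketch(text_a="It was raining", meta={"choice1": "The ground got wet"})
plain = fill_tgt_text(["good"], example)
from_text = fill_tgt_text([{"text": "good"}], example)
from_meta = fill_tgt_text([{"meta": "choice1"}], example)
combined = fill_tgt_text([{"meta": "choice1"}, {"placeholder": "text_a"}, "."], example)
```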

  • tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model to point out the vocabulary.

  • classes (List[Any]) – The classes (or labels) of the current task.

  • prefix (str, optional) – The prefix string of the verbalizer (used in PLMs like RoBERTa, which is sensitive to prefix space)

  • is_rule (bool, optional) – Whether the verbalizer uses the rule syntax of MixedTemplate.


Process the text into the label words (sometimes a function) according to the syntax of MixedTemplate

wrap_one_example(example: openprompt.data_utils.utils.InputExample) List[Dict]

Take an InputExample, and fill the tgt_text with label words

Soft Verbalizer

class SoftVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer], plm: Optional[transformers.utils.dummy_pt_objects.PreTrainedModel], classes: Optional[List] = None, num_classes: Optional[Sequence[str]] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first')

The implementation of the verbalizer in WARP

  • tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model to point out the vocabulary.

  • classes (List[Any]) – The classes (or labels) of the current task.

  • label_words (Union[List[str], List[List[str]], Dict[List[str]]], optional) – The label words that are projected by the labels.

  • prefix (str, optional) – The prefix string of the verbalizer (used in PLMs like RoBERTa, which is sensitive to prefix space)

  • multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.

  • post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.

static add_prefix(label_words, prefix)

Add a prefix to the label words. For example, if a label word is in the middle of a template, the prefix should be ' '.

  • label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.

  • prefix (str, optional) – The prefix string of the verbalizer.


New label words with prefix.

Return type


gather_outputs(outputs: transformers.file_utils.ModelOutput)

Retrieve the useful output for the verbalizer from the whole model output. By default, it will only retrieve the logits.


outputs (ModelOutput) –


torch.Tensor The gathered output, should be of shape (batch_size, seq_len, any)

generate_parameters() List

In basic manual template, the parameters are generated from label words directly. In this implementation, the label_words should not be tokenized into more than one token.

property group_parameters_1

Include the parameters of the head's layers except the last layer. In the soft verbalizer, note that some heads may contain modules other than the final projection layer. The parameters of these parts should be optimized (or frozen) together with the PLM.

property group_parameters_2

Include the last layer’s parameters


A hook to do something when textual label words were set.

process_hiddens(hiddens: torch.Tensor, **kwargs)

A whole framework to process the original logits over the vocabulary, which contains four steps:

process_outputs(outputs: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], **kwargs)

By default, the verbalizer will process the logits of the PLM’s output.

  • logits (torch.Tensor) – The current logits generated by pre-trained language models.

  • batch (Union[Dict, InputFeatures]) – The input features of the data.