python - Trouble when using tokenizer.encode_plus

#jupyter-notebook

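I'm trying to study a BERT classifier using https://colab.research.google.com/drive/1pTuQhug6Dhl9XalKB0zUGf4FIdYFlpcX#scrollTo=2bBdb3pt8LuQ

In that Colab, the problem starts at the "Tokenize all of the sentences..." step.

In that part, I run into this error: "TypeError: _tokenize() got an unexpected keyword argument 'pad_to_max_length'"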
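The error message names `_tokenize()` rather than `encode_plus()`, which suggests the unrecognized keyword is being passed down through `**kwargs` until it reaches a method that cannot accept it. The sketch below reproduces that mechanism with a hypothetical `OldStyleTokenizer` class; it is an illustration under that assumption, not the real transformers code.

```python
class OldStyleTokenizer:
    """Hypothetical stand-in for an old tokenizer; NOT the real transformers class."""

    def encode_plus(self, text, add_special_tokens=False, max_length=None,
                    return_tensors=None, **kwargs):
        # Keywords this version does not know about (e.g. pad_to_max_length)
        # are collected in **kwargs and forwarded downward...
        return self.tokenize(text, **kwargs)

    def tokenize(self, text, **kwargs):
        # ...still forwarded blindly...
        return self._tokenize(text, **kwargs)

    def _tokenize(self, text):
        # ...until they reach a method whose signature cannot accept them.
        return text.split()


def reproduce_error():
    """Call encode_plus with the newer keyword and return the TypeError message."""
    try:
        OldStyleTokenizer().encode_plus("hello world", max_length=64,
                                        pad_to_max_length=True)
    except TypeError as err:
        return str(err)
    return None


print(reproduce_error())
```

Because the keyword is only rejected deep inside the call chain, the traceback points at `_tokenize()` even though the offending argument was passed to `encode_plus()`.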

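from transformers import BertTokenizer

# Load the BERT tokenizer (the Colab does this in an earlier cell).
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

input_ids = []
attention_masks = []

# `sentences` is the list of training sentences loaded earlier in the notebook.
for sent in sentences:
    encoded_dict = tokenizer.encode_plus(
        sent,                         # Sentence to encode.
        add_special_tokens = True,    # Add '[CLS]' and '[SEP]'
        max_length = 64,              # Pad & truncate all sentences.
        pad_to_max_length = True,     # The argument the TypeError complains about.
        return_attention_mask = True, # Construct attn. masks.
        return_tensors = 'pt',        # Return pytorch tensors.
    )

    # Store the encoded sentence and its attention mask.
    input_ids.append(encoded_dict['input_ids'])
    attention_masks.append(encoded_dict['attention_mask'])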

Best answer

Quoting this post:

"The problem is that conda only provides the transformers library in version 2.1.1 (repository info), and that version does not have a pad_to_max_length argument."
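To confirm which version the notebook kernel is actually importing, you can query the installed package metadata. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+). The helper names are hypothetical, and treating every release at or below 2.1.1 as lacking the argument is an assumption extrapolated from the quote above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# The version the quoted answer says conda ships without pad_to_max_length.
CONDA_VERSION_WITHOUT_ARG = (2, 1, 1)


def parse_release(ver):
    """Turn '2.1.1' or '4.30.0.dev0' into a tuple of leading integers, e.g. (2, 1, 1)."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric pieces such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is missing."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


def lacks_pad_to_max_length(ver):
    """Assumption: any release at or below 2.1.1 is treated as missing the argument."""
    return parse_release(ver) <= CONDA_VERSION_WITHOUT_ARG


installed = installed_transformers_version()
if installed is None:
    print("transformers is not installed in this kernel's environment")
elif lacks_pad_to_max_length(installed):
    print(f"transformers {installed} is too old for pad_to_max_length")
else:
    print(f"transformers {installed} is newer than 2.1.1")
```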

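So the best option may be to uninstall and then reinstall transformers (this time with pip install instead of conda-forge), or to create a new conda environment and install everything there via pip rather than conda.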
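When doing the pip reinstall from inside a Jupyter notebook, one common pitfall is that a bare `pip` may belong to a different environment than the kernel. A minimal sketch, assuming the conda-forge copy has already been removed (for example with `conda remove transformers` in a terminal for the same environment), is to invoke pip through `sys.executable`. The helper names here are hypothetical.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command bound to the interpreter running this kernel.

    `sys.executable -m pip` installs into the same environment the notebook
    imports from, unlike a bare `pip`, which may resolve to another environment.
    """
    return [sys.executable, "-m", "pip", *args]


def reinstall_transformers_with_pip():
    """Install the latest transformers release from PyPI into this kernel's environment.

    Assumes the conda-installed copy was removed beforehand.
    """
    subprocess.check_call(pip_command("install", "--upgrade", "transformers"))
```

After `reinstall_transformers_with_pip()` finishes, restart the kernel so the new version is imported instead of the one already loaded in memory.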

https://stackoverflow.com/questions/63884856/
