Running SimCSE
Run results

- First, install torch and the other required libraries (a minimal install sketch follows below).
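A possible install sketch, assuming a pip-based environment and that the dependencies are listed in the requirements.txt of the SimCSE repository (https://github.com/princeton-nlp/SimCSE); check the repo for the exact dependency list:

```bash
# Install PyTorch first (pick the build that matches your CUDA setup; plain pip shown here).
pip install torch

# From the root of the cloned SimCSE repository, install the remaining dependencies.
pip install -r requirements.txt
```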
- Next, download the evaluation data; wget has to be installed first so that the bash download script can run (see the note on installing it below):

```bash
cd SentEval/data/downstream/
bash download_dataset.sh
```
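If wget is not already available, how to install it depends on the platform; two common options (the package-manager commands below are assumptions about your environment):

```bash
# Debian/Ubuntu
sudo apt-get install -y wget

# or, inside a conda environment
conda install -y wget
```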
- After emailing the authors to ask how to run the script and how to set its arguments, the basic invocation is:

```bash
python evaluation.py --model_name_or_path princeton-nlp/sup-simcse-bert-base-uncased
```
Run command

The full command used for this run:

```bash
python evaluation.py \
    --model_name_or_path princeton-nlp/sup-simcse-bert-base-uncased \
    --pooler cls \
    --task_set sts \
    --mode test
```
Evaluation
Arguments for the evaluation script are as follows:
- `--model_name_or_path`: The name or path of a `transformers`-based pre-trained checkpoint. You can directly use the models in the above table, e.g., `princeton-nlp/sup-simcse-bert-base-uncased`.
- `--pooler`: Pooling method (example commands using several of these poolers appear after this list). Now we support:
  - `cls` (default): Use the representation of the `[CLS]` token. A linear+activation layer is applied after the representation (it's in the standard BERT implementation). If you use SimCSE, you should use this option.
  - `cls_before_pooler`: Use the representation of the `[CLS]` token without the extra linear+activation.
  - `avg`: Average embeddings of the last layer. If you use checkpoints of SBERT/SRoBERTa (paper), you should use this option.
  - `avg_top2`: Average embeddings of the last two layers.
  - `avg_first_last`: Average embeddings of the first and last layers. If you use vanilla BERT or RoBERTa, this works the best.
- `--mode`: Evaluation mode
  - `test` (default): The default test mode. To faithfully reproduce our results, you should use this option.
  - `dev`: Report the development set results. Note that in STS tasks, only `STS-B` and `SICK-R` have development sets, so we only report their numbers. It also takes a fast mode for transfer tasks, so the running time is much shorter than the `test` mode (though numbers are slightly lower).
  - `fasttest`: It is the same as `test`, but with a fast mode so the running time is much shorter, but the reported numbers may be lower (only for transfer tasks).
- `--task_set`: What set of tasks to evaluate on (if set, it will override `--tasks`)
  - `sts` (default): Evaluate on STS tasks, including `STS 12~16`, `STS-B` and `SICK-R`. This is the most commonly-used set of tasks to evaluate the quality of sentence embeddings.
  - `transfer`: Evaluate on transfer tasks.
  - `full`: Evaluate on both STS and transfer tasks.
  - `na`: Manually set tasks by `--tasks`.
- `--tasks`: Specify which dataset(s) to evaluate on. Will be overridden if `--task_set` is not `na`. See the code for a full list of tasks.
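For reference, here are a few example invocations that combine the flags documented above in other ways. This is only a sketch; `bert-base-uncased` (vanilla BERT from the Hugging Face hub) is an assumption, not a checkpoint mentioned in the text above:

```bash
# Vanilla BERT: avg_first_last is the recommended pooler for non-SimCSE checkpoints.
python evaluation.py \
    --model_name_or_path bert-base-uncased \
    --pooler avg_first_last \
    --task_set sts \
    --mode test

# Development-set results (in STS, only STS-B and SICK-R have dev sets).
python evaluation.py \
    --model_name_or_path princeton-nlp/sup-simcse-bert-base-uncased \
    --pooler cls \
    --task_set sts \
    --mode dev

# STS + transfer tasks, using the faster (slightly less exact) test mode.
python evaluation.py \
    --model_name_or_path princeton-nlp/sup-simcse-bert-base-uncased \
    --pooler cls \
    --task_set full \
    --mode fasttest
```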