From 8ffd7fb12db877cfa28f8709bc563b4346a560c5 Mon Sep 17 00:00:00 2001
From: Patrick von Platen
Date: Wed, 21 Oct 2020 12:27:09 +0200
Subject: [PATCH] Update README.md

---
 .../prophetnet-large-uncased-cnndm/README.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md b/model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md
index 094dbf402..085403067 100644
--- a/model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md
+++ b/model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md
@@ -1,3 +1,9 @@
+---
+language: en
+datasets:
+- cnn_dailymail
+---
+
 ## prophetnet-large-uncased-cnndm
 Fine-tuned weights (converted from the [original fairseq version repo](https://github.com/microsoft/ProphetNet)) for [ProphetNet](https://arxiv.org/abs/2001.04063) on the CNN/DailyMail summarization task.
 ProphetNet is a pre-trained language model for sequence-to-sequence learning with a novel self-supervised objective called future n-gram prediction.
@@ -15,8 +21,11 @@
 inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=100, return_tensors='pt')
 # Generate Summary
 summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=512, early_stopping=True)
-tokenizer.batch_decode(summary_ids.tolist())
+tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
+
+# should give: 'ustc was founded in beijing by the chinese academy of sciences in 1958. [X_SEP] ustc\'s mission was to develop a high - level science and technology workforce. [X_SEP] the establishment was hailed as " a major event in the history of chinese education and science "'
 ```
+
 Here, [X_SEP] is used as a special token to separate sentences.
 ### Citation
 ```bibtex
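The patch above notes that `[X_SEP]` is ProphetNet's sentence-separator token in decoded summaries. A minimal post-processing sketch in plain Python (the summary string is the expected output quoted in the diff; the splitting logic itself is an illustrative assumption, not part of the patch):

```python
# Split a decoded ProphetNet summary into sentences on the [X_SEP] marker.
# The summary string below is the expected output quoted in the patch above.
summary = (
    "ustc was founded in beijing by the chinese academy of sciences in 1958. "
    "[X_SEP] ustc's mission was to develop a high - level science and technology workforce. "
    '[X_SEP] the establishment was hailed as " a major event in the history '
    'of chinese education and science "'
)

# Strip surrounding whitespace from each piece after splitting.
sentences = [part.strip() for part in summary.split("[X_SEP]")]

for sentence in sentences:
    print(sentence)
```

Splitting on the literal token keeps the tokenizer output untouched; note that `skip_special_tokens=True` in the patched `batch_decode` call does not remove `[X_SEP]`, since it is decoded as regular text here rather than registered as a special token.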