Update README.md

2026-05-14 20:58:08 +00:00 · 2020-10-21 12:27:09 +02:00 · 2020-10-21 12:27:09 +02:00 · 8ffd7fb12d
commit 8ffd7fb12d
parent 613ab364eb
1 changed files with 10 additions and 1 deletions
--- a/model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md
+++ b/model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md
@@ -1,3 +1,9 @@
+---
+language: en
+datasets:
+- cnn_dailymail
+---
+
 ## prophetnet-large-uncased-cnndm
 Fine-tuned weights (converted from the [original fairseq version repo](https://github.com/microsoft/ProphetNet)) for [ProphetNet](https://arxiv.org/abs/2001.04063) on the CNN/DailyMail summarization task.  
 ProphetNet is a new pre-trained language model for sequence-to-sequence learning with a novel self-supervised objective called future n-gram prediction.  
@@ -15,8 +21,11 @@ inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=100, return_tensors='pt')

 # Generate Summary
 summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=512, early_stopping=True)
-tokenizer.batch_decode(summary_ids.tolist())
+tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
+
+# should give: 'ustc was founded in beijing by the chinese academy of sciences in 1958. [X_SEP] ustc\'s mission was to develop a high - level science and technology workforce. [X_SEP] the establishment was hailed as " a major event in the history of chinese education and science "'
 ```
+
 Here, [X_SEP] is used as a special token to separate sentences.
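
Because the decoded summary keeps the [X_SEP] markers, a small post-processing step can turn it into a list of sentences. The snippet below is a minimal sketch using plain string handling; `split_summary` is a hypothetical helper name, not part of the `transformers` API, and it assumes the separator appears literally as `[X_SEP]` in the decoded text.

```python
from typing import List


def split_summary(summary: str, sep: str = "[X_SEP]") -> List[str]:
    """Split a decoded summary into sentences on the separator token.

    Hypothetical helper (not provided by transformers); assumes the
    separator appears literally in the decoded string.
    """
    # Split on the separator, trim whitespace, and drop empty pieces.
    return [part.strip() for part in summary.split(sep) if part.strip()]


decoded = (
    "ustc was founded in beijing by the chinese academy of sciences in 1958. "
    "[X_SEP] ustc's mission was to develop a high - level science and "
    "technology workforce."
)

for sentence in split_summary(decoded):
    print(sentence)
```

In practice, `decoded` would be one element of the list returned by `tokenizer.batch_decode(summary_ids, skip_special_tokens=True)`.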
 ### Citation
 ```bibtex