Standard ROUGE metrics (ROUGE-1, ROUGE-2 and ROUGE-4 F-scores) will be used for evaluation.
Note: While we will use a maximum length of 75 words for evaluation, participants are expected to predict an appropriate summary length for each article. A summary that is too long or too short compared to the ground-truth summary can
adversely affect ROUGE precision or recall, respectively.
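
The length effect can be illustrated with a simplified ROUGE-N computation. This is only a sketch, not the official scoring script: the function name rouge_n and the max_words parameter are hypothetical, tokenization is plain lowercased whitespace splitting with no stemming, and it assumes the 75-word limit is applied by truncating the predicted summary only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1, max_words=75):
    """Simplified ROUGE-N: returns (precision, recall, f_score).

    Assumption: only the candidate is truncated to max_words.
    """
    cand_tokens = candidate.lower().split()[:max_words]
    ref_tokens = reference.lower().split()

    cand = ngrams(cand_tokens, n)
    ref = ngrams(ref_tokens, n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


reference = "the cat sat on the mat"
too_short = "the cat"
too_long = "the cat sat on the mat while the dog slept on the rug"

# A too-short summary keeps precision high but loses recall.
print(rouge_n(too_short, reference))
# A too-long summary keeps recall high but loses precision.
print(rouge_n(too_long, reference))
```

In this toy case the short summary contains only words found in the reference but covers just part of it, while the long summary covers the whole reference but pads it with unmatched words. The F-score penalizes both.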