We manually collected 100 samples in 10 categories from the public news websites Netease News and Tencent News for gaze collection. Each sample consists of an article and its title; the title serves as a reference when the user writes the summary.
Length distribution of articles in different categories
The articles used in the study were collected manually from public news websites. There are 100 articles in total, 10 in each of 10 popular categories. When selecting samples, we deliberately avoided articles whose first sentence already summarizes the entire content. Articles that were too short or too long were also excluded. The average article length is around 502 Chinese characters, and the average title length is around 22 Chinese characters. The longest article has 842 Chinese characters, and the shortest has 99.
Distribution of all users' familiarity scores for articles in different categories
The familiarity score ranges from 1 to 5, where 1 means very unfamiliar, and 5 means very familiar.
The average gaze time on each Chinese character of each participant
The summary similarity distributions in different categories
The summaries differ considerably: many of the similarity scores fall below the empirical value of 0.8. The distributions also vary across categories, with summaries in the cultural category showing the lowest similarity.
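As a rough illustration of how such a comparison could be run, the sketch below computes pairwise similarities among the summaries written for one article and reports the share that falls below 0.8. The similarity measure used in the study is not specified here, so `difflib.SequenceMatcher` from the Python standard library is only a stand-in, and the function names `pairwise_similarities` and `share_below` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_similarities(summaries):
    """Similarity in [0, 1] for every unordered pair of summaries.

    ASSUMPTION: SequenceMatcher.ratio() is a placeholder metric,
    not necessarily the measure used in the study.
    """
    return [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(summaries, 2)
    ]


def share_below(similarities, threshold=0.8):
    """Fraction of pairs whose similarity is strictly below the threshold."""
    if not similarities:
        return 0.0
    return sum(s < threshold for s in similarities) / len(similarities)
```

Grouping the resulting scores by article category would give per-category distributions comparable in spirit to those in the figure, although the absolute values depend on the metric chosen.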
To compare the similarities and differences in gaze distribution across readers, we visualize the collected gaze behavior as heat maps. Brighter regions of a heat map indicate areas that the participant spent more time reading. Comparing these groups of heat maps in the figures, we propose the following two assumptions:
- When reading and summarizing text, everyone has their own stable reading patterns and preferences.
- When reading and summarizing text, reading patterns and preferences differ from person to person.
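The heat maps behind these assumptions can be approximated with a small sketch: accumulate the gaze time that falls on each character, then scale the totals so that the longest-read character is the brightest. The input format (pairs of character index and fixation duration in milliseconds) and the function names are assumptions made for illustration; the actual eye-tracker output and rendering used in the study may differ.

```python
def dwell_time_per_char(fixations, n_chars):
    """Sum fixation durations (ms) for each character position.

    ASSUMPTION: each fixation is already mapped to a character index;
    fixations that fall outside the article are ignored.
    """
    totals = [0.0] * n_chars
    for char_idx, duration_ms in fixations:
        if 0 <= char_idx < n_chars:
            totals[char_idx] += duration_ms
    return totals


def to_brightness(totals):
    """Scale dwell times to [0, 1]; the longest-read character maps to 1.0."""
    peak = max(totals, default=0.0)
    if peak == 0:
        return [0.0] * len(totals)
    return [t / peak for t in totals]


def mean_gaze_time(totals):
    """Average gaze time per character for one participant and article."""
    return sum(totals) / len(totals) if totals else 0.0
```

Comparing the brightness vectors of one participant across articles, and of different participants on the same article, is one simple way to examine the two assumptions above.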