Vetenskapliga tidsfördrif by Gaston Tissandier: Difficulty Assessment for Swedish Learners

How difficult is Vetenskapliga tidsfördrif for Swedish learners? We have run multiple tests on its full text (freely available here) of approximately 58,614 words, crunched all the numbers for you, and present the results below.

Difficulty Assessment Summary

We have estimated Vetenskapliga tidsfördrif to have an overall difficulty score of 73. Its scores are as follows:

Measure Score (1 = easy, 100 = difficult)
Overall Difficulty 73
Vocabulary Difficulty 91
Grammatical Difficulty 56

Vocabulary Difficulty: Breakdown

Vocabulary difficulty: 91%

This score has been calculated based on frequency vocabulary (the top most frequently used words in Swedish). It combines various measures of Vetenskapliga tidsfördrif's text analyzed in terms of frequency vocabulary: a plain vocabulary score, frequency-weighted vocabulary score, banded frequency vocabulary scores based on vocabulary of the text falling in the top 1,000 or 2,000 most frequent words, etc. Here's a further breakdown of how often the top most frequently used words in Swedish appear in the full text of Vetenskapliga tidsfördrif:

Vocabulary difficulty breakdown for Vetenskapliga tidsfördrif: a test for Swedish top frequency vocabulary
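As a rough illustration of a banded frequency score, the share of a text's words that fall within the top N most frequent words of the language could be computed as in the minimal sketch below. The function name `band_coverage`, the tiny ranked list and the sample sentence are all hypothetical, and this is not necessarily how the scores on this page were produced.

```python
def band_coverage(tokens, ranked_words, band_size):
    """Share of tokens that belong to the top `band_size` ranked words."""
    if not tokens:
        return 0.0
    band = set(ranked_words[:band_size])
    hits = sum(1 for token in tokens if token.lower() in band)
    return hits / len(tokens)


# Toy data (hypothetical): a tiny frequency-ranked list and a short sentence.
ranked = ["och", "en", "att", "som", "med"]
tokens = "Man kan med en apparat visa att vatten och luft".split()

print(band_coverage(tokens, ranked, 2))  # coverage by the top 2 words
print(band_coverage(tokens, ranked, 5))  # coverage by the top 5 words
```

With a real frequency list, `band_size` would be 1,000 or 2,000 rather than 2 or 5.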

We have also calculated the following approximate data on the vocabulary in Vetenskapliga tidsfördrif:

Measure Score
Number of words 58,614
Number of unique words 10,505
Number of recognized words for names/places/other entities 1,126
Number of very rare non-entity words 6,668
Number of sentences 8,798
Average number of words/sentence 7

There is some research suggesting that you need to know about 98% of a text's vocabulary in order to infer the meaning of unknown words when reading. If true, this means that you would need to know around 10,294 words in Swedish (with each form of a word counted as a separate unique word) to be able to read Vetenskapliga tidsfördrif without a dictionary and fully understand it.
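The arithmetic behind these figures can be checked with a short sketch. It assumes the 98% threshold is applied to the number of unique words and rounded down, which matches the figure quoted above; the variable names are illustrative only.

```python
total_words = 58_614    # "Number of words" from the table above
unique_words = 10_505   # "Number of unique words" from the table above
sentences = 8_798       # "Number of sentences" from the table above

# 98% of the unique words, rounded down (integer arithmetic avoids float noise).
words_needed = unique_words * 98 // 100

# Average sentence length, rounded to the nearest whole word.
words_per_sentence = round(total_words / sentences)

print(words_needed)
print(words_per_sentence)
```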

Grammatical Difficulty: Breakdown

Grammatical difficulty: 56%

Below is a further grammatical breakdown of this text. You can find an explanation of all these scores below the table.

Measure Score
Automated Readability Index 6
Coleman-Liau Index 10
Type/Token Ratio (TTR) 0.179223
Root type/Token Ratio (RTTR) 0.00000305769
Corrected type/Token Ratio (CTTR) 0.00000152884
MTLD Index 63
HDD Index 66
Yule's I Index 73
Lexical Diversity Index (MTLD + HD-D + Yule's I) 67

The type-token ratio (TTR) of Vetenskapliga tidsfördrif is 0.179223. The TTR is the most basic measure of lexical diversity. To calculate it, we divide the number of unique words by the number of words in the text. For this text, the number of unique words is 10,505 and the number of words is 58,614, so the TTR is 10,505 / 58,614 = 0.179223. However, the TTR is a very crude measure, as it depends heavily on text length. The longer the text, the lower the TTR usually is, since common words tend to repeat often. Since this text has far more than 1,000 words, the TTR is unlikely to be an accurate measure.

The root type-token ratio (RTTR) and corrected type-token ratio (CTTR) are measures that researchers proposed to partially address the TTR's dependence on text length. The RTTR reported here divides the number of unique words by the square of the number of words (10,505 / (58,614 × 58,614) = 0.00000305769), while the CTTR reported here divides it by twice that square (10,505 / (2 × 58,614 × 58,614) = 0.00000152884). Note that the definitions usually given in the literature use square roots rather than squares (unique words / √words for the RTTR, and unique words / √(2 × words) for the CTTR), which for this text would give roughly 43.4 and 30.7 respectively. However, these measures are not as easy to read, and a growing body of research argues that the CTTR and RTTR do not effectively address the problem of text length. Therefore, while we do provide the full text's TTR, RTTR and CTTR on this page, these figures do not form part of our final calculations.
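The three ratios can be reproduced from the word counts in the table with the sketch below. It computes the TTR, then the RTTR and CTTR in the square-root form usually found in the literature, and finally the squared variant that matches the figures reported on this page. The variable names are illustrative only.

```python
import math

types = 10_505    # number of unique words
tokens = 58_614   # number of words

ttr = types / tokens

# Definitions as usually given in the literature (square roots).
rttr = types / math.sqrt(tokens)
cttr = types / math.sqrt(2 * tokens)

# Variant matching the figures reported on this page (squares).
rttr_page = types / tokens**2
cttr_page = types / (2 * tokens**2)

print(f"TTR  = {ttr:.6f}")
print(f"RTTR = {rttr:.2f} (page variant: {rttr_page:.11f})")
print(f"CTTR = {cttr:.2f} (page variant: {cttr_page:.11f})")
```

The square-root forms grow much more slowly with text length than the raw counts do, which is why they were proposed as length corrections in the first place.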

The Automated Readability Index (ARI) is one of the readability measures that researchers have developed over the years. In its standard form, the formula for calculating the ARI is as follows:
ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43

The ARI is meant to approximate the grade level a reader needs in order to understand the text (assuming the reader is in formal education). For example, a value of 1 corresponds to kindergarten, a value of 12 or 13 to the last year of school, and 14 to a college sophomore. The ARI of this text is 6, which suggests that it should be understandable to sixth-grade students.
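A minimal sketch of the ARI calculation, assuming the standard published coefficients (4.71, 0.5 and 21.43), is shown below. The function name and the toy counts are hypothetical; the character count for this text is not given on this page, so the example does not attempt to reproduce the score of 6.

```python
def automated_readability_index(characters, words, sentences):
    """Standard ARI: 4.71 * chars-per-word + 0.5 * words-per-sentence - 21.43."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Toy counts (hypothetical): 500 characters, 100 words, 10 sentences.
score = automated_readability_index(characters=500, words=100, sentences=10)
print(round(score, 2))
```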

The Coleman-Liau Index (CLI) is a similar index designed by Meri Coleman and T. L. Liau. It is also meant to approximate the grade level of the reader (for example, sophomore-level material would be around grade 14, or year 14 of formal education, while kindergarten or primary school material would be close to grade 1). The CLI is usually slightly higher than the ARI. In its standard form, the CLI is computed with this formula:
CLI = 0.0588 × L − 0.296 × S − 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words.
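A minimal sketch of the CLI calculation, assuming the standard published coefficients (0.0588, 0.296 and 15.8), is shown below. The function name and the toy counts are hypothetical; the letter count for this text is not given on this page.

```python
def coleman_liau_index(letters, words, sentences):
    """Standard CLI: 0.0588 * L - 0.296 * S - 15.8 (L, S are per 100 words)."""
    letters_per_100 = 100 * letters / words
    sentences_per_100 = 100 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Toy counts (hypothetical): 500 letters, 100 words, 10 sentences.
cli = coleman_liau_index(letters=500, words=100, sentences=10)
print(round(cli, 2))
```

Both the ARI and the CLI rely only on character, word and sentence counts, which is what makes them usable without a syllable counter.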

Other indexes exist, such as the Flesch-Kincaid readability tests and the Gunning fog index, but we have chosen not to include them: unlike the ARI and CLI, they are based on syllable counts and therefore arguably work only for English and not for Swedish.

We also compute a compound lexical diversity index, which should range from 1 to 100 (with an average of around 50 and a standard deviation of around 10); it is 67 in the present case. The compound lexical diversity index is the average of the following indexes (also provided in the table above):

  • the Measure of Textual Lexical Diversity (MTLD) index - a measure based on computing the TTR over an increasingly long stretch of the text until the TTR drops below a threshold (around 0.7 in our case), at which point the TTR is reset and a counter is increased; at the end, the number of words in the text is divided by that counter; as a result, the MTLD does not vary significantly with text length;
  • the Yule's I index (based on the inverse of Yule's K characteristic) - an index based on the work of the statistician G.U. Yule, who published his index of Frequency Vocabulary in his work "The statistical study of literary vocabulary"; Yule's I takes into account the number of words in the text and a compound summed measure of word frequency;
  • the Hypergeometric Distribution D (HD-D) index (based on vocd) - an index which assesses the contribution of each word to the diversity of the text; to calculate these contributions, a hypergeometric distribution is used to compute the probability of each word appearing in word samples drawn from the text; these probabilities are then divided by the sample size and added up.
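The three indexes in the list above can be sketched as follows. This is a simplified illustration based on the commonly published definitions, not the code used for this page: the MTLD here is a single forward pass (published versions usually average a forward and a backward pass) that divides the number of words by the number of completed stretches, the HD-D sample size of 42 is the conventional default rather than a value stated on this page, and the raw values are not normalized to the 1 to 100 scale used in the table. All function names are hypothetical.

```python
import math
from collections import Counter


def mtld_forward(tokens, threshold=0.7):
    """One forward pass of MTLD: words divided by the number of 'factors'."""
    factors = 0.0
    seen = set()
    length = 0
    for token in tokens:
        seen.add(token)
        length += 1
        if len(seen) / length < threshold:
            factors += 1      # running TTR fell below the threshold
            seen.clear()      # reset and start a new stretch
            length = 0
    if length > 0:
        # Credit the unfinished last stretch as a partial factor.
        factors += (1 - len(seen) / length) / (1 - threshold)
    # Assumption: if no factor was ever completed, fall back to the text length.
    return len(tokens) / factors if factors > 0 else float(len(tokens))


def yules_i(tokens):
    """Yule's I = N^2 / (sum of squared word frequencies - N)."""
    n = len(tokens)
    sum_sq = sum(f * f for f in Counter(tokens).values())
    # Assumption: return infinity when every word occurs exactly once.
    return n * n / (sum_sq - n) if sum_sq > n else math.inf


def hdd(tokens, sample_size=42):
    """HD-D: sum over word types of P(type appears in a sample) / sample size."""
    n = len(tokens)
    if n == 0:
        return 0.0
    k = min(sample_size, n)   # the sample cannot be larger than the text
    total_draws = math.comb(n, k)
    score = 0.0
    for freq in Counter(tokens).values():
        # Hypergeometric probability of drawing this type zero times.
        p_absent = math.comb(n - freq, k) / total_draws
        score += (1 - p_absent) / k
    return score


toy = "a b a c a b d".split()
print(mtld_forward(toy), yules_i(toy), hdd(toy, sample_size=7))
```

One sanity check on the HD-D sketch: when the sample size equals the length of the text, every word is certain to appear in the sample, so the result reduces to the plain TTR.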

Our overall measure of grammatical difficulty is based on a combination of the compound lexical diversity index (which includes the MTLD, Yule's I and HD-D indexes), the ARI and the CLI, all normalized and weighted. The score should normally range from 1 to 100. In this case, the score is 56.

Other Information about Vetenskapliga tidsfördrif by Gaston Tissandier

We provide a sample of the text below. The full text of Vetenskapliga tidsfördrif is also available free of charge on our website.

Sample of text:

Då man ånyo utsätter dessa ämnen för ljusets inverkan, visar sig samma företeelse. Styrkan af det efter insolationen utsända ljuset är alltid mycket mindre än hvad det är hos den källa, hvarifrån ljuset kommit. Det synes, som om dessa företeelser först blifvit iakttagna hos ädelstenar, sedermera år 1604 hos den brända stenen från Bologna, derefter hos en diamant af Boyle år 1663 samt år 1675 hos Baudins fosfor (återstoden efter glödgningen af salpetersyrad kalk) och slutligen i senare tider hos andra ämnen, som vi ...

Top most frequently used words in Vetenskapliga tidsfördrif by Gaston Tissandier*

Position Word Repetitions Part of all words
1 en 1,602 2.73%
2 och 1,493 2.55%
3 af 1,272 2.17%
4 som 1,158 1.98%
5 att 1,130 1.93%
6 958 1.63%
7 man 904 1.54%
8 med 805 1.37%
9 den 753 1.28%
10 är 666 1.14%
11 ett 658 1.12%
12 det 590 1.01%
13 de 495 0.84%
14 för 470 0.8%
15 till 440 0.75%
16 sig 418 0.71%
17 kan 386 0.66%
18 338 0.58%
19 icke 261 0.45%
20 om 241 0.41%
21 genom 238 0.41%
22 vi 236 0.4%
23 eller 234 0.4%
24 Fig 225 0.38%
25 detta 219 0.37%
26 vid 218 0.37%
27 har 215 0.37%
28 195 0.33%
29 denna 174 0.3%
30 äro 173 0.3%
31 Sid 166 0.28%
32 under 159 0.27%
33 samma 157 0.27%
34 hvilken 147 0.25%
35 men 144 0.25%
36 kunna 137 0.23%
37 rörelse 134 0.23%
38 sätt 133 0.23%
39 hafva 126 0.21%
40 från 123 0.21%
41 utan 122 0.21%
42 mycket 117 0.2%
43 än 117 0.2%
44 såsom 115 0.2%
45 sin 114 0.19%
46 dem 113 0.19%
47 hvilka 112 0.19%
48 två 110 0.19%
49 skola 109 0.19%
50 några 107 0.18%
51 dessa 107 0.18%
52 VETENSKAPLIGA 106 0.18%
53 TIDSFÖRDRIF 105 0.18%
54 andra 101 0.17%
55 medelst 101 0.17%
56 nu 98 0.17%
57 han 95 0.16%
58 mot 92 0.16%
59 dess 92 0.16%
60 stor 92 0.16%
61 lätt 88 0.15%
62 alla 88 0.15%
63 vatten 84 0.14%
64 lika 82 0.14%
65 oss 82 0.14%
66 efter 82 0.14%
67 ser 79 0.13%
68 hvarandra 77 0.13%
69 skall 76 0.13%
70 endast 75 0.13%
71 hvilket 74 0.13%
72 skulle 74 0.13%
73 vara 73 0.12%
74 liten 73 0.12%
75 här 73 0.12%
76 omkring 72 0.12%
77 visar 72 0.12%
78 göra 72 0.12%
79 äfven 72 0.12%
80 experiment 71 0.12%
81 öfver 69 0.12%
82 något 67 0.11%
83 ur 67 0.11%
84 mellan 66 0.11%
85 olika 65 0.11%
86 hvars 62 0.11%
87 försök 61 0.1%
88 se 61 0.1%
89 hos 57 0.1%
90 huru 56 0.1%
91 någon 56 0.1%
92 föremål 56 0.1%
93 små 56 0.1%
94 åt 55 0.09%
95 vattnet 54 0.09%
96 apparaten 53 0.09%
97 der 53 0.09%
98 52 0.09%
99 mindre 52 0.09%
100 gång 51 0.09%
101 sålunda 51 0.09%
102 upp 50 0.09%
103 annan 50 0.09%
104 ganska 48 0.08%
105 alltid 48 0.08%
106 axel 47 0.08%
107 ställning 47 0.08%
108 jag 47 0.08%
109 mängd 47 0.08%
110 ofta 47 0.08%
111 behöfver 47 0.08%
112 riktning 47 0.08%
113 ned 46 0.08%
114 sådana 45 0.08%
115 öfre 45 0.08%
116 ex 44 0.08%
117 följd 44 0.08%
118 sedan 44 0.08%
119 papperet 44 0.08%
120 synes 44 0.08%
121 del 44 0.08%
122 sådan 43 0.07%
123 var 43 0.07%
124 handen 43 0.07%
125 hvarje 42 0.07%
126 gifva 42 0.07%
127 blifvit 42 0.07%
128 ena 41 0.07%
129 lägger 41 0.07%
130 stora 41 0.07%
131 större 41 0.07%
132 ännu 40 0.07%
133 finnes 40 0.07%
134 kommer 40 0.07%
135 visa 39 0.07%
136 form 39 0.07%
137 blott 38 0.06%
138 går 38 0.06%
139 luften 38 0.06%
140 slags 38 0.06%
141 tid 37 0.06%
142 åstadkomma 37 0.06%
143 färg 37 0.06%
144 hvita 37 0.06%
145 36 0.06%
146 grm 36 0.06%
147 våra 36 0.06%
148 fram 36 0.06%
149 instrument 36 0.06%
150 sitt 36 0.06%
151 stort 36 0.06%
152 hvad 36 0.06%
153 apparater 36 0.06%
154 båda 36 0.06%
155 vanliga 35 0.06%
156 vår 34 0.06%
157 deraf 34 0.06%
158 befinner 34 0.06%
159 väl 34 0.06%
160 litet 34 0.06%
161 sådant 34 0.06%
162 ut 34 0.06%
163 lilla 34 0.06%
164 glas 34 0.06%
165 helt 33 0.06%
166 består 33 0.06%
167 apparat 33 0.06%
168 måste 33 0.06%
169 bör 33 0.06%
170 glaset 33 0.06%
171 särdeles 32 0.05%
172 hon 32 0.05%
173 sätter 32 0.05%

* This list excludes punctuation and single-letter words, as well as some different-case repeats of the same words.
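A frequency table of this kind can be approximated with the sketch below. It assumes that the percentage is taken over all words in the text, that single-letter words are dropped from the list but still counted in the total, and that letter case is kept as-is; the function name `top_words` and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def top_words(text, limit=10):
    """Return (word, repetitions, percent of all words) for the top words."""
    # \w+ matches runs of letters and digits, including Swedish å, ä and ö.
    tokens = re.findall(r"\w+", text)
    total = len(tokens)
    if total == 0:
        return []
    # Single-letter words are left out of the list, as in the table above.
    counts = Counter(token for token in tokens if len(token) > 1)
    return [(word, n, 100 * n / total) for word, n in counts.most_common(limit)]


sample = "en apparat och en annan apparat i en sal"
for position, (word, n, share) in enumerate(top_words(sample, limit=3), start=1):
    print(position, word, n, f"{share:.2f}%")
```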

If you think the text would be accessible to you, you can read it on our site (click on the cover to access):

Cover of Vetenskapliga tidsfördrif by Gaston Tissandier

Other resources and languages

If you like this analysis, you should have a look at our lists of Swedish short stories and Swedish books.

If you like literature as a means of learning languages, please take a look at our project Interlinear Books. We even have a Swedish Interlinear book available for purchase.