Naket o. s. v. by Albert Engström: Difficulty Assessment for Swedish Learners

How difficult is Naket o. s. v. for Swedish learners? We have run multiple tests on its full text of approximately 65,319 words (freely available on our site), crunched all the numbers and present the results below.


Difficulty Assessment Summary

We estimate Naket o. s. v. to have an overall difficulty score of 56. Here are its scores:

Measure                 Score (1 = easy, 100 = difficult)
Overall Difficulty      56
Vocabulary Difficulty   66
Grammatical Difficulty  46

Vocabulary Difficulty: Breakdown

Vocabulary difficulty: 66%

This score is calculated from frequency vocabulary (the most frequently used words in Swedish). It combines several measures of the text of Naket o. s. v. in terms of frequency vocabulary: a plain vocabulary score, a frequency-weighted vocabulary score, banded frequency vocabulary scores based on how much of the text's vocabulary falls within the 1,000 or 2,000 most frequent words, and others. Here is a further breakdown of how often the most frequently used words in Swedish appear in the full text of Naket o. s. v.:

[Chart: vocabulary difficulty breakdown for Naket o. s. v., tested against Swedish top-frequency vocabulary]
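The idea behind the banded scores can be illustrated with a minimal Python sketch. It assumes a hypothetical ranked Swedish frequency list (`ranked_vocab`, most frequent word first) and measures what share of the text's running words fall within its top 1,000 or 2,000 entries. The site does not publish its reference list or its exact method, so this is only an approximation of the concept, not the calculation behind the score above.

```python
from collections import Counter

def band_coverage(tokens, ranked_vocab, bands=(1000, 2000)):
    """Share of running words (tokens) found among the top-n words of a
    ranked frequency list, for each band size n in `bands`.

    `ranked_vocab` is a hypothetical reference list, most frequent first.
    Tokens are lowercased before lookup (an assumption).
    """
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    coverage = {}
    for n in bands:
        top_n = set(ranked_vocab[:n])  # the n most frequent reference words
        covered = sum(c for word, c in counts.items() if word in top_n)
        coverage[n] = covered / total if total else 0.0
    return coverage

# Toy example with tiny bands so the result is easy to check by hand.
print(band_coverage(["jag", "och", "du", "och", "han"],
                    ["och", "jag", "att", "du"], bands=(2, 4)))
# {2: 0.6, 4: 0.8}
```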

We have also calculated the following approximate data on the vocabulary in Naket o. s. v.:

Measure                                                     Score
Number of words                                             65,319
Number of unique words                                      12,351
Number of recognized words for names/places/other entities  2,128
Number of very rare non-entity words                        2,007
Number of sentences                                         10,626
Average number of words per sentence                        6
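The simpler counts in this table can be approximated as below. This is only a sketch: the word and sentence splitting rules (runs of word characters; sentences ending in ., ! or ?) and the lowercasing of unique words are assumptions, since the site does not document its tokenizer, and recognition of names, places and other entities is left out entirely.

```python
import re

def basic_stats(text):
    """Approximate word, unique-word and sentence counts for a text."""
    words = re.findall(r"\w+", text)  # \w is Unicode-aware, so å, ä, ö are kept
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "words": len(words),
        "unique_words": len({w.lower() for w in words}),
        "sentences": len(sentences),
        "avg_words_per_sentence": round(len(words) / len(sentences)),
    }

print(basic_stats("Jag skrattade. Alla skrattade! Ja."))
# {'words': 5, 'unique_words': 4, 'sentences': 3, 'avg_words_per_sentence': 2}

# The table's average follows from its own counts: 65,319 / 10,626 ≈ 6.15.
print(round(65_319 / 10_626))  # 6
```

The function assumes the text contains at least one sentence; an empty string would raise a division error.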

Some research suggests that you need to know about 98% of a text's vocabulary to be able to infer the meaning of unknown words while reading. If so, you would need to know around 12,103 words in Swedish (98% of the text's 12,351 unique words, with every form of a word counted separately) to read Naket o. s. v. without a dictionary and fully understand it.
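The arithmetic can be sketched in Python. The first function reproduces the page's figure, which applies 98% to the count of unique word forms. In lexical coverage research the percentage usually refers to running words (tokens), so the second function, offered as an alternative reading rather than the site's method, counts how many of the text's most frequent word forms are needed to cover 98% of all tokens. `counts` is a hypothetical word-frequency mapping.

```python
import math

def words_needed_page_method(unique_words, coverage=0.98):
    # The page's arithmetic: 98% of the unique word forms, rounded down.
    return math.floor(unique_words * coverage)

def words_needed_token_coverage(counts, coverage=0.98):
    # Alternative reading: the smallest number of the most frequent word
    # forms whose occurrences add up to `coverage` of all running words.
    total = sum(counts.values())
    covered = 0
    for rank, c in enumerate(sorted(counts.values(), reverse=True), start=1):
        covered += c
        if covered >= coverage * total:
            return rank
    return len(counts)

print(words_needed_page_method(12_351))  # 12103
toy = {"och": 5, "jag": 3, "en": 1, "du": 1}
print(words_needed_token_coverage(toy, coverage=0.8))  # 2
```

Because frequent words repeat many times, the token-based figure is typically far smaller than the unique-word figure for a long text.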

Grammatical Difficulty: Breakdown

Grammatical difficulty: 46%

Here is a further grammatical breakdown of this text. Each score is explained below.

Measure                                           Score
Automated Readability Index                       3
Coleman-Liau Index                                6
Type/Token Ratio (TTR)                            0.189087
Root Type/Token Ratio (RTTR)                      48.33
Corrected Type/Token Ratio (CTTR)                 34.17
MTLD Index                                        57
HD-D Index                                        63
Yule's I Index                                    67
Lexical Diversity Index (MTLD + HD-D + Yule's I)  62

The type-token ratio (TTR) of Naket o. s. v. is 0.189087. The TTR is the most basic measure of lexical diversity: the number of unique words divided by the total number of words. This text has 12,351 unique words and 65,319 words in total, so its TTR is 12,351 / 65,319 = 0.189087. However, the TTR is a very crude measure because it depends heavily on text length: the longer the text, the lower the TTR tends to be, since common words keep repeating. Because this text is well over 1,000 words long, its TTR is unlikely to be an accurate measure of diversity.

The root type-token ratio (RTTR) and corrected type-token ratio (CTTR) were proposed by researchers to partially correct for the TTR's dependence on text length. In the RTTR, the number of unique words is divided by the square root of the number of words (here, 12,351 / √65,319 ≈ 48.33); in the CTTR, it is divided by the square root of twice the number of words (12,351 / √(2 × 65,319) ≈ 34.17). However, these measures are harder to interpret, and a growing body of research suggests that they do not effectively solve the text-length problem either. Therefore, while we report the full text's TTR, RTTR and CTTR on this page, these figures are not part of our final calculations.
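The three ratios can be checked against the text's counts with a minimal Python sketch using the standard definitions: the TTR divides unique words by total words, Guiraud's root TTR divides by the square root of the word count, and Carroll's corrected TTR divides by the square root of twice the word count.

```python
import math

def ttr(types, tokens):
    # Type/token ratio: unique words divided by total words.
    return types / tokens

def rttr(types, tokens):
    # Root TTR: unique words divided by the square root of total words.
    return types / math.sqrt(tokens)

def cttr(types, tokens):
    # Corrected TTR: unique words divided by the square root of twice the total words.
    return types / math.sqrt(2 * tokens)

types, tokens = 12_351, 65_319
print(round(ttr(types, tokens), 6))   # 0.189087
print(round(rttr(types, tokens), 2))  # 48.33
print(round(cttr(types, tokens), 2))  # 34.17
```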

The Automated Readability Index (ARI) is one of several readability measures developed by researchers over the years. It is calculated as follows:
$$\mathrm{ARI} = 4.71\,\frac{\text{characters}}{\text{words}} + 0.5\,\frac{\text{words}}{\text{sentences}} - 21.43$$

The ARI is intended to approximate the grade level of a reader who can understand the text, assuming the reader is in formal education. For example, a value of 1 corresponds to kindergarten, 12 or 13 to the last year of school, and 14 to a college sophomore. The ARI of this text is 3, which suggests it should be understandable to third-grade students reading at the level expected for their grade.
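A minimal Python sketch of the ARI formula follows. The tokenization rules (words as runs of word characters, sentences ending in ., ! or ?) are assumptions, so it will not reproduce the site's exact score; "characters" here means the characters inside words, excluding spaces and punctuation.

```python
import re

def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    words = re.findall(r"\w+", text)  # assumed word rule
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]  # assumed sentence rule
    characters = sum(len(w) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43

# 25 characters, 4 words, 2 sentences: 4.71 * 6.25 + 0.5 * 2 - 21.43
print(round(automated_readability_index("Jag skrattade. Alla skrattade."), 4))  # 9.0075
```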

The Coleman-Liau Index (CLI) is a similar index designed by Meri Coleman and T. L. Liau. It also estimates the reader's grade level: sophomore-level material would score around 14 (year 14 of formal education), while kindergarten or early primary school material would score close to 1. The CLI is usually slightly higher than the ARI. It is computed with the following formula, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words:
$$\mathrm{CLI} = 0.0588\,L - 0.296\,S - 15.8$$
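The CLI can be sketched the same way. As with any such sketch, the word and sentence rules below are assumptions rather than the site's documented tokenizer; only alphabetic characters count towards L.

```python
import re

def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S measured per 100 words."""
    words = re.findall(r"\w+", text)  # assumed word rule
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]  # assumed sentence rule
    letters = sum(ch.isalpha() for w in words for ch in w)
    L = 100 * letters / len(words)         # average letters per 100 words
    S = 100 * len(sentences) / len(words)  # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

# 25 letters and 2 sentences in 4 words: L = 625, S = 50
print(round(coleman_liau_index("Jag skrattade. Alla skrattade."), 2))  # 6.15
```

Unlike syllable-based formulas, both the ARI and the CLI need only character, word and sentence counts, which is why they can be applied to Swedish text.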

Other indexes exist, such as the Flesch-Kincaid readability tests, the Gunning Fog Index and others, but we have chosen not to include them because, unlike the ARI and CLI, they are based on syllable counts and therefore arguably work only for English, not Swedish.

We also compute a compound lexical diversity index, which should range from 1 to 100 (with a mean of around 50 and a standard deviation of around 10); for this text it is 62. The compound index is the average of the following indexes (also listed in the table above):

  • the Measure of Textual Lexical Diversity (MTLD) index: a measure based on computing the TTR over increasingly longer stretches of the text until it drops below a threshold (around 0.7 in our case), at which point a counter is increased and the TTR is reset; at the end, the number of words in the text is divided by this counter, so the MTLD does not vary significantly with text length;
  • the Yule's I index (the inverse of Yule's characteristic K): an index based on the work of the statistician G. U. Yule, who introduced his characteristic K in "The Statistical Study of Literary Vocabulary"; Yule's I takes into account the number of words in the text and a summed measure of word frequencies;
  • the Hypergeometric Distribution D (HD-D) index (based on vocd): an index that assesses each word's contribution to the diversity of the text; a hypergeometric distribution gives the probability of each word appearing in a random sample of words drawn from the text, and these probabilities, divided by the sample size, are summed.
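The three components can be sketched in Python as follows. This is a minimal illustration of the published definitions, not the site's code: the MTLD uses the commonly cited threshold of 0.72 (the page reports roughly 0.7), averages a forward and a backward pass and counts the leftover segment as a partial factor; the HD-D uses the usual sample size of 42 tokens. How the site normalizes these raw values to its 1 to 100 scale is not documented, so only the final averaging step is taken from the page.

```python
import math
from collections import Counter

def _mtld_pass(tokens, threshold):
    """One directional MTLD pass: number of tokens divided by number of factors."""
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count <= threshold:  # segment TTR fell to the threshold
            factors += 1
            types, count = set(), 0
    if count:  # leftover segment counts as a partial factor
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors else float("nan")

def mtld(tokens, threshold=0.72):
    """Average of a forward and a backward pass."""
    tokens = list(tokens)
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2

def yules_i(tokens):
    """Yule's I = M1**2 / (M2 - M1), the inverse counterpart of Yule's K."""
    freqs = list(Counter(tokens).values())
    m1 = sum(freqs)                 # total number of tokens
    m2 = sum(f * f for f in freqs)  # sum of squared word frequencies
    return m1 * m1 / (m2 - m1) if m2 != m1 else float("nan")

def hdd(tokens, sample_size=42):
    """Sum over words of P(word occurs in a random sample) / sample_size."""
    tokens = list(tokens)
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    all_samples = math.comb(n, sample_size)
    score = 0.0
    for f in Counter(tokens).values():
        # Hypergeometric: probability that none of the f occurrences is drawn.
        p_absent = math.comb(n - f, sample_size) / all_samples
        score += (1 - p_absent) / sample_size
    return score

print(mtld(["a", "b", "a", "b"]))                        # 4.0
print(yules_i(["a", "a", "b"]))                          # 4.5
print(round(hdd(["a", "a", "b", "b"], sample_size=2), 4))  # 0.8333

# The page's compound index is the average of its three normalized scores.
print(round((57 + 63 + 67) / 3))                         # 62
```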

Our overall measure of grammatical difficulty combines the compound lexical diversity index (which includes the MTLD, Yule's I and HD-D indexes), the ARI and the CLI, each normalized and weighted. The score normally ranges from 1 to 100; for this text, it is 46.

Other Information about Naket o. s. v. by Albert Engström

A sample of the text is provided below; the full text of Naket o. s. v. is also available free of charge on our website.

Sample of text:

Nu kommer jag till den så kallade moralen i denna historia. Jag skrattade, alla skrattade, ja till och med — o fasa och ve! — fästmön måste dra på mun. Ty handlanden gjorde en så tragikomiskt löjlig figur under sina vansinniga försök att befria sig från sotaren, att man måste skratta — och detta var naturligtvis orätt. Många gånger har jag sörjt över mitt opassande uppträdande vid detta tillfälle. När Brandenburg lett handlanden över ån, gick han drypande våt upp på bron igen och fortsatte den avbrutna dansuppvisningen. Men utom sig av vrede och skam, vrålade handlanden åt sin fästmö: — Å, du skrattade, du mä! ...

Most frequently used words in Naket o. s. v. by Albert Engström*

Position Word Repetitions Part of all words
1 och 2,488 3.81%
2 en 1,258 1.93%
3 jag 1,165 1.78%
4 att 1,092 1.67%
5 962 1.47%
6 som 941 1.44%
7 det 923 1.41%
8 var 734 1.12%
9 han 722 1.11%
10 av 679 1.04%
11 med 674 1.03%
12 för 617 0.94%
13 den 578 0.88%
14 är 550 0.84%
15 inte 517 0.79%
16 till 516 0.79%
17 om 474 0.73%
18 448 0.69%
19 ett 419 0.64%
20 Men 413 0.63%
21 sig 379 0.58%
22 mig 367 0.56%
23 de 344 0.53%
24 hade 327 0.5%
25 har 320 0.49%
26 min 289 0.44%
27 vi 274 0.42%
28 skulle 255 0.39%
29 där 238 0.36%
30 223 0.34%
31 man 215 0.33%
32 icke 215 0.33%
33 nu 214 0.33%
34 sin 213 0.33%
35 ut 196 0.3%
36 kan 182 0.28%
37 hans 180 0.28%
38 hon 179 0.27%
39 du 179 0.27%
40 honom 176 0.27%
41 kom 157 0.24%
42 något 156 0.24%
43 ha 155 0.24%
44 eller 153 0.23%
45 vara 148 0.23%
46 ty 147 0.23%
47 mycket 143 0.22%
48 ju 138 0.21%
49 efter 137 0.21%
50 vid 136 0.21%
51 när 131 0.2%
52 in 131 0.2%
53 från 128 0.2%
54 sade 118 0.18%
55 väl 117 0.18%
56 åt 116 0.18%
57 än 115 0.18%
58 alla 112 0.17%
59 bara 111 0.17%
60 blev 110 0.17%
61 här 109 0.17%
62 kunde 105 0.16%
63 skall 105 0.16%
64 upp 104 0.16%
65 103 0.16%
66 gick 101 0.15%
67 dem 101 0.15%
68 ska 101 0.15%
69 fick 100 0.15%
70 själv 97 0.15%
71 år 95 0.15%
72 vill 95 0.15%
73 någon 95 0.15%
74 detta 94 0.14%
75 utan 93 0.14%
76 Ja 93 0.14%
77 under 92 0.14%
78 varit 91 0.14%
79 gamla 91 0.14%
80 bli 91 0.14%
81 började 90 0.14%
82 över 89 0.14%
83 aldrig 86 0.13%
84 oss 83 0.13%
85 måste 83 0.13%
86 se 79 0.12%
87 sitt 79 0.12%
88 ur 79 0.12%
89 fast 78 0.12%
90 mitt 78 0.12%
91 voro 77 0.12%
92 vad 77 0.12%
93 göra 76 0.12%
94 ville 75 0.11%
95 gång 73 0.11%
96 sina 72 0.11%
97 72 0.11%
98 allt 72 0.11%
99 såg 70 0.11%
100 nog 69 0.11%
101 denna 69 0.11%
102 litet 68 0.1%
103 sedan 68 0.1%
104 mina 67 0.1%
105 just 67 0.1%
106 gammal 67 0.1%
107 också 67 0.1%
108 66 0.1%
109 ni 65 0.1%
110 går 64 0.1%
111 kanske 63 0.1%
112 många 63 0.1%
113 några 61 0.09%
114 henne 61 0.09%
115 par 60 0.09%
116 fått 60 0.09%
117 äro 58 0.09%
118 hur 58 0.09%
119 genom 54 0.08%
120 andra 54 0.08%
121 tiden 54 0.08%
122 får 53 0.08%
123 Nej 52 0.08%
124 mej 51 0.08%
125 ännu 51 0.08%
126 alltid 51 0.08%
127 sa 50 0.08%
128 50 0.08%
129 Engström 50 0.08%
130 hela 50 0.08%
131 mot 49 0.08%
132 ingen 49 0.08%
133 kommer 49 0.08%
134 endast 47 0.07%
135 stod 47 0.07%
136 fram 47 0.07%
137 två 47 0.07%
138 hos 46 0.07%
139 satt 45 0.07%
140 kunna 45 0.07%
141 låg 45 0.07%
142 hem 45 0.07%
143 Calle 44 0.07%
144 igen 44 0.07%
145 länge 44 0.07%
146 samma 44 0.07%
147 blir 44 0.07%
148 ned 43 0.07%
149 tog 43 0.07%
150 komma 43 0.07%
151 gjorde 43 0.07%
152 deras 42 0.06%
153 blivit 42 0.06%
154 vet 42 0.06%
155 fru 42 0.06%
156 sista 41 0.06%
157 alldeles 41 0.06%
158 stor 40 0.06%
159 ta 40 0.06%
160 mer 40 0.06%
161 fröken 40 0.06%
162 tror 39 0.06%
163 tid 38 0.06%
164 kände 38 0.06%
165 resten 38 0.06%
166 hennes 38 0.06%
167 dessa 38 0.06%
168 verkligen 37 0.06%
169 plötsligt 37 0.06%
170 dess 37 0.06%
171 även 36 0.06%
172 vilken 36 0.06%
173 rätt 36 0.06%
174 bra 35 0.05%
175 först 35 0.05%
176 liv 35 0.05%
177 riktigt 35 0.05%
178 havet 34 0.05%
179 vår 34 0.05%

*This list excludes punctuation, single-letter words and some repeats of the same word that differ only in capitalization.
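A list like the one above can be approximated with the Python sketch below. The rules are assumptions, since the site does not document them: words are runs of letters, single-letter words are left out of the ranking but still count towards the total, and case variants are merged by lowercasing (the table above keeps some capitalized forms such as "Men", "Ja" and "Nej", so it evidently merges only some of them). The "Part of all words" column appears to use the full word count as its denominator, since 2,488 / 65,319 ≈ 3.81%.

```python
import re
from collections import Counter

def frequency_table(text, top=10):
    """Rank words by frequency as (position, word, repetitions, % of all words)."""
    words = re.findall(r"[^\W\d_]+", text)  # letters only, Unicode-aware
    total = len(words)
    counts = Counter(w.lower() for w in words if len(w) > 1)
    return [(rank, word, c, round(100 * c / total, 2))
            for rank, (word, c) in enumerate(counts.most_common(top), start=1)]

print(frequency_table("Jag och du och han. Och i dag!", top=1))
# [(1, 'och', 3, 37.5)]

# Checking the first row of the table: "och" occurs 2,488 times in 65,319 words.
print(round(100 * 2_488 / 65_319, 2))  # 3.81
```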

If you think the text is within your reach, you can read it in full on our site.

Other resources and languages

If you like this analysis, take a look at our lists of Swedish short stories and Swedish books.

If you enjoy using literature to learn languages, please take a look at our Interlinear Books project. We even have a Swedish interlinear book available for purchase.