Den olycklige Henry Percy by Mathilda Malling: Difficulty Assessment for Swedish Learners

How difficult is Den olycklige Henry Percy for Swedish learners? We have run multiple tests on its full text of approximately 11,044 words (freely available on our site), crunched the numbers for you, and present the results below.

Difficulty Assessment Summary

We estimate that Den olycklige Henry Percy has an overall difficulty score of 71 out of 100. Its scores are as follows:

Measure	Easy to difficult	Score (1 - 100)
Overall Difficulty	71%	71
Vocabulary Difficulty	84%	84
Grammatical Difficulty	58%	58

Vocabulary Difficulty: Breakdown

Vocabulary difficulty: 84%

This score is based on frequency vocabulary (the most frequently used words in Swedish). It combines several measures of how the text of Den olycklige Henry Percy relates to that vocabulary: a plain vocabulary score, a frequency-weighted vocabulary score, banded scores based on how much of the text's vocabulary falls within the top 1,000 or 2,000 most frequent words, and so on. Here is a further breakdown of how often the most frequently used words in Swedish appear in the full text of Den olycklige Henry Percy:

Vocabulary difficulty breakdown for Den olycklige Henry Percy: a test for Swedish top frequency vocabulary
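
As a rough illustration of a banded frequency score, the sketch below computes what share of a text's running words falls within the top N entries of a ranked frequency list. The function name `band_coverage`, the toy frequency list and the toy sentence are all hypothetical; the exact scoring used on this page is not described in enough detail to reproduce.

```python
from collections import Counter

def band_coverage(tokens, ranked_words, bands=(1000, 2000)):
    """Share of running words that fall within the top-N entries of a ranked list.

    `ranked_words` is assumed to be lowercase and ordered from most to least frequent.
    """
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    coverage = {}
    for n in bands:
        top = set(ranked_words[:n])
        covered = sum(c for word, c in counts.items() if word in top)
        coverage[n] = covered / total if total else 0.0
    return coverage

# Toy data: a made-up five-word frequency list and a nine-word text.
ranked = ["och", "att", "han", "den", "en"]
text = "och han sade att den unge jarlen och kungen".split()
result = band_coverage(text, ranked, bands=(2, 5))
print(result)
```

With a real frequency list, the same function could be called with `bands=(1000, 2000)` to mirror the bands mentioned above.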

We have also calculated the following approximate data on the vocabulary in Den olycklige Henry Percy:

Measure	Score
Number of words	11,044
Number of unique words	3,556
Number of recognized words for names/places/other entities	713
Number of very rare non-entity words	658
Number of sentences	2,053
Average number of words/sentence	5

There is some research suggesting that you need to know about 98% of a text's vocabulary in order to infer the meaning of unknown words when reading. If true, this means that you would need to know around 3,484 words in Swedish (with each form of a word counted as a separate unique word) to be able to read Den olycklige Henry Percy without a dictionary and fully understand it.
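
The 3,484 figure follows from the unique-word count in the table above. A minimal sketch of that arithmetic, assuming the 98% threshold is applied to unique word forms and rounded down (variable names are illustrative):

```python
import math

unique_words = 3556   # "Number of unique words" from the table above
coverage = 0.98       # share of vocabulary assumed necessary

# 3,556 * 0.98 = 3,484.88, rounded down to a whole number of words
words_needed = math.floor(unique_words * coverage)
print(words_needed)
```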

Grammatical Difficulty: Breakdown

Grammatical difficulty: 58%

Here is a further grammatical breakdown of this text. An explanation of all these scores follows the table.

Measure	Score
Automated Readability Index	5
Coleman-Liau Index	8
Type/Token Ratio (TTR)	0.321985
Root type/Token Ratio (RTTR)	0.0000291547
Corrected type/Token Ratio (CTTR)	0.0000145774
MTLD Index	80
HDD Index	68
Yule's I Index	80
Lexical Diversity Index (MTLD + HD-D + Yule's I)	76

The type-token ratio (TTR) of Den olycklige Henry Percy is 0.321985. The TTR is the most basic measure of lexical diversity. To calculate it, we divide the number of unique words by the total number of words in the text. For this text, the number of unique words is 3,556 and the number of words is 11,044, so the TTR is 3,556 / 11,044 = 0.321985. However, the TTR is a very crude measure, as it depends heavily on text length. The longer the text, the lower the TTR usually is, since common words tend to repeat. Because this text is longer than 1,000 words, its TTR is unlikely to be an accurate measure.
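
A minimal sketch of the TTR calculation described above. The tokenizer (runs of letters, lowercased) is an assumption made for illustration and may not match how words were counted for this page.

```python
import re

def tokenize(text):
    # Runs of letters (including å, ä, ö), lowercased; digits and punctuation are dropped.
    return [w.lower() for w in re.findall(r"[^\W\d_]+", text)]

def type_token_ratio(tokens):
    # Unique words divided by total words; 0.0 for an empty text.
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample_tokens = tokenize("Han kom och han gick och hon kom.")
sample_ttr = type_token_ratio(sample_tokens)   # 5 unique words / 8 words

# The figure reported for the full text, from the counts in the table above.
page_ttr = round(3556 / 11044, 6)
print(sample_ttr, page_ttr)
```

Running `type_token_ratio` on successively longer prefixes of a long text is a quick way to observe the length dependence mentioned above.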

The root type-token ratio (RTTR) and corrected type-token ratio (CTTR) are measures suggested by researchers to partially address the TTR's dependence on text length. As usually defined, the RTTR divides the number of unique words by the square root of the number of words, and the CTTR divides it by the square root of twice the number of words; for this text that gives roughly 3,556 / √11,044 ≈ 33.84 and 3,556 / √(2 × 11,044) ≈ 23.93. The figures in the table above were instead obtained by dividing by the square of the number of words (3,556 / (11,044 × 11,044) = 0.0000291547) and by twice that square (3,556 / (2 × 11,044 × 11,044) = 0.0000145774). These measures are not as easily readable, and a growing body of research asserts that CTTR and RTTR do not effectively address the problem of text length. Therefore, while we provide the full text's TTR, RTTR and CTTR on this page, these figures do not form part of our final calculations.
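
The arithmetic in this section can be reproduced from the two counts in the table. The sketch below computes the figures as they appear on this page (division by the square of the word count) and, for comparison, the square-root forms in which RTTR and CTTR are usually defined in the literature. Variable names are illustrative.

```python
import math

types, tokens = 3556, 11044   # unique words and total words from the table

# Figures as reported in the table on this page (division by the square).
rttr_page = types / tokens**2
cttr_page = types / (2 * tokens**2)

# Square-root forms, as RTTR and CTTR are usually defined.
rttr_root = types / math.sqrt(tokens)
cttr_root = types / math.sqrt(2 * tokens)

print(f"{rttr_page:.10f} {cttr_page:.10f}")
print(f"{rttr_root:.2f} {cttr_root:.2f}")
```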

The Automated Readability Index (ARI) is one readability measure that has been developed by researchers over the years. The formula for calculating the ARI is as follows:
Formula for calculating the Automated Readability Index

The ARI is designed to produce a score that approximately corresponds to the reader's grade level (assuming the reader is in formal education). For example, a value of 1 corresponds to kindergarten, a value of 12 or 13 to the last year of school, and 14 to the second year of college. The ARI of this text is 5, which suggests it would be understandable to fifth-grade students.
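
For reference, the ARI as commonly published is 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The sketch below implements that standard formula; it is an assumption that this page uses the same constants, and the simple regex-based counting in `ari_from_text` is illustrative only.

```python
import re

def automated_readability_index(characters, words, sentences):
    # Standard published ARI formula.
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

def ari_from_text(text):
    # Words are runs of letters or digits; sentences are split on ., ! and ?.
    words = re.findall(r"[^\W_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    characters = sum(len(w) for w in words)
    return automated_readability_index(characters, len(words), len(sentences))

print(round(ari_from_text("Han kom. Hon gick."), 4))
```

Very short inputs such as the one above give values far outside the grade-level range; the index is only meaningful for longer texts.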

The Coleman-Liau Index (CLI) is a similar index designed by Meri Coleman and T. L. Liau. It is also meant to estimate the grade level of the reader: for example, material for a second-year college student would score around grade 14 (year 14 of formal education), while kindergarten or primary-school material would score close to grade 1. The CLI is usually slightly higher than the ARI. The CLI is computed with this formula:
Formula for calculating the Coleman-Liau Readability Index
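
The commonly published form of the index is CLI = 0.0588 × L − 0.296 × S − 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. A minimal sketch of that standard formula follows; it is an assumption that this page uses the same constants, and the function name is illustrative.

```python
def coleman_liau_index(letters, words, sentences):
    # L: letters per 100 words, S: sentences per 100 words.
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

# Example with round numbers: 500 letters, 100 words, 10 sentences.
example_cli = coleman_liau_index(500, 100, 10)
print(round(example_cli, 2))
```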

Other indexes exist, such as the Flesch-Kincaid readability tests and the Gunning fog index, but we have chosen not to include them: unlike the ARI and CLI, they are based on syllable counts and therefore arguably work only for English and not for Swedish.

We also compute a compound lexical diversity index, which should range from 1 to 100 (with an average value of around 50 and a standard deviation of around 10); it is 76 in the present case. The compound index is the average of the following indexes (also provided in the table above):

the Measure of Textual Lexical Diversity (MTLD) index - a measure based on computing the TTR for increasingly larger parts of the text until the TTR drops below a certain threshold (around 0.7 in our case), at which point the TTR is reset and a counter is increased; at the end, the number of words in the text is divided by the counter; as a result, the MTLD does not vary significantly with text length;
the Yule's I index (the inverse of Yule's K characteristic) - an index based on the work of the statistician G.U. Yule, who published his measure in his work "The statistical study of literary vocabulary"; Yule's I takes into account the number of words in the text and a compound summed measure of word frequency;
the Hypergeometric Distribution D (HD-D) index (based on vocd) - an index which assesses the contribution of each word to the diversity of the text; to calculate these contributions, a hypergeometric distribution is used to compute the probability of each word appearing in word samples drawn from the text; these probabilities are then divided by the sample size and added up.
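
The three indexes above can be sketched as follows. This is an illustrative implementation of the textbook definitions, not the code used for this page: the 0.7 threshold comes from the description above, the sample size of 42 for HD-D is the value commonly used in the literature, and the raw outputs are not normalized to the 1 to 100 scale shown in the table.

```python
from collections import Counter
from math import comb

def _mtld_one_direction(tokens, threshold):
    factors, seen, count = 0.0, set(), 0
    for token in tokens:
        count += 1
        seen.add(token)
        if len(seen) / count < threshold:   # TTR dropped below threshold: one full factor
            factors += 1
            seen.clear()
            count = 0
    if count:                               # partial factor for the leftover segment
        factors += (1 - len(seen) / count) / (1 - threshold)
    return len(tokens) / factors if factors else float("inf")

def mtld(tokens, threshold=0.7):
    # Mean of a forward and a backward pass over the text.
    tokens = list(tokens)
    return (_mtld_one_direction(tokens, threshold)
            + _mtld_one_direction(tokens[::-1], threshold)) / 2

def yules_i(tokens):
    # I = N^2 / (sum of squared word frequencies - N), the inverse of Yule's K without its 10^4 scale.
    n = len(tokens)
    m2 = sum(f * f for f in Counter(tokens).values())
    return n * n / (m2 - n) if m2 > n else float("inf")

def hdd(tokens, sample_size=42):
    # Sum over words of P(word appears in a random sample) / sample size.
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    total = 0.0
    for freq in Counter(tokens).values():
        p_absent = comb(n - freq, sample_size) / comb(n, sample_size)
        total += (1 - p_absent) / sample_size
    return total

words = "han kom och han gick och hon kom och gick".split() * 5
print(mtld(words), yules_i(words), hdd(words))
```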

Our overall measure of grammatical difficulty is based on a combination of the compound lexical diversity index (which includes the MTLD, Yule's I and HD-D indexes), the ARI and the CLI, all normalized and given certain weights. The score should normally range from 1 to 100. In this case, the score is 58.

Other Information about Den olycklige Henry Percy by Mathilda Malling

We provide a sample of the text below; the full text of Den olycklige Henry Percy is also available free of charge on our website.

Sample of text:

Ty genom sin ytterliga och oförlåtliga oklokhet kommer han naturligtvis att förslösa allt hvad hans värdiga förfäder omsorgsfullt och arbetsamt ha samlat ihop och med ära behållit. (Historien förmäler intet om, hvad den unge ranke herrn framför honom med de trotsigt nedslagna ögonen här tänkte om sin far, på sin tid kallad ”the Magnificent” eller rättare ...

Top most frequently used words in Den olycklige Henry Percy by Mathilda Malling*

Position	Word	Repetitions	Part of all words
1	och	404	3.66%
2	att	221	2%
3	af	163	1.48%
4	han	162	1.47%
5	den	144	1.3%
6	en	140	1.27%
7	som	136	1.23%
8	på	125	1.13%
9	med	125	1.13%
10	till	118	1.07%
11	det	107	0.97%
12	sin	107	0.97%
13	för	102	0.92%
14	de	96	0.87%
15	sig	86	0.78%
16	hans	79	0.72%
17	honom	70	0.63%
18	så	70	0.63%
19	var	65	0.59%
20	om	61	0.55%
21	har	61	0.55%
22	hon	58	0.53%
23	Percy	54	0.49%
24	ett	54	0.49%
25	eller	48	0.43%
26	vid	42	0.38%
27	Northumberland	41	0.37%
28	henne	40	0.36%
29	Anne	39	0.35%
30	ej	39	0.35%
31	hade	39	0.35%
32	då	39	0.35%
33	från	37	0.34%
34	nu	37	0.34%
35	hvilken	37	0.34%
36	sitt	36	0.33%
37	är	36	0.33%
38	efter	36	0.33%
39	icke	36	0.33%
40	denna	35	0.32%
41	under	34	0.31%
42	Henry	34	0.31%
43	man	31	0.28%
44	varit	29	0.26%
45	sina	29	0.26%
46	hennes	28	0.25%
47	ha	26	0.24%
48	Henrik	25	0.23%
49	unge	24	0.22%
50	ännu	24	0.22%
51	alltid	24	0.22%
52	and	24	0.22%
53	kungen	23	0.21%
54	Wolsey	22	0.2%
55	öfver	22	0.2%
56	gång	22	0.2%
57	jag	22	0.2%
58	the	22	0.2%
59	Lord	22	0.2%
60	än	22	0.2%
61	när	20	0.18%
62	blef	20	0.18%
63	kunde	20	0.18%
64	Lady	19	0.17%
65	redan	19	0.17%
66	någon	19	0.17%
67	tid	19	0.17%
68	detta	19	0.17%
69	alla	19	0.17%
70	lif	19	0.17%
71	Boleyn	19	0.17%
72	London	18	0.16%
73	genom	18	0.16%
74	själf	18	0.16%
75	of	17	0.15%
76	kungens	17	0.15%
77	såsom	17	0.15%
78	år	17	0.15%
79	jarlen	16	0.14%
80	Men	16	0.14%
81	Mary	16	0.14%
82	hur	16	0.14%
83	allt	15	0.14%
84	kan	15	0.14%
85	andra	15	0.14%
86	blott	15	0.14%
87	Sir	15	0.14%
88	säkert	14	0.13%
89	där	14	0.13%
90	emellertid	14	0.13%
91	skulle	14	0.13%
92	utan	14	0.13%
93	all	14	0.13%
94	vara	14	0.13%
95	hela	13	0.12%
96	dem	13	0.12%
97	VIII	13	0.12%
98	kung	13	0.12%
99	också	13	0.12%
100	tyckes	13	0.12%
101	kardinalen	13	0.12%
102	dessa	13	0.12%
103	något	12	0.11%
104	ofta	12	0.11%
105	åt	12	0.11%
106	kanske	12	0.11%
107	inför	12	0.11%
108	vi	12	0.11%
109	hvilka	12	0.11%
110	mer	12	0.11%
111	England	11	0.1%
112	kr	11	0.1%
113	Mistress	11	0.1%
114	in	11	0.1%
115	inte	11	0.1%
116	upp	11	0.1%
117	helt	11	0.1%
118	aldrig	11	0.1%
119	far	11	0.1%
120	mig	11	0.1%
121	Harry	11	0.1%
122	Warden	11	0.1%
123	emot	11	0.1%
124	ingen	10	0.09%
125	lika	10	0.09%
126	dock	10	0.09%
127	mot	10	0.09%
128	Percys	10	0.09%
129	ur	10	0.09%
130	många	10	0.09%
131	mycket	10	0.09%
132	hus	10	0.09%
133	gick	10	0.09%
134	mest	10	0.09%
135	voro	10	0.09%
136	Wressill	10	0.09%
137	kort	10	0.09%
138	måste	10	0.09%
139	nästan	10	0.09%
140	få	9	0.08%
141	mellan	9	0.08%
142	ord	9	0.08%
143	hvad	9	0.08%
144	to	9	0.08%
145	nog	9	0.08%
146	kardinalens	9	0.08%
147	göra	9	0.08%
148	eget	9	0.08%
149	gränsen	9	0.08%
150	senare	9	0.08%
151	mera	9	0.08%
152	deras	8	0.07%
153	stånd	8	0.07%
154	del	8	0.07%
155	naturligtvis	8	0.07%
156	godt	8	0.07%
157	stora	8	0.07%
158	namn	8	0.07%
159	egen	8	0.07%
160	Norfolk	8	0.07%
161	tiden	8	0.07%
162	genast	8	0.07%
163	här	8	0.07%
164	möjligt	8	0.07%
165	samt	8	0.07%
166	min	8	0.07%
167	enligt	8	0.07%
168	drottning	8	0.07%
169	trots	8	0.07%
170	sålunda	8	0.07%
171	beständigt	8	0.07%
172	fick	8	0.07%
173	his	8	0.07%
174	hvars	8	0.07%
175	gjort	8	0.07%
176	gjorde	7	0.06%
177	äfven	7	0.06%
178	första	7	0.06%
179	jarlens	7	0.06%
180	samma	7	0.06%
181	ställning	7	0.06%
182	that	7	0.06%
183	död	7	0.06%
184	liten	7	0.06%
185	hand	7	0.06%

* This list excludes punctuation and single-letter words, as well as some different-case repeats of the same words.
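
A table like the one above can be approximated with a simple word count. The sketch below is illustrative: the tokenizer is an assumption, single-letter words are dropped from the ranking as noted above, case is kept as in the table, and each percentage is taken relative to all words in the text.

```python
import re
from collections import Counter

def top_words(text, limit=10):
    tokens = re.findall(r"[^\W\d_]+", text)            # runs of letters
    total = len(tokens)                                 # all words, including single-letter ones
    counts = Counter(t for t in tokens if len(t) > 1)   # single-letter words are not ranked
    return [
        (rank, word, n, round(100 * n / total, 2))
        for rank, (word, n) in enumerate(counts.most_common(limit), start=1)
    ]

rows = top_words("och han och hon och han i", limit=2)
for rank, word, repetitions, share in rows:
    print(rank, word, repetitions, f"{share}%")
```

The same rounding reproduces the first row of the table: 404 repetitions out of 11,044 words is 3.66%.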

If you think the text would be accessible to you, you can read it on our site.

Other resources and languages

If you like this analysis, you should have a look at our lists of Swedish short stories and Swedish books.

If you like literature as a means to learn languages - please take a look at our project Interlinear Books. We even have a Swedish Interlinear book available for purchase.