Masskultur by Vitalis Norström: Difficulty Assessment for Swedish Learners

How difficult is Masskultur for Swedish learners? We ran multiple tests on its full text of approximately 42,639 words (freely available on our site), crunched the numbers and present the results below.

Difficulty Assessment Summary

We estimate that Masskultur has an overall difficulty score of 73 out of 100. Its scores are:

Measure	Easy to difficult (%)	Score (1 - 100)
Overall Difficulty	73%	73
Vocabulary Difficulty	88%	88
Grammatical Difficulty	58%	58

Vocabulary Difficulty: Breakdown

Vocabulary difficulty: 88%

This score has been calculated based on frequency vocabulary (the top most frequently used words in Swedish). It combines various measures of Masskultur's text analyzed in terms of frequency vocabulary: a plain vocabulary score, frequency-weighted vocabulary score, banded frequency vocabulary scores based on vocabulary of the text falling in the top 1,000 or 2,000 most frequent words, etc. Here's a further breakdown of how often the top most frequently used words in Swedish appear in the full text of Masskultur:

[Chart: vocabulary difficulty breakdown for Masskultur, a test for Swedish top frequency vocabulary]

We have also calculated the following approximate data on the vocabulary in Masskultur:

Measure	Score
Number of words	42,639
Number of unique words	8,906
Number of recognized words for names/places/other entities	459
Number of very rare non-entity words	3,208
Number of sentences	6,936
Average number of words/sentence	6

There is some research suggesting that you need to know about 98% of a text's vocabulary to be able to infer the meaning of unknown words when reading. If true, this means that you would need to know around 8,727 words in Swedish (counting each form of a word as a separate unique word) to be able to read Masskultur without a dictionary and fully understand it.
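The estimate above is simple arithmetic and can be reproduced as a quick check. This is a minimal sketch that assumes, as the paragraph does, that the 98% threshold is applied directly to the count of unique word forms; the variable names are illustrative and not taken from our tooling.

```python
import math

unique_words = 8_906      # unique word forms reported for Masskultur
coverage_target = 0.98    # share of the vocabulary assumed to be necessary

# Round down, which matches the figure quoted in the text.
words_needed = math.floor(unique_words * coverage_target)
print(f"{words_needed:,}")
```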

Grammatical Difficulty: Breakdown

Grammatical difficulty: 58%

Here is a further grammatical breakdown of this text. An explanation of all these scores follows the table.

Measure	Score
Automated Readability Index	7
Coleman-Liau Index	11
Type/Token Ratio (TTR)	0.20887
Root type/Token Ratio (RTTR)	0.00000489856
Corrected type/Token Ratio (CTTR)	0.00000244928
MTLD Index	70
HDD Index	64
Yule's I Index	69
Lexical Diversity Index (MTLD + HD-D + Yule's I)	68

The type-token ratio (TTR) of Masskultur is 0.20887. The TTR is the most basic measure of lexical diversity: the number of unique words divided by the number of words in the text. For this text, the number of unique words is 8,906 and the number of words is 42,639, so the TTR is 8,906 / 42,639 = 0.20887. However, the TTR is a very crude measure, as it depends heavily on text length. The longer the text, the lower the TTR usually is, since common words tend to repeat. Because this text has far more than 1,000 words, its TTR is unlikely to be an accurate measure.
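The TTR calculation can be sketched in a few lines. The tokenizer below (lowercasing and splitting on word characters) is an assumption made for illustration, not necessarily how the counts on this page were produced; the two short Swedish strings are made-up examples that show how the ratio falls as a text grows.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())

def type_token_ratio(tokens: list[str]) -> float:
    """Unique tokens divided by total tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)

# Reproduce the figure reported on this page from its two counts.
reported_ttr = 8_906 / 42_639
print(round(reported_ttr, 5))

# The TTR drops as a text grows, because common words repeat.
short = tokenize("det är en bok")
longer = tokenize("det är en bok och det är en bra bok")
print(type_token_ratio(short), type_token_ratio(longer))
```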

The root type-token ratio (RTTR) and corrected type-token ratio (CTTR) are measures suggested by researchers to partially address the TTR's dependence on text length. On this page, the RTTR divides the number of unique words by the square of the number of words (8,906 / (42,639 * 42,639) = 0.00000489856), while the CTTR divides it by twice that square (8,906 / (2 * 42,639 * 42,639) = 0.00000244928). Note that the definitions usually given in the literature divide by a square root instead: the RTTR by the square root of the number of words, and the CTTR by the square root of twice the number of words, which would give roughly 43.1 and 30.5 for this text. These measures are not as easily readable, and a growing body of research asserts that the CTTR and RTTR do not effectively address the problem of text length. Therefore, while we provide the full text's TTR, RTTR and CTTR on this page, these figures do not form part of our final calculations.
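The arithmetic behind the RTTR and CTTR figures in the table can be checked as follows. For comparison, this sketch also computes the square-root forms commonly attributed to Guiraud (RTTR) and Carroll (CTTR); that comparison is an addition for illustration, and the variable names are hypothetical.

```python
import math

unique_words = 8_906
total_words = 42_639

# As reported in the table on this page: divide by the square of the word count.
page_rttr = unique_words / total_words**2
page_cttr = unique_words / (2 * total_words**2)

# Commonly cited definitions: divide by a square root instead.
rttr = unique_words / math.sqrt(total_words)
cttr = unique_words / math.sqrt(2 * total_words)

print(f"page RTTR={page_rttr:.11f}  page CTTR={page_cttr:.11f}")
print(f"root-based RTTR={rttr:.2f}  root-based CTTR={cttr:.2f}")
```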

The Automated Readability Index (ARI) is one readability measure that has been developed by researchers over the years. The formula for calculating the ARI is as follows:
ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

The ARI is meant to approximate the grade level a reader needs in order to understand the text (assuming the reader is in formal education). For example, a value of 1 corresponds to kindergarten, a value of 12 or 13 to the last year of school, and 14 to a college sophomore. The ARI of this text is 7, which suggests it is understandable for seventh-grade students at their expected level of education.
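As a rough illustration, the ARI can be computed from three counts using the commonly published coefficients. The function name and the sample counts are hypothetical; the character count of Masskultur is not given on this page, so the example does not try to reproduce the reported value of 7.

```python
def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI from raw counts, using the commonly published coefficients."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

# Hypothetical counts: 450 characters, 100 words, 5 sentences.
example_ari = automated_readability_index(450, 100, 5)
print(round(example_ari, 1))
```

Longer words and longer sentences both push the score up, which is why the short average sentence length of this text (about 6 words) keeps its ARI moderate.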

The Coleman-Liau Index (CLI) is a similar index designed by Meri Coleman and T. L. Liau. It is also meant to estimate the reader's grade level: for example, sophomore-level material would be around grade 14 (year 14 of formal education), while kindergarten or primary-school material would be close to grade 1. The CLI is usually slightly higher than the ARI. The CLI is computed with this formula:
CLI = 0.0588 * L - 0.296 * S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words.
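A minimal sketch of the Coleman-Liau calculation, using the commonly published coefficients, is shown here. The function name and sample counts are hypothetical and chosen only to illustrate the arithmetic.

```python
def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    """CLI from raw counts, using the commonly published coefficients."""
    if words <= 0:
        raise ValueError("words must be positive")
    letters_per_100 = letters / words * 100      # L in the usual notation
    sentences_per_100 = sentences / words * 100  # S in the usual notation
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

# Hypothetical counts: 450 letters, 100 words, 5 sentences.
example_cli = coleman_liau_index(450, 100, 5)
print(round(example_cli, 2))
```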

Other indexes exist, such as the Flesch-Kincaid readability tests and the Gunning fog index, but we have chosen not to include them: unlike the ARI and CLI, they are based on syllable counts and therefore arguably work only for English and not for Swedish.

We also compute a compound lexical diversity index, which should range from 1 to 100 (with an average of around 50 and a standard deviation of around 10). It is 68 in the present case. The compound lexical diversity index is the average of the following indexes (also provided in the table above):

the Measure of Textual Lexical Diversity (MTLD) index - a measure based on computing the TTR over increasingly larger parts of the text until the TTR drops below a threshold (around 0.7 in our case), at which point the TTR is reset and a counter is increased; at the end, the number of words in the text is divided by the counter; as a result, the MTLD does not vary significantly with text length;
the Yule's I index (based on Yule's K characteristic inverted) - an index based on the work of the statistician G.U. Yule, who published his index of Frequency Vocabulary in his paper "The statistical study of literary vocabulary"; Yule's I takes into account the number of words in the text, and a compound summed measure of word frequency;
the Hypergeometric Distribution D (HD-D) index (based on vocd) - an index which assesses the contribution of each word to the diversity of the text; to calculate these contributions, a hypergeometric distribution is used to compute the probability of each word appearing in word samples extracted from the text; these probabilities are then divided by the sample size and added up.
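The three indexes in the list above can be sketched in their raw, unscaled forms. This is an illustration under stated assumptions rather than the code behind the scores on this page: the two-pass MTLD, the sample size of 42 for HD-D and the form of Yule's I as N^2 / (M2 - N) follow commonly described variants, all function names are hypothetical, and the rescaling to a 1 to 100 range is not reproduced.

```python
import math
from collections import Counter

def _mtld_pass(tokens, threshold):
    """One pass: count how often the running TTR falls below the threshold."""
    factors = 0.0
    seen = set()
    length = 0
    for token in tokens:
        seen.add(token)
        length += 1
        if len(seen) / length < threshold:
            factors += 1      # a full factor is complete
            seen.clear()      # reset the running TTR
            length = 0
    if length:                # credit the unfinished segment proportionally
        factors += (1 - len(seen) / length) / (1 - threshold)
    return len(tokens) / factors if factors else math.inf

def mtld(tokens, threshold=0.7):
    """Mean of a forward and a backward pass over the tokens."""
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2

def yules_i(tokens):
    """Yule's I = N^2 / (M2 - N), where M2 sums the squared word frequencies."""
    n = len(tokens)
    m2 = sum(freq * freq for freq in Counter(tokens).values())
    return n * n / (m2 - n) if m2 > n else math.inf

def hdd(tokens, sample_size=42):
    """Sum over word types of P(type appears in a random sample) / sample_size."""
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    total_samples = math.comb(n, sample_size)
    score = 0.0
    for freq in Counter(tokens).values():
        p_absent = math.comb(n - freq, sample_size) / total_samples
        score += (1 - p_absent) / sample_size
    return score
```

All three functions expect a list of word tokens; higher values indicate a more varied vocabulary.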

Our overall measure of grammatical difficulty is based on a combination of the compound lexical diversity index (which includes the MTLD, Yule's I and HD-D indexes), the ARI and the CLI, all normalized and weighted. The score should normally range from 1 to 100. In this case, the score is 58.

Other Information about Masskultur by Vitalis Norström

We provide a sample of the text below. The full text of Masskultur is also available free of charge on our website.

Sample of text:

Det mänskliga väsendets enhet spränges i »ståndpunkter» och »riktningar». Anslutningen till sådana sker aldrig utan påkostande offer af en mångfald omedelbara känslor af hög betydelse för lif och lycka och af harmoniskt förhållande till natur och människoomgifning. Med partitagandet följer en söndring inåt och utåt, som faktiskt ofta gestaltar sig till ett verkligt nödläge, ja, ibland förvandlar hela lifvet till en tryckande börda. Man kan dock inte på längden värja sig för den insikten, att i partierna — ordet här taget i vidsträcktare mening — ej genomgående står rätt mot orätt utan på det hela taget rätt emot rätt, ett stycke rätt emot ett annat. Man måste inse, att all ...

Top most frequently used words in Masskultur by Vitalis Norström*

Position	Word	Repetitions	Part of all words
1	och	1,719	4.03%
2	att	891	2.09%
3	den	772	1.81%
4	som	764	1.79%
5	en	735	1.72%
6	det	666	1.56%
7	af	657	1.54%
8	till	522	1.22%
9	för	518	1.21%
10	på	452	1.06%
11	med	450	1.06%
12	är	398	0.93%
13	sig	365	0.86%
14	ett	324	0.76%
15	icke	306	0.72%
16	vi	280	0.66%
17	de	269	0.63%
18	om	219	0.51%
19	eller	213	0.5%
20	så	195	0.46%
21	Men	189	0.44%
22	oss	178	0.42%
23	kan	174	0.41%
24	än	173	0.41%
25	utan	167	0.39%
26	vår	165	0.39%
27	denna	164	0.38%
28	man	163	0.38%
29	allt	159	0.37%
30	detta	142	0.33%
31	sin	140	0.33%
32	har	140	0.33%
33	såsom	134	0.31%
34	blott	130	0.3%
35	från	119	0.28%
36	öfver	118	0.28%
37	måste	115	0.27%
38	inte	112	0.26%
39	dess	111	0.26%
40	moderna	109	0.26%
41	ha	109	0.26%
42	också	108	0.25%
43	mot	104	0.24%
44	kunna	103	0.24%
45	skall	98	0.23%
46	vara	96	0.23%
47	alla	94	0.22%
48	något	91	0.21%
49	själfva	87	0.2%
50	under	81	0.19%
51	hvad	77	0.18%
52	där	74	0.17%
53	genom	74	0.17%
54	nu	74	0.17%
55	dock	74	0.17%
56	lif	73	0.17%
57	mycket	72	0.17%
58	andra	72	0.17%
59	ligger	72	0.17%
60	sitt	71	0.17%
61	rätt	69	0.16%
62	annat	68	0.16%
63	hvilken	68	0.16%
64	mellan	68	0.16%
65	hos	65	0.15%
66	alldeles	65	0.15%
67	då	63	0.15%
68	all	63	0.15%
69	efter	62	0.15%
70	jag	61	0.14%
71	mindre	61	0.14%
72	våra	60	0.14%
73	lifvets	57	0.13%
74	kultur	56	0.13%
75	lifvet	56	0.13%
76	mer	56	0.13%
77	dessa	55	0.13%
78	här	54	0.13%
79	inre	54	0.13%
80	helt	54	0.13%
81	mera	53	0.12%
82	göra	52	0.12%
83	långt	52	0.12%
84	vårt	51	0.12%
85	dem	51	0.12%
86	ingen	51	0.12%
87	tid	51	0.12%
88	samma	51	0.12%
89	egen	51	0.12%
90	hela	51	0.12%
91	mening	51	0.12%
92	åt	51	0.12%
93	får	50	0.12%
94	aldrig	50	0.12%
95	ej	50	0.12%
96	äro	50	0.12%
97	blir	50	0.12%
98	själf	49	0.11%
99	därför	49	0.11%
100	vill	48	0.11%
101	alltid	46	0.11%
102	hvilka	46	0.11%
103	någon	46	0.11%
104	sådan	46	0.11%
105	ut	46	0.11%
106	skulle	45	0.11%
107	samhället	45	0.11%
108	ur	45	0.11%
109	ofta	44	0.1%
110	gamla	44	0.1%
111	deras	44	0.1%
112	just	43	0.1%
113	står	43	0.1%
114	arbete	43	0.1%
115	fram	42	0.1%
116	sätt	42	0.1%
117	nya	42	0.1%
118	sina	42	0.1%
119	lika	42	0.1%
120	vid	40	0.09%
121	kulturen	40	0.09%
122	verkligen	38	0.09%
123	bli	38	0.09%
124	äfven	38	0.09%
125	går	37	0.09%
126	ja	37	0.09%
127	väl	37	0.09%
128	stället	37	0.09%
129	längre	37	0.09%
130	inom	36	0.08%
131	annan	36	0.08%
132	få	35	0.08%
133	in	35	0.08%
134	ju	35	0.08%
135	yttre	34	0.08%
136	andliga	34	0.08%
137	ingalunda	34	0.08%
138	se	33	0.08%
139	kraft	32	0.08%
140	förr	32	0.08%
141	skola	32	0.08%
142	sådant	32	0.08%
143	hur	32	0.08%
144	sociala	32	0.08%
145	när	32	0.08%
146	lycka	32	0.08%
147	människan	32	0.08%
148	komma	32	0.08%
149	vilja	32	0.08%
150	intet	32	0.08%
151	nog	31	0.07%
152	frihet	31	0.07%
153	hvilket	31	0.07%
154	stora	30	0.07%
155	form	30	0.07%
156	gör	30	0.07%
157	personliga	30	0.07%
158	grad	30	0.07%
159	var	30	0.07%
160	rent	30	0.07%
161	hvarandra	29	0.07%
162	del	29	0.07%
163	kommer	29	0.07%
164	betyder	29	0.07%
165	världen	29	0.07%
166	betydelse	29	0.07%
167	högre	28	0.07%
168	NORSTRÖM	28	0.07%
169	stå	28	0.07%
170	söka	28	0.07%
171	naturen	28	0.07%
172	makt	28	0.07%
173	heller	28	0.07%
174	fråga	28	0.07%
175	riktning	28	0.07%
176	ännu	28	0.07%
177	egentligen	27	0.06%
178	historiska	27	0.06%
179	känna	27	0.06%
180	personlig	27	0.06%
181	gång	27	0.06%
182	strid	27	0.06%
183	natur	26	0.06%

* This list excludes punctuation and single-letter words, as well as some different-case repeats of the same word.
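A frequency table like the one above can be approximated with a short script. This is a sketch under assumptions: the tokenizer (runs of letters, case preserved) and the function name are hypothetical, single-letter words are dropped from the ranking but still counted in the total, and different-case repeats are not merged.

```python
import re
from collections import Counter

def top_words(text: str, limit: int = 10) -> list[tuple[str, int, float]]:
    """Return (word, repetitions, percent of all words) for the most frequent words."""
    tokens = re.findall(r"[^\W\d_]+", text)            # runs of letters, case preserved
    total = len(tokens)
    counts = Counter(t for t in tokens if len(t) > 1)  # skip single-letter words
    return [(word, reps, round(100 * reps / total, 2))
            for word, reps in counts.most_common(limit)]

sample = "och att och den och att i"
for position, (word, reps, share) in enumerate(top_words(sample), start=1):
    print(position, word, reps, f"{share}%")
```

The "Part of all words" column is the repetition count divided by the total word count; for example, 1,719 / 42,639 gives the 4.03% shown for "och".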

If you think the text would be accessible to you, you can read it on our site.

Other resources and languages

If you like this analysis, you should have a look at our lists of Swedish short stories and Swedish books.

If you like literature as a means of learning languages, please take a look at our project Interlinear Books. We even have a Swedish Interlinear book available for purchase.