Musjikerna med flera berättelser by Anton Tjechov: Difficulty Assessment for Swedish Learners

How difficult is Musjikerna med flera berättelser for Swedish learners? We ran multiple tests on its full text (freely available here) of approximately 54,761 words, crunched the numbers, and present the results below.

Difficulty Assessment Summary

We have estimated Musjikerna med flera berättelser to have an overall difficulty score of 56. Its scores are as follows:

Measure	Difficulty (easy to difficult)	Score (1 - 100)
Overall Difficulty	56%	56
Vocabulary Difficulty	64%	64
Grammatical Difficulty	49%	49

Vocabulary Difficulty: Breakdown

Vocabulary difficulty: 64%

This score has been calculated based on frequency vocabulary (the top most frequently used words in Swedish). It combines various measures of Musjikerna med flera berättelser's text analyzed in terms of frequency vocabulary: a plain vocabulary score, frequency-weighted vocabulary score, banded frequency vocabulary scores based on vocabulary of the text falling in the top 1,000 or 2,000 most frequent words, etc. Here's a further breakdown of how often the top most frequently used words in Swedish appear in the full text of Musjikerna med flera berättelser:

Vocabulary difficulty breakdown for Musjikerna med flera berättelser: a test for Swedish top frequency vocabulary

We have also calculated the following approximate data on the vocabulary in Musjikerna med flera berättelser:

Measure	Score
Number of words	54,761
Number of unique words	8,370
Number of recognized words for names/places/other entities	2,201
Number of very rare non-entity words	1,972
Number of sentences	9,106
Average number of words/sentence	6

There is some research suggesting that you need to know about 98% of a text's vocabulary in order to infer the meaning of unknown words when reading. If true, this means that you would need to know around 8,202 words in Swedish (98% of the 8,370 unique words, with every form of a word still counted as a separate unique word) to be able to read Musjikerna med flera berättelser without a dictionary and fully understand it.
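The arithmetic behind that estimate can be sketched as follows. The variable names are illustrative, and truncating rather than rounding is an assumption made to match the 8,202 figure quoted above.

```python
unique_words = 8_370  # "Number of unique words" from the table above
coverage = 0.98       # share of vocabulary the cited research says is needed

# Assumption: the result is truncated to a whole word, not rounded up.
words_needed = int(unique_words * coverage)
print(words_needed)
```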

Grammatical Difficulty: Breakdown

Grammatical difficulty: 49%

Here is a further grammatical breakdown of this text. You can find an explanation of all these scores below.

Measure	Score
Automated Readability Index	3
Coleman-Liau Index	6
Type/Token Ratio (TTR)	0.152846
Root type/Token Ratio (RTTR)	0.00000279115
Corrected type/Token Ratio (CTTR)	0.00000139557
MTLD Index	65
HDD Index	66
Yule's I Index	70
Lexical Diversity Index (MTLD + HD-D + Yule's I)	67

The type-token ratio (TTR) of Musjikerna med flera berättelser is 0.152846. The TTR is the most basic measure of lexical diversity. To calculate it, we divide the number of unique words by the total number of words in the text. For this text, the number of unique words is 8,370 and the number of words is 54,761, so the TTR is 8,370 / 54,761 = 0.152846. However, the TTR is a very crude measure, because it depends heavily on text length: the longer the text, the lower the TTR tends to be, since common words repeat often. Because this text is far longer than 1,000 words, its TTR is unlikely to be an accurate measure of diversity.
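A minimal sketch of the TTR calculation follows. The function name and the toy sentence are illustrative assumptions; only the two counts come from this page.

```python
def type_token_ratio(tokens):
    """Unique tokens divided by total tokens (0.0 for an empty list)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Figures quoted above for this text.
ttr_reported = 8_370 / 54_761
print(round(ttr_reported, 6))

# Toy illustration of the length dependence: repeating the same
# five-word sentence ten times adds tokens but no new types.
short_text = "han gick och hon gick".split()
long_text = short_text * 10
print(type_token_ratio(short_text), type_token_ratio(long_text))
```

The repeated text has the same vocabulary but a TTR ten times lower, which is the length effect described above.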

The root type-token ratio (RTTR) and corrected type-token ratio (CTTR) are measures which were suggested by researchers to partially address the TTR's dependence on text length. In their usual definitions, the RTTR divides the number of unique words by the square root of the number of words, and the CTTR divides it by the square root of twice the number of words. The figures in the table above were instead obtained by dividing by the square of the number of words (8,370 / (54,761 * 54,761) = 0.00000279115) and by twice that square (8,370 / (2 * 54,761 * 54,761) = 0.00000139557). These measures are not as easily readable, and there is a growing body of research asserting that the CTTR and RTTR do not effectively address the problems of text length. Therefore, while we do provide the full text's TTR, RTTR and CTTR on this page, these figures do not form part of our final calculations.
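The following sketch reproduces the two figures reported on this page and, for comparison, computes the square-root forms usually given in the literature for the RTTR and CTTR. The variable names are illustrative.

```python
import math

types, tokens = 8_370, 54_761

# As reported on this page: division by the square of the token count.
rttr_page = types / tokens**2
cttr_page = types / (2 * tokens**2)

# Square-root forms usually given in the literature.
rttr_sqrt = types / math.sqrt(tokens)
cttr_sqrt = types / math.sqrt(2 * tokens)

print(f"{rttr_page:.11f} {cttr_page:.11f}")
print(f"{rttr_sqrt:.2f} {cttr_sqrt:.2f}")
```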

The Automated Readability Index (ARI) is one readability measure that has been developed by researchers over the years. The formula for calculating the ARI is as follows:
ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

The ARI is intended to produce a score approximately corresponding to the reader's grade level (assuming the reader undertakes formal education). Thus, for example, a value of 1 is kindergarten level, a value of 12 or 13 is the last year of school, and 14 is a sophomore at college. The ARI of this text is 3, which makes it understandable for third-grade students at their expected level of education.
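A minimal sketch of the ARI in its commonly published form is shown below. The constants 4.71, 0.5 and 21.43 are the standard ones; that this page uses exactly this form is an assumption, and the counts in the example are hypothetical, since the character count of this text is not given on the page.

```python
def automated_readability_index(characters, words, sentences):
    """ARI from raw counts of characters, words and sentences."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

# Hypothetical counts, for illustration only (not taken from this text).
ari_example = automated_readability_index(characters=450, words=100, sentences=10)
print(round(ari_example, 3))
```

Short sentences pull the score down, which fits a text averaging about 6 words per sentence.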

The Coleman-Liau Index (CLI) is a similar index designed by Meri Coleman and T. L. Liau. It is also meant to compute the grade level of the reader (thus, for example, sophomore-level material would be around grade 14, or year 14 of formal education, while kindergarten / primary school level material would be close to grade 1 in the CLI). The CLI is usually slightly higher than the ARI. The CLI is computed with this formula:
CLI = 0.0588 * L - 0.296 * S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words
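The CLI can be sketched in the same way, using its commonly published constants. Whether this page applies exactly this form is an assumption, and the counts below are hypothetical.

```python
def coleman_liau_index(letters, words, sentences):
    """CLI from raw counts of letters, words and sentences."""
    letters_per_100 = letters / words * 100      # L in the usual notation
    sentences_per_100 = sentences / words * 100  # S in the usual notation
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

# Hypothetical counts, for illustration only (not taken from this text).
cli_example = coleman_liau_index(letters=450, words=100, sentences=10)
print(round(cli_example, 2))
```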

Other indexes exist, such as the Flesch-Kincaid Reading Ease, the Gunning-Fog Score, and others, but we have chosen not to include them: unlike the ARI and CLI, they are based on syllable counts and therefore arguably only work for English and not for Swedish.

We compute a further compound lexical diversity index, which should range from 1 to 100 (with a standard deviation of around 10 and an average value of around 50) - it is 67 in the present case. The compound lexical diversity index consists of the following indexes, averaged out (and also provided in the table above):

the Measure of Textual Lexical Diversity (MTLD) index - a measure based on computing the TTR for increasingly larger parts of the text until the TTR drops below a certain threshold (around 0.7 in our case), at which point the TTR is reset and a counter is increased; at the end, the number of words in the text is divided by this counter; as a result, the MTLD does not vary significantly with text length;
the Yule's I index (based on Yule's K characteristic, inverted) - an index based on the work of the statistician G.U. Yule, who presented his vocabulary frequency measure in "The statistical study of literary vocabulary"; Yule's I takes into account the number of words in the text and a compound summed measure of word frequency;
the Hypergeometric Distribution D (HD-D) index (based on vocd) - an index which assesses the contribution of each word to the diversity of the text; to calculate these contributions, a hypergeometric distribution is used to compute the probability of each word appearing in word samples drawn from the text; these probabilities are then divided by the sample size and added up.
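The three indexes listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not this page's implementation: the threshold of 0.72, the sample size of 42, the single forward pass for MTLD and the unscaled form of Yule's I are all assumptions, and the page's normalization of the raw values to a 1 - 100 scale is not reproduced.

```python
import math
from collections import Counter

def mtld_forward(tokens, threshold=0.72):
    """One forward pass of MTLD: total tokens divided by the factor count."""
    factors = 0.0
    seen, count = set(), 0
    for tok in tokens:
        seen.add(tok)
        count += 1
        if len(seen) / count < threshold:  # TTR dropped below the threshold
            factors += 1                   # close this factor and reset
            seen, count = set(), 0
    if count:                              # partial credit for the leftover tail
        factors += (1 - len(seen) / count) / (1 - threshold)
    return len(tokens) / factors if factors else float("inf")

def hdd(tokens, sample_size=42):
    """Sum over types of P(type appears in a random sample) / sample size."""
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    total = 0.0
    for freq in Counter(tokens).values():
        # Hypergeometric probability that the type is absent from the sample.
        p_absent = math.comb(n - freq, sample_size) / math.comb(n, sample_size)
        total += (1 - p_absent) / sample_size
    return total

def yules_i(tokens):
    """Reciprocal of Yule's K without its scaling constant: N^2 / (M2 - N)."""
    n = len(tokens)
    m2 = sum(f * f for f in Counter(tokens).values())
    return n * n / (m2 - n) if m2 > n else float("inf")

toy = "a b a b a b a b".split()
print(mtld_forward(toy), yules_i(toy), hdd(["a", "b"] * 25))

# The compound index on this page is the average of the three table scores.
compound = round((65 + 66 + 70) / 3)
print(compound)
```

The last line only reproduces the averaging of the three normalized scores from the table (65, 66 and 70), which gives the reported 67.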

Our overall measure of grammatical difficulty is based on a combination of the compound lexical diversity index (which includes the MTLD, Yule's I and HD-D indexes), the ARI and the CLI, all normalized and given certain weights. The score should normally range from 1 to 100. In this case, the score is 49.

Other Information about Musjikerna med flera berättelser by Anton Tjechov

We provide a sample of the text below. The full text of Musjikerna med flera berättelser is also available free of charge on our website.

Sample of text:

Det föreföll mig obegripligt, ty värden hade för länge sedan gått och lagt sig, och köpmannen och jag voro de enda nattgästerna. .. Hvad kunde det vara? ... Jag började ana oråd och smög mig närmare ljuset... Heliga Guds moder förbarma dig! ... Alldeles invid jorden fick jag se ett litet fönster med järngaller för... Jag lade mig ned på marken för att kunna se in . .. och jag fick se något, som kom mig att skaka i hela kroppen ...» Kirjocha försökte att utan buller lägga en knippa gräs på elden. Gubben väntade till gräset slutat knastra och spraka och fortsatte sedan: »Jag såg en stor, mörk källare ... ...

Top most frequently used words in Musjikerna med flera berättelser by Anton Tjechov*

Position	Word	Repetitions	Part of all words
1	och	2,611	4.77%
2	att	922	1.68%
3	på	869	1.59%
4	han	858	1.57%
5	en	820	1.5%
6	som	812	1.48%
7	det	730	1.33%
8	med	638	1.17%
9	af	510	0.93%
10	den	502	0.92%
11	sig	497	0.91%
12	till	497	0.91%
13	de	473	0.86%
14	var	416	0.76%
15	är	403	0.74%
16	om	365	0.67%
17	för	353	0.64%
18	ett	319	0.58%
19	så	265	0.48%
20	honom	265	0.48%
21	hade	248	0.45%
22	men	247	0.45%
23	jag	244	0.45%
24	inte	219	0.4%
25	upp	210	0.38%
26	öfver	185	0.34%
27	hon	182	0.33%
28	då	171	0.31%
29	man	170	0.31%
30	dem	168	0.31%
31	sade	168	0.31%
32	ut	167	0.3%
33	mycket	159	0.29%
34	skulle	156	0.28%
35	från	155	0.28%
36	icke	154	0.28%
37	nu	152	0.28%
38	Jegoruschka	151	0.28%
39	såg	145	0.26%
40	utan	144	0.26%
41	vid	144	0.26%
42	något	143	0.26%
43	har	141	0.26%
44	hans	141	0.26%
45	alla	141	0.26%
46	ej	140	0.26%
47	fram	133	0.24%
48	började	133	0.24%
49	ned	128	0.23%
50	allt	126	0.23%
51	mig	122	0.22%
52	du	122	0.22%
53	där	119	0.22%
54	gick	119	0.22%
55	sin	118	0.22%
56	någon	118	0.22%
57	ni	117	0.21%
58	in	116	0.21%
59	kunde	112	0.2%
60	ha	111	0.2%
61	andra	107	0.2%
62	Gud	106	0.19%
63	se	106	0.19%
64	under	104	0.19%
65	sedan	102	0.19%
66	mot	100	0.18%
67	hur	100	0.18%
68	sina	97	0.18%
69	än	96	0.18%
70	steppen	93	0.17%
71	eller	91	0.17%
72	här	89	0.16%
73	få	89	0.16%
74	gå	88	0.16%
75	åt	87	0.16%
76	kan	86	0.16%
77	vara	83	0.15%
78	alldeles	83	0.15%
79	min	82	0.15%
80	ännu	81	0.15%
81	kom	79	0.14%
82	bort	78	0.14%
83	ur	78	0.14%
84	voro	77	0.14%
85	efter	74	0.14%
86	vi	74	0.14%
87	detta	71	0.13%
88	skall	71	0.13%
89	hvad	69	0.13%
90	hela	68	0.12%
91	omkring	67	0.12%
92	mer	67	0.12%
93	ser	66	0.12%
94	tillbaka	65	0.12%
95	vagnen	65	0.12%
96	Vasiljeff	64	0.12%
97	åter	64	0.12%
98	Ja	64	0.12%
99	ingen	62	0.11%
100	måste	61	0.11%
101	människor	60	0.11%
102	också	60	0.11%
103	frågade	60	0.11%
104	stod	59	0.11%
105	blef	58	0.11%
106	gubben	58	0.11%
107	äro	57	0.1%
108	liten	57	0.1%
109	henne	56	0.1%
110	aldrig	56	0.1%
111	dig	55	0.1%
112	fick	55	0.1%
113	komma	55	0.1%
114	ansikte	54	0.1%
115	sitt	53	0.1%
116	går	52	0.09%
117	dessa	52	0.09%
118	samma	52	0.09%
119	ropade	51	0.09%
120	två	51	0.09%
121	tog	51	0.09%
122	röst	51	0.09%
123	några	51	0.09%
124	er	50	0.09%
125	när	49	0.09%
126	oss	48	0.09%
127	hufvudet	48	0.09%
128	ju	47	0.09%
129	byn	47	0.09%
130	stund	46	0.08%
131	stora	46	0.08%
132	vägen	46	0.08%
133	Olga	46	0.08%
134	låg	46	0.08%
135	satt	45	0.08%
136	blifvit	44	0.08%
137	kände	44	0.08%
138	varit	44	0.08%
139	kunna	44	0.08%
140	helt	43	0.08%
141	säger	43	0.08%
142	enda	42	0.08%
143	bredvid	42	0.08%
144	göra	42	0.08%
145	länge	41	0.07%
146	hem	41	0.07%
147	Marja	41	0.07%
148	ville	41	0.07%
149	vill	41	0.07%
150	fader	41	0.07%
151	litet	40	0.07%
152	Malachin	40	0.07%
153	händerna	40	0.07%
154	tre	40	0.07%
155	ögonen	40	0.07%
156	äfven	39	0.07%
157	annat	39	0.07%
158	Christofor	39	0.07%
159	bli	38	0.07%
160	endast	38	0.07%
161	blir	38	0.07%
162	dricka	38	0.07%
163	stugan	38	0.07%
164	vattnet	38	0.07%
165	därför	38	0.07%
166	Deniska	38	0.07%
167	mina	37	0.07%
168	får	37	0.07%
169	Pantelei	37	0.07%
170	steg	37	0.07%
171	tänkte	36	0.07%
172	står	36	0.07%
173	gjorde	36	0.07%
174	snart	36	0.07%
175	himlen	36	0.07%
176	Kusmitschoff	35	0.06%
177	säga	35	0.06%
178	Moisej	35	0.06%
179	Herre	35	0.06%
180	ingenting	35	0.06%
181	äta	35	0.06%
182	stor	35	0.06%
183	själf	34	0.06%

* This list excludes punctuation, single-letter words, and some different-case repeats of the same words.
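A frequency list of this kind can be sketched as below. The tokenization rule, the function name and the choice of denominator (all words, including single-letter ones) are assumptions; the page does not state how its list was produced.

```python
import re
from collections import Counter

def top_words(text, n=10):
    """Return (word, repetitions, percent of all words) for the n most frequent words."""
    # Words are runs of letters; the pattern is Unicode-aware, so å, ä and ö are kept.
    tokens = re.findall(r"[^\W\d_]+", text)
    total = len(tokens)
    # Skip single-letter words, as the note above describes.
    counts = Counter(t for t in tokens if len(t) > 1)
    return [(w, c, round(100 * c / total, 2)) for w, c in counts.most_common(n)]

sample = "och han gick och hon gick och"
print(top_words(sample, 2))

# Cross-check against the first row of the table above.
och_share = round(100 * 2_611 / 54_761, 2)
print(och_share)
```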

If you think the text would be accessible to you, you can read it on our site.

Other resources and languages

If you like this analysis, you should have a look at our lists of Swedish short stories and Swedish books.

If you like literature as a means to learn languages, please take a look at our project Interlinear Books. We even have a Swedish Interlinear book available for purchase.