Tidsbilder ur Stockholmslifvet by Claës Lundin: Difficulty Assessment for Swedish Learners

How difficult is Tidsbilder ur Stockholmslifvet for Swedish learners? We ran multiple tests on its full text of approximately 54,748 words (freely available here), crunched the numbers, and present the results below.

Difficulty Assessment Summary

We estimate that Tidsbilder ur Stockholmslifvet has an overall difficulty score of 74. Its scores are as follows:

Measure	Score (1 - 100, easy to difficult)
Overall Difficulty	74
Vocabulary Difficulty	89
Grammatical Difficulty	60

Vocabulary Difficulty: Breakdown

Vocabulary difficulty: 89%

This score is calculated from frequency vocabulary (the most frequently used words in Swedish). It combines several measures of how the text of Tidsbilder ur Stockholmslifvet relates to that vocabulary: a plain vocabulary score, a frequency-weighted vocabulary score, banded scores based on how much of the text's vocabulary falls within the 1,000 or 2,000 most frequent words, and so on. Here is a further breakdown of how often the most frequently used Swedish words appear in the full text of Tidsbilder ur Stockholmslifvet:

[Chart: vocabulary difficulty breakdown for Tidsbilder ur Stockholmslifvet, a test for Swedish top frequency vocabulary]

We have also calculated the following approximate data on the vocabulary in Tidsbilder ur Stockholmslifvet:

Measure	Score
Number of words	54,748
Number of unique words	11,535
Number of recognized words for names/places/other entities	3,220
Number of very rare non-entity words	4,058
Number of sentences	9,055
Average number of words/sentence	6

Some research suggests that you need to know about 98% of a text's vocabulary in order to infer the meaning of unknown words when reading. If so, you would need to know around 11,304 words in Swedish (counting each form of a word as a separate unique word) to read Tidsbilder ur Stockholmslifvet without a dictionary and fully understand it.
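The arithmetic behind that estimate can be sketched as follows. The function name `words_needed` is ours, and the 98% threshold is the assumption quoted above rather than a measured property of this text.

```python
def words_needed(unique_words: int, coverage: float = 0.98) -> int:
    """Estimate how many distinct word forms a reader must know.

    Assumes, as the paragraph above does, that the coverage target is
    applied to the list of unique word forms, not to running words.
    """
    return int(unique_words * coverage)


# 11,535 is the number of unique words reported for this text.
print(words_needed(11_535))  # 11304
```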

Grammatical Difficulty: Breakdown

Grammatical difficulty: 60%

Here is a further grammatical breakdown of this text. An explanation of all these scores follows the table.

Measure	Score
Automated Readability Index	6
Coleman-Liau Index	9
Type/Token Ratio (TTR)	0.210693
Root type/Token Ratio (RTTR)	0.00000384841
Corrected type/Token Ratio (CTTR)	0.0000019242
MTLD Index	77
HDD Index	69
Yule's I Index	80
Lexical Diversity Index (MTLD + HD-D + Yule's I)	75

The type-token ratio (TTR) of Tidsbilder ur Stockholmslifvet is 0.210693. The TTR is the most basic measure of lexical diversity. To calculate it, we divide the number of unique words by the total number of words in the text. For this text, the number of unique words is 11,535 and the number of words is 54,748, so the TTR is 11,535 / 54,748 = 0.210693. However, the TTR is a very crude measure, as it depends heavily on text length. The longer the text, the lower the TTR tends to be, since common words repeat often. Because this text is far longer than 1,000 words, its TTR is unlikely to be an accurate measure.
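As a rough illustration of both the calculation and its sensitivity to text length, here is a minimal sketch. The tokenizer (a lowercase `\w+` regular expression) and the two sample strings are our own simplifications, not the method used to produce the figures on this page.

```python
import re


def type_token_ratio(text: str) -> float:
    """Unique words divided by total words, ignoring case."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    return len(set(tokens)) / len(tokens)


short = "och det var en dag"
longer = short + " och det var en natt och det var en stad"

print(type_token_ratio(short))    # 1.0 (5 unique words out of 5)
print(type_token_ratio(longer))   # lower: 7 unique words out of 15

# The figure reported for this text, from the counts given above:
print(round(11_535 / 54_748, 6))  # 0.210693
```

The longer sample adds only two new words but ten more tokens, which is why its ratio drops even though the writing style has not changed.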

The root type-token ratio (RTTR) and corrected type-token ratio (CTTR) were proposed by researchers to partially address the TTR's dependence on text length. As conventionally defined, the RTTR divides the number of unique words by the square root of the number of words, and the CTTR divides it by the square root of twice the number of words. The figures reported in the table above were instead computed with the square of the number of words: 11,535 / (54,748 * 54,748) = 0.00000384841 for the RTTR, and 11,535 / (2 * (54,748 * 54,748)) = 0.0000019242 for the CTTR. These measures are not as easy to read as the TTR, and a growing body of research asserts that the CTTR and RTTR do not effectively address the problem of text length. Therefore, while we provide the full text's TTR, RTTR and CTTR on this page, these figures do not form part of our final calculations.
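For comparison, here is a minimal sketch of the two measures using their conventional definitions, in which the number of unique words is divided by the square root of the number of words (RTTR) or of twice the number of words (CTTR). The function names are ours, and the results are not the values shown in the table on this page.

```python
import math


def rttr(types: int, tokens: int) -> float:
    """Root TTR: unique words over the square root of total words."""
    return types / math.sqrt(tokens)


def cttr(types: int, tokens: int) -> float:
    """Corrected TTR: unique words over the square root of twice the total."""
    return types / math.sqrt(2 * tokens)


# Counts reported for this text: 11,535 unique words, 54,748 words.
print(round(rttr(11_535, 54_748), 2))
print(round(cttr(11_535, 54_748), 2))
```

With these definitions the two values always differ by a constant factor of the square root of 2, so they rank texts identically.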

The Automated Readability Index (ARI) is one of several readability measures that researchers have developed over the years. The formula for calculating the ARI is as follows:
ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) - 21.43

The ARI is meant to yield a reading level that approximately corresponds to the reader's grade level (assuming the reader is in formal education). For example, a value of 1 corresponds to kindergarten level, a value of 12 or 13 to the last year of school, and 14 to a college sophomore. The ARI of this text is 6, which makes it understandable for sixth-grade students at their expected level of education.
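A minimal sketch of the ARI calculation, using the standard published coefficients (4.71, 0.5 and 21.43). The character count of this text is not given on this page, so the example uses small made-up counts purely for illustration.

```python
def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """Standard ARI: weighted characters-per-word and words-per-sentence."""
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Hypothetical counts (not from this text): 500 characters,
# 100 words, 10 sentences.
score = automated_readability_index(500, 100, 10)
print(round(score, 2))  # 7.12
```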

The Coleman-Liau Index (CLI) is a similar index designed by Meri Coleman and T. L. Liau. It is also meant to yield the grade level of the reader: sophomore-level material would be around grade 14 (year 14 of formal education), while kindergarten or primary school material would be close to grade 1. The CLI is usually slightly higher than the ARI. The CLI is computed with this formula:
CLI = 0.0588 × L - 0.296 × S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words
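A minimal sketch of the Coleman-Liau calculation, using the standard published coefficients (0.0588, 0.296 and 15.8), where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. The letter count of this text is not given on this page, so the example uses small made-up counts.

```python
def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    """Standard CLI from letters and sentences per 100 words."""
    if words == 0:
        raise ValueError("need at least one word")
    letters_per_100 = letters / words * 100      # L in the formula
    sentences_per_100 = sentences / words * 100  # S in the formula
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Hypothetical counts (not from this text): 500 letters,
# 100 words, 10 sentences.
score = coleman_liau_index(500, 100, 10)
print(round(score, 2))  # 10.64
```

Like the ARI, this needs only character, word and sentence counts, which is why both can be applied to Swedish without a syllable counter.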

Other indexes exist, such as the Flesch-Kincaid readability tests and the Gunning fog index, but we have chosen not to include them: unlike the ARI and CLI, they rely on syllable counts and therefore arguably work only for English, not Swedish.

We also compute a compound lexical diversity index, which should range from 1 to 100 (with an average of around 50 and a standard deviation of around 10). For this text it is 75. The compound index is the average of the following indexes, which are also shown in the table above:

the Measure of Textual Lexical Diversity (MTLD) index - a measure based on computing the TTR for increasingly larger parts of the text until the TTR drops below a certain threshold (around 0.7 in our case), at which point the TTR is reset and a counter is increased; at the end, the number of words in the text is divided by the counter; as a result, the MTLD does not vary significantly with text length;
the Yule's I index (the inverse of Yule's K characteristic) - an index based on the work of the statistician G. U. Yule, who presented his characteristic in "The statistical study of literary vocabulary"; Yule's I takes into account the number of words in the text and a compound summed measure of word frequency;
the Hypergeometric Distribution D (HD-D) index (based on vocd) - an index which assesses the contribution of each word to the diversity of the text; to calculate these contributions, a hypergeometric distribution is used to compute the probability of each word appearing in word samples drawn from the text; these probabilities are then divided by the sample size and added up.
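The three indexes listed above can be sketched in their textbook forms as follows. This is an illustration under stated assumptions, not the code behind the scores on this page: the 0.72 threshold and the sample size of 42 are the values commonly cited in the literature (the page itself only says "around 0.7"), the function names are ours, and the raw values returned here are not normalized to the 1 to 100 scale used in the table.

```python
from collections import Counter
from math import comb


def _mtld_one_pass(tokens, threshold):
    """Count TTR 'factors' in one direction, then divide words by factors."""
    factors, types, count = 0.0, set(), 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            factors += 1              # a full factor is complete
            types, count = set(), 0   # reset the running TTR
    if count:                         # partial credit for the unfinished tail
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors else float("inf")


def mtld(tokens, threshold=0.72):
    """Average of a forward and a backward pass over a list of tokens."""
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2


def yules_i(tokens):
    """Yule's I = N^2 / (sum of squared word frequencies - N)."""
    n = len(tokens)
    sum_sq = sum(f * f for f in Counter(tokens).values())
    if sum_sq == n:
        raise ValueError("undefined when every word occurs exactly once")
    return n * n / (sum_sq - n)


def hdd(tokens, sample_size=42):
    """Sum over words of P(word appears in a random sample) / sample size."""
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    all_samples = comb(n, sample_size)
    score = 0.0
    for freq in Counter(tokens).values():
        p_absent = comb(n - freq, sample_size) / all_samples
        score += (1 - p_absent) / sample_size
    return score


toy = ["a", "a", "b", "b"]
print(yules_i(toy))             # 4.0
print(hdd(toy, sample_size=2))  # expected TTR of a random 2-word sample
print(mtld(["a"] * 4))          # 2.0
```

HD-D can be read as the expected TTR of a random fixed-size sample, which is what makes it comparable across texts of different lengths.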

Our overall measure of grammatical difficulty is based on a combination of the compound lexical diversity index (which includes the MTLD, Yule's I and HD-D indexes), the ARI and the CLI, all normalized and weighted. The score should normally range from 1 to 100. In this case, the score is 60.

Other Information about Tidsbilder ur Stockholmslifvet by Claës Lundin

We provide a sample of the text below. The full text of Tidsbilder ur Stockholmslifvet is also available free of charge on our website.

Sample of text:

Jonas Berg understödde Ulla och tog sig friheten att hänsyfta på skänken af de hjärtstyrkande vederkvickelsestunderna. Den gudfruktiga madamen var dock lika obeveklig ända till dess hennes herre och man infann sig i kammaren och hade med sig den unge magistern Florentinus, komministersadjunkten, hvilken förklarade, att ett så oskyldigt nöje som att se ett kungligt intåg icke borde förmenas någon. Han trodde med detta yttrande göra jungfru Ulla sig bevågen, och hon skänkte honom verkligen en tacksam blick, men därvid stannade det. Några vänliga ord kunde Scharff icke aflocka henne. Klockar Berg själf förklarade med myndig ton, att Ulla kunde gärna begifva sig till tukthusskrifvarens ...

Most frequently used words in Tidsbilder ur Stockholmslifvet by Claës Lundin*

Position	Word	Repetitions	Part of all words
1	och	2,006	3.66%
2	en	823	1.5%
3	på	746	1.36%
4	af	743	1.36%
5	att	703	1.28%
6	som	697	1.27%
7	var	654	1.19%
8	det	649	1.19%
9	med	631	1.15%
10	den	630	1.15%
11	till	530	0.97%
12	för	510	0.93%
13	men	476	0.87%
14	sig	453	0.83%
15	han	433	0.79%
16	de	394	0.72%
17	icke	369	0.67%
18	hade	368	0.67%
19	ett	343	0.63%
20	man	286	0.52%
21	då	276	0.5%
22	så	269	0.49%
23	jag	259	0.47%
24	är	257	0.47%
25	om	255	0.47%
26	vid	229	0.42%
27	sin	195	0.36%
28	år	194	0.35%
29	ej	193	0.35%
30	där	182	0.33%
31	mycket	164	0.3%
32	hans	152	0.28%
33	hon	147	0.27%
34	hvilken	143	0.26%
35	från	140	0.26%
36	nu	137	0.25%
37	än	132	0.24%
38	under	130	0.24%
39	något	127	0.23%
40	ännu	123	0.22%
41	har	122	0.22%
42	ha	119	0.22%
43	samt	116	0.21%
44	skulle	115	0.21%
45	sedan	112	0.2%
46	honom	111	0.2%
47	också	110	0.2%
48	vi	103	0.19%
49	hos	100	0.18%
50	någon	99	0.18%
51	ut	98	0.18%
52	eller	97	0.18%
53	kunde	96	0.18%
54	voro	94	0.17%
55	vara	93	0.17%
56	in	93	0.17%
57	mig	93	0.17%
58	äfven	92	0.17%
59	andra	88	0.16%
60	sina	88	0.16%
61	dock	87	0.16%
62	upp	85	0.16%
63	några	84	0.15%
64	inte	83	0.15%
65	henne	82	0.15%
66	åt	82	0.15%
67	sitt	81	0.15%
68	öfver	80	0.15%
69	alla	79	0.14%
70	få	79	0.14%
71	väl	78	0.14%
72	redan	76	0.14%
73	här	75	0.14%
74	hvad	75	0.14%
75	utan	74	0.14%
76	nog	73	0.13%
77	efter	72	0.13%
78	gick	72	0.13%
79	tid	72	0.13%
80	Stockholm	71	0.13%
81	många	71	0.13%
82	varit	71	0.13%
83	gamla	69	0.13%
84	dem	67	0.12%
85	oss	67	0.12%
86	allt	67	0.12%
87	aldrig	66	0.12%
88	såsom	65	0.12%
89	tiden	63	0.12%
90	kom	63	0.12%
91	förut	58	0.11%
92	se	57	0.1%
93	vardt	56	0.1%
94	sade	56	0.1%
95	fru	56	0.1%
96	stora	56	0.1%
97	par	56	0.1%
98	hvilka	56	0.1%
99	annan	55	0.1%
100	samma	54	0.1%
101	ganska	54	0.1%
102	hela	53	0.1%
103	du	53	0.1%
104	kanske	53	0.1%
105	mot	52	0.09%
106	såg	52	0.09%
107	kan	52	0.09%
108	själf	52	0.09%
109	skall	51	0.09%
110	bland	49	0.09%
111	detta	49	0.09%
112	två	49	0.09%
113	stor	49	0.09%
114	mera	49	0.09%
115	gång	48	0.09%
116	hvilket	47	0.09%
117	huset	46	0.08%
118	min	46	0.08%
119	Widstrand	46	0.08%
120	tog	45	0.08%
121	fick	45	0.08%
122	vore	45	0.08%
123	Ulla	44	0.08%
124	göra	44	0.08%
125	genom	44	0.08%
126	blott	43	0.08%
127	riktigt	43	0.08%
128	alltid	42	0.08%
129	hus	42	0.08%
130	mamsell	42	0.08%
131	teatern	42	0.08%
132	första	42	0.08%
133	alldeles	41	0.07%
134	ville	41	0.07%
135	lika	41	0.07%
136	annat	41	0.07%
137	litet	41	0.07%
138	äro	40	0.07%
139	gjorde	40	0.07%
140	talade	40	0.07%
141	nya	39	0.07%
142	vår	39	0.07%
143	stod	39	0.07%
144	hvars	38	0.07%
145	både	38	0.07%
146	förklarade	37	0.07%
147	kungliga	37	0.07%
148	just	37	0.07%
149	komma	36	0.07%
150	ingen	36	0.07%
151	fram	36	0.07%
152	ned	36	0.07%
153	endast	35	0.06%
154	bror	35	0.06%
155	fann	35	0.06%
156	därefter	35	0.06%
157	åter	35	0.06%
158	ofta	34	0.06%
159	senare	34	0.06%
160	mellan	34	0.06%
161	ty	34	0.06%
162	denna	34	0.06%
163	mindre	33	0.06%
164	mer	33	0.06%
165	liten	32	0.06%
166	gå	32	0.06%
167	unga	32	0.06%
168	ung	32	0.06%
169	fastän	32	0.06%
170	likväl	31	0.06%
171	början	31	0.06%
172	måste	31	0.06%
173	talet	30	0.05%
174	långt	30	0.05%
175	snart	30	0.05%
176	när	30	0.05%
177	gjort	30	0.05%
178	hennes	30	0.05%
179	svenska	30	0.05%
180	gaf	30	0.05%
181	tycktes	29	0.05%

* This list excludes punctuation and single-letter words, as well as some different-case repeats of the same words.
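A frequency list of this kind can be approximated with a short script. The sketch below is our own simplification: it lowercases every word (so all different-case repeats are merged, and names such as Stockholm lose their capital), drops single-letter words from the ranking, and reports each count as a share of all words including the single-letter ones. The sample sentence is made up.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 10):
    """Return (word, repetitions, percent of all words) for the n top words."""
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens)  # share is measured against all words
    counts = Counter(t for t in tokens if len(t) > 1)
    return [(w, c, round(100 * c / total, 2)) for w, c in counts.most_common(n)]


sample = "Och det var en dag och en natt i Stockholm"
for word, repetitions, share in top_words(sample, 2):
    print(word, repetitions, f"{share}%")

# Cross-check against the first table row: 2,006 of 54,748 words.
print(round(100 * 2_006 / 54_748, 2))  # 3.66
```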

If you think the text would be accessible to you, you can read it on our site.

Other resources and languages

If you like this analysis, you should have a look at our lists of Swedish short stories and Swedish books.

If you like literature as a means to learn languages, please take a look at our project Interlinear Books. We even have a Swedish Interlinear book available for purchase.