Lexical Semantic Richness in Poe’s Essays and Short Stories: Comparing Corpora with WordSmith Tools and Range

Edgar Allan Poe’s essays and short stories have been widely analyzed over the decades. Previous research confirms that he uses an ample and varied vocabulary in his short stories. Nevertheless, little emphasis has been placed on some of his less famous works: his essays. The aim of this paper is therefore twofold: on the one hand, we compare the lexical semantic richness of Poe’s essays with that of his short stories; on the other hand, we test the effectiveness of two analytical tools for examining this semantic variation, namely WordSmith Tools and Range. To achieve these aims, three short stories and two essays by Poe were selected and combined into two main corpora: one of short stories and one of essays. After the corpora were split into fragments of 2,000 tokens, lexical semantic richness was assessed using the two aforementioned tools. Results show that i) lexical semantic richness is higher in the short stories than in the essays, and ii) both tools proved effective. These results are discussed further and pedagogical applications for language teaching are put forward.


I. Introduction
Lexical semantic richness is a concept that is difficult to define. It may encompass a series of characteristics such as lexical variation, sophistication, density and originality, among others. In short, measures of lexical semantic richness aim to capture, to some extent, the size of a speaker's vocabulary and the use that the speaker makes of it. This is a relevant parameter because it allows researchers to infer a speaker's level of proficiency (in either an L1 or an L2/FL) as far as vocabulary is concerned. Measures of lexical richness attempt to quantify the degree to which a writer is using a varied and large vocabulary (Laufer & Nation 1995).
Measuring lexical semantic richness is a difficult task, though. The high number of variables involved, and the difficulty of considering each of them individually, make it very hard to express lexical semantic richness as a single figure.
The main aim of this paper is to test the effectiveness of two different tools for analyzing the lexical semantic richness of texts: WordSmith Tools (Scott, 1998) and Range, developed by Nation (1995), which includes the 34,000 most frequent words in the British National Corpus (BNC), organized by Nation in bands of 1,000 words. Both tools are commonly used by linguists for quantitative analysis. Berber-Sardinha (2000), for instance, employs WordSmith to analyze small corpora and states that WordSmith and KeyWords provide facilities for comparing a study corpus with a reference corpus. Range, for its part, has been used in studies of lexical richness in L2 (see Laufer & Nation 1995). Furthermore, unlike WordSmith, Range can be downloaded for free from Nation's website, and it offers an analysis of the Lexical Frequency Profile (LFP) of texts. To test the effectiveness of these two tools, we compare the lexical semantic richness of Edgar Allan Poe's short stories with that of his essays. The initial hypothesis is a null one: since the author is the same, no significant differences are expected when the literary genre changes.

Sample
The sample used for this study is made up of three short stories and two essays by Poe, combined into two corpora (one per genre) and split into fragments of 2,000 tokens. Lexical semantic variation was analyzed using WordSmith Tools. First, we analyzed the type/token ratio of each fragment of both genres and of each literary work as a whole. Then, we examined the rate at which new types are introduced into the works; to do so, we progressively added new fragments of each genre and noted the results after each addition. For instance, we analyzed the lexical variation of Short Stories-Fragment 1, then of Short Stories-Fragment 1 + 2, then of Short Stories-Fragment 1 + 2 + 3, and so on. Lexical semantic variation was measured by means of the formula types/tokens*100, that is, the number of types divided by the number of tokens and then multiplied by 100 to obtain a percentage.
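As an illustration only, the type/token calculation and the fragment-by-fragment progression described above can be sketched in Python. The function names and the simple regular-expression tokenizer are assumptions made for this sketch; it does not reproduce how WordSmith Tools itself identifies tokens, and it does not remove proper nouns.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Assumption: a token is a run of letters, optionally with internal
    apostrophes. Numbers and symbols are ignored.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def split_fragments(tokens, size=2000):
    """Cut a token list into consecutive fragments of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def type_token_ratio(tokens):
    """Return types / tokens * 100 (0.0 for an empty token list)."""
    if not tokens:
        return 0.0
    return 100 * len(set(tokens)) / len(tokens)


def cumulative_ratios(fragments):
    """Ratio for fragment 1, then fragments 1 + 2, then 1 + 2 + 3, and so on."""
    accumulated, ratios = [], []
    for fragment in fragments:
        accumulated.extend(fragment)
        ratios.append(type_token_ratio(accumulated))
    return ratios


# Toy example: 5 tokens, 4 types ("the" occurs twice).
sample = tokenize("The cat saw the dog")
print(type_token_ratio(sample))  # 80.0
```

Because every added fragment enlarges the token count while repeating many types already seen, the cumulative ratio normally falls as fragments are added, which is why fragments of equal size are compared.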
Finally, we analyzed the lexical frequency profile (LFP) of these works with the software Range. This allowed us to examine how frequent the words used by Poe are, taking the British National Corpus as a reference. The word lists derived from this corpus are organized in bands: the 1,000 most frequent words in English, the 2,000 most frequent words, the 3,000, and so on up to the 34,000 most common words. For our purposes, however, we employed a simpler version of Range which only includes the 3,000 most common words.
Range divides the types used by an author into ranges of frequency. With this tool we were able to check whether Poe uses a high or low percentage of 'uncommon' words in English.
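The idea of a frequency profile can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how Range works internally: the function name `frequency_profile`, the band labels and the tiny word lists are hypothetical, and word forms are matched directly against the lists.

```python
from collections import Counter


def frequency_profile(tokens, bands):
    """Assign each token to the first band whose word list contains it.

    `bands` is an ordered list of (name, set_of_words) pairs, most frequent
    band first. Tokens found in no band are counted as "off-list".
    Returns a dict mapping each band name to its token and type counts
    and to the corresponding percentages.
    """
    token_counts = Counter()
    type_sets = {}
    for token in tokens:
        name = next((n for n, words in bands if token in words), "off-list")
        token_counts[name] += 1
        type_sets.setdefault(name, set()).add(token)

    total_tokens = len(tokens)
    total_types = len(set(tokens))
    profile = {}
    for name in [n for n, _ in bands] + ["off-list"]:
        n_tokens = token_counts[name]
        n_types = len(type_sets.get(name, set()))
        profile[name] = {
            "tokens": n_tokens,
            "types": n_types,
            "token_pct": 100 * n_tokens / total_tokens if total_tokens else 0.0,
            "type_pct": 100 * n_types / total_types if total_types else 0.0,
        }
    return profile


# Hypothetical miniature word lists standing in for the real frequency bands.
toy_bands = [
    ("first 1,000", {"the", "of", "house"}),
    ("second 1,000", {"gloomy"}),
]
toy_tokens = ["the", "house", "of", "the", "gloomy", "tarn"]
for band, figures in frequency_profile(toy_tokens, toy_bands).items():
    print(band, figures)
```

In this toy input, "tarn" appears in neither list and is therefore reported as off-list, which is the category used in this paper to identify 'uncommon' words.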
It is important to note that proper nouns, numbers and symbols were removed from the fragments beforehand in order to obtain objective results.
By considering these two variables together, an objective overview of Poe's lexical semantic richness was obtained. Results are presented and discussed in the next section.

III. Results and Discussion
After analyzing Poe's full corpus with WordSmith Tools, we found that the shorter the work, the higher the lexical variation (see Figure 1).

Figure 2 - Average type/token ratio for short stories and essays
This analysis shows that the type/token ratio is higher for the short stories than for the essays, that is, more types are present per 2,000 words in Poe's short stories. This could be due to the more literary focus of short stories. Whereas essays can be regarded as discussions of a given topic in fairly plain language, short stories require a range of literary techniques and semantic variation in order to create a suitable atmosphere. The effect could be even stronger in American Romanticism, where very dark and obscure settings were portrayed. This claim, however, should be tested by comparing works belonging to American Romanticism with works from other periods.
When the works are considered individually, it is worth noting that the type/token ratio of The Fall of the House of Usher (41.03) was higher than that of the rest of the short stories (36.1), whose ratio was in turn higher than that of the essays (34.28). The Fall of the House of Usher is probably one of Poe's darkest works, and it is full of very detailed descriptions which imply a high mastery of the language. This could explain the higher type/token ratio of this work.
In the next step, we analyzed how the number of types progressed as we kept adding fragments of 2,000 tokens to each genre. We observed that the short stories introduce more new types than the essays (see Figure 3). As stated above, this could be explained if we understand essays as texts in which few topics are discussed, whereas short stories include conversations, descriptions and different situations which require a higher number of types.

Figure 3 - Type/token progression
Finally, the last aspect that we analyzed was the LFP of both the short stories and the essays (see Tables 4 and 5). In this case, we processed two samples of approximately 12,000 tokens each (one a compendium of the short stories and the other comprising fragments of the two essays) with the program Range, developed by Nation (2005). The results show that, in both the short stories and the essays, a considerable number of tokens and types are off the lists, that is, they are not among the 3,000 most frequent words in English. In the short stories, these words make up the largest group of types (42% of the types belong to this category). In the essays, this group is the second largest, after the most common words (those included among the 1,000 most common words).
We have seen that these two tools can be useful for determining an author's lexical semantic richness, since they provide data such as type/token ratios and word frequencies. In addition, Range compares these frequencies against the British National Corpus and organizes the data in frequency bands.
All in all, we can say that Edgar Allan Poe was a writer with high lexical semantic richness, given the elevated presence of uncommon words in his writings. The initial null hypothesis presented in this paper has been rejected: lexical semantic richness is slightly higher in the short stories than in the essays. This is probably because short stories require a larger literary vocabulary than essays, which are mere reflections on a particular matter.
Nevertheless, further studies comparing these data with data obtained from other authors are necessary to establish the degree of lexical semantic richness more precisely. An analysis of other genres, such as poetry, might also provide a good point of comparison. It would also be interesting for further research to analyze the use of the less frequent words and hapax legomena in Poe's writings in order to obtain a fuller picture of the author's lexical semantic richness, that is, of his lexical semantic competence.
These two tools have proved effective for measuring lexical richness.
Analyses of L2/FL lexical richness could be conducted following the methodology presented in this paper. Such analyses would show how the process of lexical enrichment takes place in learners and which words in the frequency bands still need to be learned. In this sense, collecting real samples of learners' production would offer clear guidance on vocabulary teaching needs in L2/EFL.