Thursday, 18 January 2018

Using Corpus Analysis Software to Analyse Specialised Texts



  
                              


What is a corpus?

A corpus is a collection of texts of written (or spoken) language presented in electronic form. It provides the evidence of how language is used in real situations, from which lexicographers can write accurate and meaningful dictionary entries.
Using the corpus enables lexicographers to examine a word in detail by looking at all the different contexts in which it occurs. Search results are typically viewed in a display format called KWIC (or ‘key word in context’), in which each occurrence of the search word appears on its own line with the surrounding text on either side.
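To make the KWIC idea concrete, here is a minimal Python sketch. It is not how any particular corpus tool works internally; the function name kwic, the width parameter, and the sample sentences are all made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per hit: left context, the keyword, right context."""
    # Whole-word, case-insensitive match (an assumption; real tools offer more options).
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}} {match.group()} {right:<{width}}")
    return lines

# Hypothetical sample text, invented for this sketch.
sample = ("A corpus is a collection of texts. Lexicographers search the corpus "
          "to see how a word is used. A specialised corpus covers one subject field.")

kwic_lines = kwic(sample, "corpus")
for line in kwic_lines:
    print(line)
```

Because every line has the same layout, the search word ends up in a single column, which is what makes patterns to its left and right easy to spot.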
The corpus contains over 2.5 billion words of real 21st-century English; this is the largest lexical corpus in the world. It is not only size that matters, though: it is the size of the corpus coupled with the careful selection and development of its contents that makes it a resource unlike any other in the world. Moreover, because the corpus is a collection of running texts, most of those words are repeats, so there are far fewer than 2.5 billion different words: the humble word ‘the’, the commonest in the written language, accounts for almost 100 million of all the words in the corpus!
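The difference between running words (tokens) and different words (types) is easy to see in a small sketch. On the figures quoted above, almost 100 million occurrences of ‘the’ in a little over 2.5 billion words comes to a bit under 4% of the corpus. The Python below counts tokens and types in a made-up sample; the tokenize helper and its very simple rule (lowercase letters and apostrophes) are assumptions for illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

# Hypothetical sample text, invented for this sketch.
sample_text = ("The corpus shows how the language is used. "
               "The size of the corpus matters, and so does the selection of the texts.")

tokens = tokenize(sample_text)
freq = Counter(tokens)

n_tokens = len(tokens)   # running words
n_types = len(freq)      # different words
share_of_the = freq["the"] / n_tokens

print("tokens:", n_tokens)
print("types:", n_types)
print("most frequent:", freq.most_common(3))
print(f"share of 'the': {share_of_the:.1%}")
```

Even in two sentences, the number of types is well below the number of tokens, and ‘the’ is the most frequent item.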

Keeping track of our language

Meanings of words and phrases change, and so do spellings, despite the existence of ‘standard’ or ‘correct’ spelling. A strength of the corpus is that it contains not only published works in which the text has been edited (and made to conform to standard spellings and grammar) but also unpublished and unedited writing like emails and blogs. Some of the most inventive uses and deliberate exploitations of language (as well as genuine mistakes) start out in this kind of informal and unselfconscious language, so tracking them is an essential part of tracking the language as a whole.

 

Sources of language corpora

Subscribe to a large corpus provider such as the British National Corpus (BNC).
Use web concordancing.
Compile your own corpora and analyze the data using corpus analysis software:
    AntConc (for monolingual corpora)
    WordSmith (for monolingual corpora)
    ParaConc (for multilingual corpora)

Designing a specialized corpus

Corpus size

There are no fixed rules; the appropriate size depends on the research purpose, the availability of data, and the time available.
Large, general corpora may be less useful than small, focused corpora if searches are made on context-specific terms.
A corpus that is too small has its own limitations, e.g. it may not contain enough of the concepts, terms, or patterns under investigation.
It is preferable to create a monitor (open) corpus, because specialized words and usage are dynamic.
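One way to see why a small, focused corpus can beat a large, general one for context-specific terms is to normalise the raw counts, for example as occurrences per million words. The sketch below does that arithmetic in Python. All the counts and corpus sizes are invented for illustration and do not describe any real corpus.

```python
def per_million(count, corpus_size):
    """Normalised frequency: occurrences per million running words."""
    return count * 1_000_000 / corpus_size

# Hypothetical figures, invented for this sketch:
# a specialised term found 50 times in a 100-million-word general corpus,
# and 120 times in a 500,000-word specialised corpus.
general_rate = per_million(count=50, corpus_size=100_000_000)
specialised_rate = per_million(count=120, corpus_size=500_000)

print("general corpus:    ", general_rate, "per million words")
print("specialised corpus:", specialised_rate, "per million words")
```

With these made-up numbers, the small corpus gives both more examples to read and a far higher density of the term, even though it is a fraction of the size.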

Text extracts vs. full texts

The choice depends on the aim of corpus compilation.
Whole texts offer more coverage, because the words or terms to be looked at may be scattered throughout the text.
Specific sections may be helpful if we are looking for words or phrases in particular content areas, or want to create purposeful sub-corpora.

Number of texts

A choice can be made between collecting a few large texts and collecting a larger number of smaller texts.
A choice can also be made between selecting texts written by one or two key writers or sources, and texts retrieved from different sources or written by different authors.
The choice depends on your research focus, e.g. whether you want to study overall language use, or the idiosyncrasies and linguistic choices of particular writers.

Subject and text type

Texts should mainly come from the specialized subject under investigation, although the boundaries are less clear-cut in multidisciplinary subjects.
Texts may come from different subjects if the research focus is on particular language features rather than term extraction.
Text types within a specialized subject field may vary from expert-to-expert texts to expert-to-non-expert texts, or in other words, from technical to popular texts.

Other considerations

Authorship: Texts written by experts in a field tend to present more reliable and authentic examples of specialized language.
Language: Specialized texts can be stored and retrieved in the form of monolingual, comparable, or parallel corpora.
Publication date: Texts should come from recent publications unless queries are made in relation to particular periods of time.
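Design decisions like these are easier to apply if each text is stored with a little metadata. The following Python sketch shows one possible way to record it and to pull out sub-corpora. The TextRecord fields, the select and published_since helpers, and every record in the list are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

@dataclass
class TextRecord:
    filename: str
    author: str
    subject: str
    text_type: str   # e.g. "expert-to-expert" or "expert-to-non-expert"
    language: str
    year: int

# Hypothetical records, invented for this sketch.
records = [
    TextRecord("t01.txt", "Author A", "biochemistry", "expert-to-expert", "en", 2016),
    TextRecord("t02.txt", "Author B", "biochemistry", "expert-to-non-expert", "en", 2009),
    TextRecord("t03.txt", "Author A", "biochemistry", "expert-to-expert", "en", 2017),
    TextRecord("t04.txt", "Author C", "law", "expert-to-expert", "en", 2015),
]

def select(records, **criteria):
    """Return the records whose fields equal all of the given criteria."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in criteria.items())]

def published_since(records, year):
    """Keep only texts published in or after the given year."""
    return [r for r in records if r.year >= year]

# A technical sub-corpus for one subject, restricted to recent publications.
technical = select(records, subject="biochemistry", text_type="expert-to-expert")
recent_technical = published_since(technical, 2015)
# A single-author sub-corpus, e.g. for studying one writer's preferences.
by_author_a = select(records, author="Author A")

print([r.filename for r in recent_technical])
print([r.filename for r in by_author_a])
```

Keeping the metadata separate from the text files means the same collection can be re-cut by author, text type, or date without recompiling the corpus.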

    


Getting started with AntConc

Download the latest version of AntConc and watch YouTube tutorials from

1. Run the program.
2. Open Files (browse and select the target files) or Open Dir (to select a target folder).
3. Choose the function.
4. Clear All Tools and Files before opening new files.
5. Save Output to Text File to save output, e.g. concordance lines.
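AntConc is operated through its menus, so the steps above need no code. For readers who want to see what the same workflow looks like in a script, here is a rough Python analogue: load every text file in a folder, produce concordance lines, and save them to a text file. This is not AntConc's own code or API; the function names load_dir, concordance, and save_output, the file names, and the sample sentences are all assumptions made for this sketch.

```python
import re
import tempfile
from pathlib import Path

def load_dir(folder):
    """Read every .txt file in a folder (comparable to Open Dir)."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(Path(folder).glob("*.txt"))}

def concordance(texts, keyword, width=25):
    """Build tab-separated concordance lines: file, left context, hit, right context."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for name, text in texts.items():
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for m in pattern.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            lines.append(f"{name}\t{left:>{width}}\t{m.group()}\t{right}")
    return lines

def save_output(lines, path):
    """Write the lines to a text file (comparable to Save Output to Text File)."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

# Demonstration with two tiny, made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.txt").write_text(
        "The enzyme binds the substrate.\nEach enzyme is specific.", encoding="utf-8")
    (folder / "b.txt").write_text("No relevant term here.", encoding="utf-8")

    texts = load_dir(folder)
    lines = concordance(texts, "enzyme")
    out_path = folder / "results.txt"
    save_output(lines, out_path)
    saved = out_path.read_text(encoding="utf-8")

print(saved)
```

Starting each run from a freshly loaded folder plays the same role as Clear All Tools and Files: results from an earlier set of files cannot leak into the new output.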



