Seminar in Language Education 2015

Read through the handbook in October-November
Read CEFR-related materials and begin work (from December)
Interim reports (early January)
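- 道原: adversative conjunctions
  - But, however: at the paragraph level?
  - How can adversative relations at the paragraph level be analyzed?
  - Adversatives: "but" is not used sentence-initially → academic writing
- 山崎: use of conjunctions in NICT JLE, examined through n-grams
  - NS vs NNS: n-gram comparison
  - n-grams containing "and" or "but" are less frequent in the NNS data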
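The NS vs NNS n-gram comparison noted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis actually run on NICT JLE: the two sample sentences are invented, the tokenizer is deliberately crude, and the function names `tokenize`, `ngrams`, and `conjunction_ngram_rate` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word forms (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def conjunction_ngram_rate(text, n=2, targets=("and", "but")):
    """Share of n-grams that contain at least one target conjunction."""
    grams = ngrams(tokenize(text), n)
    if not grams:
        return 0.0
    hits = sum(1 for gram in grams if any(t in gram for t in targets))
    return hits / len(grams)


# Invented sample sentences, NOT taken from NICT JLE.
ns_sample = "I was tired but I went out and met my friends"
nns_sample = "I was tired so I stayed home yesterday"

print(conjunction_ngram_rate(ns_sample))
print(conjunction_ngram_rate(nns_sample))
print(Counter(ngrams(tokenize(ns_sample), 2)).most_common(3))
```

A rate is used rather than a raw count because NS and NNS subcorpora usually differ in size, so raw frequencies are not directly comparable.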

Presentation session (late January or early February)


Course plan (spring)

Students will build specialist knowledge through the textbook and become proficient in using Sketch Engine, a corpus query system.

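Session 1. Introduction: corpora and the development of language teaching materials
Session 2. Fundamentals of corpus linguistics (1): definition and historical development of corpora (CBLS Unit A1)
- Presentation materials (山崎)

Session 3. Fundamentals of corpus linguistics (2): representativeness, balance, and sampling (CBLS Unit A2 + B1.2 & B1.3 + hands-on practice with AntConc on the Brown Corpus)
- Presentation materials (廣池)

Session 4. Fundamentals of corpus linguistics (3): markup & annotation (CBLS Unit A3 & A4)
- Presentation materials (尾崎)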

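Session 5. Fundamentals of corpus linguistics (4): corpora and applied areas (CBLS Unit A10)

10.2-10.3: 川本
10.4-10.5: 安宅
10.6-10.7: 尾崎
10.8: 中島

Session 6. Remainder of A10 + hands-on practice with AntConc

10.9-10.10: 廣池
10.11-10.12: 山崎
10.13-10.15: ローレンス

Session 7. Hands-on practice with AntConc, continued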

More complex searches using regular expressions
Cluster / n-gram / keyword analysis
Processing tagged data
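Outside AntConc, two of these operations can be approximated in a few lines of Python. The sketch below makes several assumptions: the tagged text uses a `word_TAG` format with Penn Treebank-style tags, the sample sentence is invented, and log-likelihood is taken as the keyness statistic (one commonly used option, not necessarily the setting used in class). The function names are hypothetical.

```python
import math
import re

# Assumed input format: "word_TAG" pairs separated by spaces (invented sample).
TAGGED = "The_DT quick_JJ fox_NN jumps_VBZ over_IN the_DT lazy_JJ dogs_NNS"


def adjective_noun_pairs(tagged_text):
    """Find adjective + noun sequences in word_TAG text with a regular expression."""
    pattern = r"\b([^\s_]+)_JJ\s+([^\s_]+)_NNS?\b"
    return re.findall(pattern, tagged_text)


def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Log-likelihood keyness of one word, comparing a target and a reference corpus."""
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


print(adjective_noun_pairs(TAGGED))
# A word occurring 20 times in a 100-token target corpus vs 5 times in a 100-token reference corpus.
print(log_likelihood(20, 5, 100, 100))
```

The score is 0 when a word is equally frequent (relative to corpus size) in both corpora and grows as the two relative frequencies diverge.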

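Session 8 (6/3). Fundamentals of corpus linguistics (5): corpora and dictionaries (CBLS Unit C1). Presenter: 廣池

Session 9 (6/10). Fundamentals of corpus linguistics (6): corpora and grammar research (CBLS Unit C2). Presenter: 山崎
- Presentation materials (山崎)
- AntConc: cluster/keyword analysis
- 6/17: no class (teaching practicum observation visit)
- 6/24: no class (business trip to the Asian Association for Lexicography conference in Hong Kong)

Session 10 (7/1). Fundamentals of corpus linguistics (7): corpora and language acquisition research (CBLS Unit C3). Presenter: 尾崎
Session 11 (7/8). Fundamentals of corpus linguistics (8): corpora and translation studies (CBLS Unit C6). Presenter: ローレンス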

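The following topics will be covered at the seminar retreat during the summer break.
Session 12. Analysis of CEFR-aligned learner data (1): overview of learner data analysis in the English Profile
Session 13. Analysis of CEFR-aligned learner data (2): error tagging in practice
Session 14. Analysis of CEFR-aligned learner data (3): overview of research on extracting criterial features
Session 15. Analysis of CEFR-aligned learner data (4): types of learner corpora and basic processing (hands-on)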


Textbook

ISBN	0415286239
Title	Corpus-based language studies: an advanced resource book
Authors	McEnery, T., Xiao, R., & Tono, Y.
Publisher	Routledge
Year	2006


PDF

Until the textbook arrives, please download and use the following:

Chapter A10


READING:


Discussion questions

Chapter 1:

What is a corpus? Discuss some common features by comparing different definitions.
Why use computers to study language? What is your intuitive answer to this? What other reasons did you find in the text?
Discuss the use of corpora and the use of intuition. Are they mutually exclusive?
Is corpus linguistics a methodology or a theory?
How different are corpus-based vs. corpus-driven approaches? Can you think of any concrete examples?

Chapter 2:

What is "representativeness"?
What does Biber mean when he says, "Representativeness refers to the extent to which a sample includes the full range of variability in a population"? (p.13)
What are "internal" and "external" criteria used to select texts for a corpus? (p.14)
The authors say that it is problematical to use internal criteria as the primary parameters for the selection of corpus data. Why? (p.14)
Explain what Biber means by proceeding in a 'cyclical fashion'. (p.14)
Static sample corpora, if resampled, may also allow the study of language change over time. (p.15) How?

What are "general" vs. "specialized" corpora? How is representativeness achieved in these corpora?

How is the acceptable balance of a corpus determined?
Any claim of corpus balance is largely an act of faith. (p.16) What does this mean?
Explain the design of the British National Corpus, using the terms 'domain', 'time', 'medium', 'demographic' and 'context-governed'. How is it balanced?
Elaborate on the following statements:
- Representativeness links to research questions. (p.18)
- Representativeness is a fluid concept. (p.18)

Explain the notion of sampling using the following terms:
- sample/ population/ sampling unit/ sampling frame
What is the difference between 'simple random sampling' and 'stratified random sampling'?
Describe the pros and cons of 'full text samples'.
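To make the contrast between simple random sampling and stratified random sampling concrete, here is a minimal sketch. Everything in it is assumed for illustration: the sampling frame is an invented list of 80 "news" and 20 "fiction" texts, the strata are allocated proportionally, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, k, seed=0):
    """Draw k sampling units from the whole frame, ignoring any grouping."""
    rng = random.Random(seed)
    return rng.sample(frame, k)


def stratified_random_sample(frame, k, stratum_of, seed=0):
    """Split the frame into strata, then sample each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for _, units in sorted(strata.items()):
        # Proportional allocation; rounding means the total may differ slightly from k.
        quota = round(k * len(units) / len(frame))
        sample.extend(rng.sample(units, min(quota, len(units))))
    return sample


# Invented sampling frame: (genre, text id) pairs.
frame = [("news", i) for i in range(80)] + [("fiction", i) for i in range(20)]

print(simple_random_sample(frame, 10))
print(stratified_random_sample(frame, 10, stratum_of=lambda unit: unit[0]))
```

With simple random sampling the number of fiction texts in the sample varies by chance and can be zero; the stratified version guarantees each genre its proportional share.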

Chapter 3

3.2
- What are the three reasons for corpus mark-up? Discuss each case with concrete examples.

3.3
- Here, you should at least familiarize yourself with the following schemes:
  - COCOA (dated)
  - TEI (current standard) << website >>
    - header vs. body
    - Q1. What does the TEI header specify?
    - Q2. What kind of information is in the TEI body?
  - Corpus Encoding Standard (CES) & XCES << website >>
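As a starting point for Q1 and Q2, the split between the TEI header (metadata about the text) and the TEI body (the text itself) can be seen by parsing a tiny document. The sample below is a hypothetical, heavily reduced TEI file written for illustration, and `split_header_and_body` is a made-up helper name; real corpus headers carry far more information.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI document (invented content).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text (hypothetical)</title></titleStmt>
      <publicationStmt><p>Unpublished classroom example</p></publicationStmt>
      <sourceDesc><p>Written for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Corpora are collections of texts.</p>
      <p>They are sampled to be representative.</p>
    </body>
  </text>
</TEI>"""


def split_header_and_body(xml_string):
    """Return the title from the header and the paragraph texts from the body."""
    root = ET.fromstring(xml_string)
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        namespaces=TEI_NS,
    )
    paragraphs = [p.text for p in root.findall("tei:text/tei:body/tei:p", TEI_NS)]
    return title, paragraphs


title, paragraphs = split_header_and_body(SAMPLE)
print("Header title:", title)
print("Body paragraphs:", paragraphs)
```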

3.4
- Please read the following webpage for your reference:
  - Introduction to character encoding
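A short experiment shows why character encoding matters for corpus files, especially Japanese ones. This is a sketch under assumptions: the word and the three encodings compared are chosen only for illustration, and `byte_lengths` is a hypothetical helper name.

```python
def byte_lengths(text, encodings=("utf-8", "shift_jis", "utf-16-le")):
    """Number of bytes the same string occupies under each encoding."""
    return {enc: len(text.encode(enc)) for enc in encodings}


word = "コーパス"  # four katakana characters

# Each katakana character takes 3 bytes in UTF-8 but 2 bytes in Shift_JIS.
print(byte_lengths(word))

utf8_bytes = word.encode("utf-8")

# Decoding with the encoding used for writing restores the text.
print(utf8_bytes.decode("utf-8") == word)

# Decoding with the wrong encoding produces garbled text (mojibake).
garbled = utf8_bytes.decode("shift_jis", errors="replace")
print(garbled == word)
```

The same bytes yield different characters depending on the encoding assumed, which is why a corpus tool needs to be told (or to detect correctly) how each file was saved.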

Chapter 4

What is corpus annotation and how is it different from corpus mark-up?

4.2
- What are the four advantages of corpus annotation?
- What are some of the criticisms against corpus annotation? What is the authors' response?

4.3
Look at concrete examples for each type of annotation:

POS tagging
- Online tagging system by the University of Illinois at Urbana-Champaign

Lemmatization
- Online stemmer and lemmatizer (Python NLTK)

Parsing
- Online parser (Stanford)

Semantic annotation
- Lancaster USAS tag

Coreference annotation
- Image of coreference annotation

Pragmatic annotation
- Examples (MICASE pragmatic tags)

Stylistic annotation
- Example: Speech, Thought & Presentation

Error tagging
- Example: Granger (2003)

Problem-oriented annotation

Chapters 5-9

Make a summary on your own.

Chapter 10

Briefly summarize the use of corpus data in the following areas.

The major areas of linguistics
- lexicographic and lexical studies (10.2)
- grammatical studies (10.3)
- register variation and genre analysis (10.4)
- dialect distinction and language variety (10.5)
- contrastive and translation studies (10.6)
- diachronic study and language change (10.7)
- language learning and teaching (10.8)

Other areas which have started to use corpus data
- Semantics (10.9)
- Pragmatics (10.10)
- Sociolinguistics (10.11)
- Discourse analysis (10.12)
- Stylistics and literary studies (10.13)
- Forensic linguistics (10.14)