CEFR-J RLD

CEFR-J Reference Level Descriptions (RLDs)

What is RLD?

CEFR RLD page by the Council of Europe

Corpora used for our RLD work

Textbook Corpus as INPUT

CEFR Course Book Corpus:
- internal resource
- 96 CEFR-based course books published in the UK, with CEFR level classifications
- 1,801,549 running words

English Textbook Corpus in Japan
- internal resource
- 7 junior high textbooks and 21 senior high school textbooks (Grades 1-3 each)
- 1,158,525 running words

Learner Corpus as OUTPUT

Spoken:
- NICT JLE Corpus (CEFR-aligned version) [ website ]
  - A corpus of transcripts from an English oral interview test, the Standard Speaking Test (SST) by ALC Press
  - 1,281 examinees' interview transcripts with SST levels
  - We created a CEFR-aligned version of the NICT JLE Corpus.
  - 763,289 words (only interviewees' utterances)

Written:
- JEFLL Corpus (CEFR-level classified version) [ Information ]
  - A corpus of free compositions by Japanese junior and senior high school students.
  - 10,038 samples
  - 669,281 running words

CEFR-J Wordlist

About:

CEFR-J Wordlist Version 1.6
- A list of 7,801 items classified by CEFR level (A1 to B2).
- Each item has the following information:
  - headword (lemma form)
  - part of speech
  - CEFR level
  - thematic categories defined by the British Council/EAQUALS Core Inventory for General English and Threshold Levels 1990 (Council of Europe)
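
As a rough illustration of how these fields might be used, the sketch below groups wordlist entries by CEFR level. The column names (`headword`, `pos`, `CEFR`) and the three sample rows are assumptions made for this example; the levels shown are illustrative rather than taken from the wordlist, and the layout of the distributed file may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in CSV form; column names and levels are assumptions.
SAMPLE = """headword,pos,CEFR
cat,noun,A1
argue,verb,B1
sit,verb,A1
"""

def group_by_level(csv_text):
    """Return a dict mapping each CEFR level to a list of (headword, pos)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["CEFR"]].append((row["headword"], row["pos"]))
    return dict(groups)

by_level = group_by_level(SAMPLE)
print(by_level["A1"])
```

Keying on the (headword, part of speech) pair rather than the headword alone keeps homographs with different parts of speech apart.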

Download

Version 1.6
- [ download ]

How to cite:

Tono, Y. (2017). The CEFR-J and its Impact on English Language Teaching in Japan. JACET International Convention Selected Papers, Volume 4, pp. 31-52. JACET.

CEFR-J Collocation Dataset

About

- First release: September 2022
- Collocation list based on the CEFR-J Wordlist Ver. 1.6
- Syntactic frame-based collocation pairs extracted from the BNC (dependency-parsed with Stanza)

Dataset information:

Each collocation pair has the following information:
- w1: collocate
- w2: node
- w1_CEFR: CEFR level of w1
- w2_CEFR: CEFR level of w2
- relation: dependency relation
- cooccurrence: collocation frequency
- freq_w1: independent frequency of w1 in the entire BNC
- freq_w2: independent frequency of w2 in the entire BNC
- w1_in_rel: frequency of w1 in the given dependency relation
- w2_in_rel: frequency of w2 in the given dependency relation
- DP: dispersion measure DP (Gries)
- expected_freq: expected frequencies
- Association measures for this given collocation pair:
  - MI/ MI2/ MI3/ t_score/ z_score/ logDice/ log_likelihood/ chi_squared
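
For readers unfamiliar with these scores, the following sketch shows the textbook definitions of several of them, computed from an observed co-occurrence count and an expected frequency, plus Gries's DP. It is not the code used to build the dataset: which frequency columns and which corpus size feed the expected frequency are assumptions here, the function and parameter names are hypothetical, and log_likelihood and chi_squared are left out for brevity.

```python
import math

def association_measures(cooccurrence, freq_w1, freq_w2, corpus_size):
    """Compute common association scores from observed and expected frequency."""
    observed = cooccurrence
    # Expected co-occurrence under independence (assumed definition).
    expected = freq_w1 * freq_w2 / corpus_size
    return {
        "expected_freq": expected,
        "MI": math.log2(observed / expected),
        "MI2": math.log2(observed ** 2 / expected),
        "MI3": math.log2(observed ** 3 / expected),
        "t_score": (observed - expected) / math.sqrt(observed),
        "z_score": (observed - expected) / math.sqrt(expected),
        "logDice": 14 + math.log2(2 * observed / (freq_w1 + freq_w2)),
    }

def dispersion_dp(part_freqs, part_sizes):
    """Gries's DP: half the summed absolute differences between the observed
    and expected proportions of an item across corpus parts."""
    total_freq = sum(part_freqs)
    total_size = sum(part_sizes)
    return 0.5 * sum(
        abs(f / total_freq - s / total_size)
        for f, s in zip(part_freqs, part_sizes)
    )

scores = association_measures(cooccurrence=8, freq_w1=16, freq_w2=16, corpus_size=256)
print(scores["MI"], scores["logDice"])
print(dispersion_dp([10, 0], [500, 500]))
```

A DP close to 0 means the item is spread across the corpus parts in proportion to their size; values approaching 1 mean it is concentrated in few parts.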

Download

ADJ+NOUN (amod): 135,939 pairs [ download ]

VERB+NOUN (obj): 114,582 pairs [ download ]

NOUN+NOUN (nounmod): 72,340 pairs [ download ]

ADVERB+VERB (advmod verb): 43,992 pairs [ download ]

ADVERB+ADJ (advmod adj): 16,180 pairs [ download ]

Acknowledgement

This dataset was created by Kohei Fukuda, a postgraduate student in my lab.

How to cite:

Fukuda, K. & Tono, Y. (2022). The CEFR-J Collocation Dataset Version 1.0. Tono Lab, TUFS. (this URL)

CEFR-J Grammar Profile

About

- An inventory of grammar items classified by CEFR level.
- Profiling was based on INPUT (the ELT Course Book Corpus) as well as OUTPUT (the spoken and written learner corpora).

A list of grammar items and their REGEX queries

The following Excel file lists the 263 grammar items investigated and their REGEX queries.

[ download ]
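
To give a sense of what a regex-based grammar query can look like, here is a minimal sketch that counts two grammar items in part-of-speech-tagged text. The `word_TAG` input format, the Penn Treebank tags, the item names and both patterns are assumptions made for illustration; they are not the project's actual queries, which are given in the Excel file.

```python
import re

# Hypothetical patterns over word_TAG text (Penn Treebank tags);
# these are illustrative and not the project's actual queries.
GRAMMAR_ITEMS = {
    "present_perfect": re.compile(
        r"\b(?:have|has)_VB[PZ]\s+[a-z]+_VBN\b", re.IGNORECASE
    ),
    "be_going_to": re.compile(
        r"\b(?:am|is|are)_VB[PZ]\s+going_VBG\s+to_TO\s+[a-z]+_VB\b",
        re.IGNORECASE,
    ),
}

def count_grammar_items(tagged_text):
    """Return the number of matches of each grammar item in the tagged text."""
    return {
        name: len(pattern.findall(tagged_text))
        for name, pattern in GRAMMAR_ITEMS.items()
    }

tagged = (
    "I_PRP have_VBP eaten_VBN lunch_NN ._. "
    "She_PRP has_VBZ gone_VBN home_NN ._. "
    "He_PRP is_VBZ going_VBG to_TO leave_VB ._."
)
print(count_grammar_items(tagged))
```

Raw counts like these would normally be normalized by text length before texts of different sizes are compared.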

Grammar Profile for Teachers and Learners

- A user-friendly version of the Grammar Profile.
- A visual display of the CEFR levels at which particular grammar items are introduced, based on their distribution across the position-based CEFR course book data.

[ download ]

Original dataset

Frequencies of the 263 grammar items were obtained from the following corpora. The corpora themselves cannot be redistributed due to copyright restrictions, but the frequency data from each text will be made publicly available.

CEFR-based ELT Course Books: Frequency of 263 grammar items in CEFR-classified course books
- [ download ]

CEFR-based ELT Course Books (Position-based): Frequency of 263 grammar items in the course books divided into two or three parts in order to examine their occurrences across CEFR sub-levels in detail.
- [ download ]

CEFR-based ELT Course Books (Skill-based): Frequency of 263 grammar items in the course books divided into sections focusing on the 4 skills (listening/reading/speaking/writing).
- [ download ]

Written Learner Corpus: Frequency of 263 grammar items in the JEFLL Corpus, a corpus of 10,000 Japanese EFL learners' 20-minute in-class free compositions. Frequencies were obtained in both the original student writings and the versions corrected by native speakers.
- [ download ]

Spoken Learner Corpus: Frequency of 263 grammar items in the NICT JLE Corpus, a corpus of oral interviews with Japanese EFL learners (1,281 samples).
- [ download ]

English Level Checker

About
- A tool developed by the Okumura Lab at Tokyo Institute of Technology.
- The site provides a list of the grammar items found in the text you input, along with other lexical measures and a final CEFR level judgement.
- It has been trained on both textbook and essay data. For essay input, errors can be detected automatically and corrections suggested.

Access [ English Level Checker ]

How to cite:

Ishii, Y. & Tono, Y. (2018). Investigating Japanese EFL learners' overuse/underuse of English grammar categories and their relevance to CEFR levels. Proceedings of the 4th Asia Pacific Corpus Linguistics Conference, (Edited by Y. Tono and H. Isahara), pp. 160-165.

CEFR-J Text Profile

About:

- The Text Profile is a list of textual characteristics and their values, obtained from the analysis of CEFR-classified texts.
- The CEFR-J Text Profile was mainly constructed by our project member, Dr Satoru Uchida (Kyushu University), together with the Yuki Arase Lab at Osaka University and the Sachio Hirokawa Lab at Kyushu University.

Text profile measures

Common measures:
- word length (1 to 3 letters)
- word length (4 to 6 letters)
- word length (7 letters +)
- average word length
- types
- TTR
- mean length of sentences
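
The common measures above could be computed along the following lines. This is a minimal sketch, not the project's implementation: the regex-based tokenization and sentence splitting, the use of raw counts for the word-length bands, case-insensitive type counting, and the key names are all assumptions, and the input is assumed to contain at least one word.

```python
import re

def common_measures(text):
    """Compute simple surface measures for a non-empty English text."""
    # Naive sentence split on end punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive tokenization: runs of letters and apostrophes (assumption).
    tokens = re.findall(r"[A-Za-z']+", text)
    lengths = [len(t) for t in tokens]
    types = {t.lower() for t in tokens}
    return {
        "len_1_3": sum(1 for n in lengths if n <= 3),
        "len_4_6": sum(1 for n in lengths if 4 <= n <= 6),
        "len_7_plus": sum(1 for n in lengths if n >= 7),
        "avg_word_length": sum(lengths) / len(tokens),
        "types": len(types),
        "TTR": len(types) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }

measures = common_measures("The cat sat. The dog barked loudly.")
print(measures)
```

Because TTR falls as texts get longer, comparisons are most meaningful between texts of similar length.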

Lexical profile measures:
- Average difficulty
- A1_per
- A2_per
- B1_per
- B2_per
- C1_per
- C2_per
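
One plausible reading of the `*_per` measures is the percentage of tokens at each CEFR level. The sketch below follows that reading; the toy level dictionary, the lowercase lookup without lemmatization, and the choice to keep unlisted words in the denominator are assumptions, and "Average difficulty" is not covered because its definition is not given here.

```python
from collections import Counter

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def level_percentages(tokens, level_of):
    """Return the percentage of tokens at each CEFR level.

    level_of maps a lowercase word to its level; words that are not
    listed stay in the denominator (assumption).
    """
    counts = Counter(level_of.get(token.lower()) for token in tokens)
    total = len(tokens)
    return {f"{level}_per": 100 * counts[level] / total for level in LEVELS}

# Hypothetical mini lexicon; the levels are illustrative only.
toy_levels = {"cat": "A1", "sit": "A1", "argue": "B1"}
print(level_percentages(["cat", "sit", "argue", "xyz"], toy_levels))
```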

Complexity measures:
- sum_D_score
- avg_D_score
- sum_L_score
- avg_L_score
- avg_MaxDepth

D_score: depth x difficulty level
L_score: depth x word length
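
The two definitions above can be sketched as follows. Several points are assumptions rather than facts about the Text Profile: that depth is a word's depth in a parse tree, that difficulty is a numeric CEFR level (for example A1 = 1), and that the averages are taken per token. The input format is hypothetical.

```python
def complexity_scores(tokens):
    """Compute D and L scores for a list of (depth, difficulty, word) tuples.

    depth: depth of the word in a parse tree (assumption)
    difficulty: numeric difficulty level of the word (assumption)
    """
    d_scores = [depth * difficulty for depth, difficulty, _ in tokens]
    l_scores = [depth * len(word) for depth, _, word in tokens]
    n = len(tokens)
    return {
        "sum_D_score": sum(d_scores),
        "avg_D_score": sum(d_scores) / n,  # averaged per token (assumption)
        "sum_L_score": sum(l_scores),
        "avg_L_score": sum(l_scores) / n,
    }

# Hypothetical input: (depth, difficulty, word)
print(complexity_scores([(1, 1, "cat"), (2, 3, "argue")]))
```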

Grammatical measures:
- avg_[G-item]
- [G-item]_per

Dataset

Text profile metrics and their values for CEFR Course Book Corpus (All / Skill-based / Position-based)

[ download (PDF) ]

[ download (Excel) ]

CVLA

About
- CEFR-based Vocabulary Level Analyzer by Satoru Uchida at Kyushu University
- A tool to report the CEFR levels of vocabulary used in the input text along with other text profile measures and the estimated CEFR level.

Access
- [ CVLA ]

How to cite:

Uchida, S. & Negishi, M. (2018). Assigning CEFR-J levels to English texts based on textual features. In Y. Tono and H. Isahara (Eds.), Proceedings of the 4th Asia Pacific Corpus Linguistics Conference (APCLC 2018), pp. 463-467.
Uchida, S. (2015). A CEFR-based Textbook Corpus: An attempt to reveal linguistic features of CEFR levels (original in Japanese). English Corpus Studies, 22, 87-99.

Before CEFR

Threshold Level Series ("T-series")
- You can access the original T-series books from here
