**1. Introduction**

Sociolinguistic studies in Japan can be classified into two main groups: "analysis of variation" and "analysis of discourse". "Variation" and "discourse" are not opposing ideas in meaning, but they tend to be contrasted in practice: studies of variation rely mainly on quantitative approaches, whereas discourse studies use qualitative approaches. In addition, some fields deal with language itself (e.g., the study of the ebb and flow of languages, and language policy). This article surveys the process from counting data to multivariate analysis in studies of variation, focusing especially on summarizing the data.

Raw data of any kind can be invaluable to social researchers, since it is impossible for them to collect, through their own efforts alone, all the data they would like covering every region and period. However, organizing the data is indispensable for verifying hypotheses, discovering patterns, and detecting unknown factors in the surveyed data. In quantitative analysis in particular, organizing the data is an important part of numerical processing. According to Oikawa (1999), the significance of adopting numerical processing in quantitative analysis is to:

- simplify the explanation of things and phenomena
- estimate and discover patterns

Both of these amount to summarizing the data. Quantitative analysis appears objective, but this holds only for the statistical calculation itself. The form of the data must therefore be standardized before quantitative analysis. For alternative (multiple-choice) questions, quantification is easy, but the alternatives must be designed with great care. For open-ended questions, criteria must be set for categorizing the answers. Basic ways of summarizing data include simple descriptive statistics and cross tabulation. Merely computing the sums, means, and variances of the data by generation, region, and other attributes can reveal a variety of implications and aspects of the data.
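As an illustration of these basic steps, here is a minimal Python sketch, assuming a hypothetical survey in which each record holds a generation, a region, and a free-text answer. The coding dictionary `CODING` stands in for the categorization criteria a researcher would set for an open-ended question; all names and values are invented for illustration, not taken from any study.

```python
from collections import Counter, defaultdict
from statistics import mean, pvariance

# Hypothetical coding criteria for one open-ended question: every raw answer
# is mapped to a code (1 = positive, 0 = negative) before any calculation.
CODING = {"yes": 1, "sure": 1, "often": 1, "no": 0, "never": 0, "rarely": 0}

# Hypothetical raw records: (generation, region, open-ended answer).
RECORDS = [
    ("young", "east", "Yes"),
    ("young", "east", "sure"),
    ("young", "west", "no"),
    ("old", "east", "often"),
    ("old", "west", "never"),
    ("old", "west", "rarely"),
]


def code_answers(records):
    """Apply the coding criteria; an answer not covered raises KeyError."""
    return [(gen, reg, CODING[ans.strip().lower()]) for gen, reg, ans in records]


def summarize_by(coded, key_index):
    """Sum, mean, and population variance of the code for each group."""
    groups = defaultdict(list)
    for row in coded:
        groups[row[key_index]].append(row[2])
    return {key: (sum(vals), mean(vals), pvariance(vals)) for key, vals in groups.items()}


def cross_tab(coded, key_index):
    """Two-way table: number of respondents in each (group, code) cell."""
    return Counter((row[key_index], row[2]) for row in coded)


if __name__ == "__main__":
    coded = code_answers(RECORDS)
    for label, idx in (("generation", 0), ("region", 1)):
        for group, (total, avg, var) in sorted(summarize_by(coded, idx).items()):
            print(f"{label}={group}: sum={total} mean={avg:.2f} var={var:.2f}")
    print(dict(cross_tab(coded, 0)))
```

With these made-up records, the means by generation differ only modestly (0.67 vs. 0.33), whereas the means by region separate completely (1 in the east, 0 in the west): the kind of aspect that simple grouped statistics can bring out.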

**3. Summarizing by Multivariate Analysis**

However, when the raw data contain many items and questions, it is hard to grasp the tendencies of the surveyed group with basic analysis alone. For example, Aoki (on his Web site) points out that basic analysis with cross tabulation is very laborious for researchers and has unexpected pitfalls when the analysis involves more than two variables. Multivariate analysis offers an alternative: a family of statistical methods for summarizing complex situations that human intuition cannot follow. These techniques abstract typical tendencies from large quantities of data. Though their calculations are intricate, multivariate analysis has become widespread as computers have developed.
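Aoki's specific pitfall is not described here; one well-known pitfall of this kind is Simpson's paradox, in which a relationship seen in a two-way table reverses once a third variable is taken into account. The minimal Python sketch below assumes illustrative (hypothetical) counts for two groups of respondents, `A` and `B`, split by a third variable into strata `s1` and `s2`; the names are placeholders.

```python
# Illustrative counts (hypothetical): (group, stratum) -> (users of a form, respondents).
COUNTS = {
    ("A", "s1"): (81, 87),
    ("A", "s2"): (192, 263),
    ("B", "s1"): (234, 270),
    ("B", "s2"): (55, 80),
}


def stratum_rate(counts, group, stratum):
    """Proportion of users within one group and one stratum (three-way view)."""
    users, total = counts[(group, stratum)]
    return users / total


def pooled_rate(counts, group):
    """Proportion of users in a group with the strata collapsed (two-way view)."""
    users = sum(u for (g, _s), (u, _n) in counts.items() if g == group)
    total = sum(n for (g, _s), (_u, n) in counts.items() if g == group)
    return users / total


if __name__ == "__main__":
    for group in ("A", "B"):
        print(group, "pooled:", round(pooled_rate(COUNTS, group), 3))
    for stratum in ("s1", "s2"):
        print(stratum, {g: round(stratum_rate(COUNTS, g, stratum), 3) for g in ("A", "B")})
```

Collapsed over the strata, group B shows the higher rate (about 0.83 vs. 0.78), yet within each stratum group A is higher; reading only the two-way table would reverse the conclusion.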

The major methods of multivariate analysis are as follows:

- (1) **multiple linear regression analysis**: predicts one variable from several other given variables
- (2) **discriminant analysis**: determines which group each case belongs to on the basis of several given variables
- (3) **principal component analysis**: summarizes the information in multivariate data into a small number of composite scores
- (4) **factor analysis**: estimates latent common factors underlying multivariate data
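To make item (3) concrete, here is a minimal sketch of principal component analysis in Python with NumPy, assuming a complete (cases × variables) numeric array with no missing values. It standardizes the variables (so it works on the correlation matrix) and keeps the leading components; the function names `pca` and `example_data` and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np


def pca(data, n_components):
    """PCA of a (cases x variables) array on standardized variables.

    Returns (scores, weights, explained_ratio) for the leading components.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each variable
    corr = np.cov(z, rowvar=False)  # covariance of z = correlation matrix of x
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    weights = eigvecs[:, :n_components]  # unit-length weight vectors
    scores = z @ weights  # component scores for each case
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, weights, explained


def example_data(n=200, seed=0):
    """Synthetic data: two near-duplicate variables plus one unrelated variable."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n)
    x1 = latent + 0.1 * rng.normal(size=n)
    x2 = latent + 0.1 * rng.normal(size=n)
    x3 = rng.normal(size=n)
    return np.column_stack([x1, x2, x3])


if __name__ == "__main__":
    scores, weights, explained = pca(example_data(), 2)
    print("explained variance ratio:", np.round(explained, 3))
    print("weights:\n", np.round(weights, 3))
```

Because `x1` and `x2` move together, the first component should absorb roughly two thirds of the total variance, so three variables are summarized by fewer numbers.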

For yes/no data, which are common in questionnaires, Hayashi's Quantification Theory is widely used in Japan. Hayashi's Quantification Theory Types I, II, and III correspond to analyses (1), (2), and (3) above, respectively. Many guidebooks on multivariate analysis, such as Ishimura (1992), are available. On the Internet, the Web site of Prof. Aoki (Gumma University) has solid articles on statistics (http://aoki2.si.gunma-u.ac.jp/). The site provides self-study materials and a calculation service with which users can carry out statistical computations through CGI forms; these are very useful.
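Hayashi's Quantification Theory Type III is commonly described as essentially equivalent to correspondence analysis (dual scaling) of a respondents × categories 0/1 table. The sketch below uses that formulation in Python with NumPy; it is an assumption-laden illustration, not Hayashi's original procedure, and the function name `quantification_type3` and the table `EXAMPLE_TABLE` are hypothetical.

```python
import numpy as np


def quantification_type3(table, n_dims=2):
    """Correspondence-analysis sketch for a (respondents x categories) 0/1 table.

    Every row and column must contain at least one 1 (zero margins would
    cause division by zero). Returns row scores, column scores (standard
    coordinates), and the eigenvalues (principal inertias) of each dimension.
    """
    x = np.asarray(table, dtype=float)
    p = x / x.sum()
    r = p.sum(axis=1)  # row masses
    c = p.sum(axis=0)  # column masses
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)  # standardized residuals
    u, sigma, vt = np.linalg.svd(s, full_matrices=False)
    row_scores = u[:, :n_dims] / np.sqrt(r)[:, None]
    col_scores = vt[:n_dims].T / np.sqrt(c)[:, None]
    return row_scores, col_scores, sigma[:n_dims] ** 2


# Hypothetical yes(1)/no(0) answers of six respondents to four items.
EXAMPLE_TABLE = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

if __name__ == "__main__":
    rows, cols, eigenvalues = quantification_type3(EXAMPLE_TABLE)
    print("eigenvalues:", np.round(eigenvalues, 3))
    print("respondent scores:\n", np.round(rows, 3))
    print("item scores:\n", np.round(cols, 3))
```

For this toy table the two eigenvalues come out as 2/3 and 1/3; respondents with similar answer patterns receive similar scores, which is how Type III places both respondents and items on a few summary dimensions.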

**4. Examples of Summarizing Data**

Kasai (1981) examined how standard-language forms are distributed by counting, for each prefecture, the number of respondents in "the Linguistic Atlas of Japan (LAJ)" who use the standard forms.

The LAJ data cannot be used as they are, because the respondents' answers vary by region and by question.

She therefore set the following criteria to organize them quantitatively:

- count only standard forms
- group pronunciations similar to the standard forms together with them
- aggregate areas by prefecture

She then transformed the LAJ data into quantitative data (see table here).
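The three criteria can be sketched in Python as follows, assuming a hypothetical record layout of (prefecture, question, recorded form) with placeholder names (`pref_A`, `q1`, `std1`, and so on) rather than real LAJ data. The `STANDARD` table stands in for the researcher's judgment about which pronunciations count as the standard form; the actual criteria details in Kasai (1981) are not reproduced here.

```python
from collections import defaultdict

# Hypothetical criteria: for each question, the standard form plus the
# pronunciation variants judged close enough to be counted together with it.
STANDARD = {
    "q1": {"std1", "std1_variant"},
    "q2": {"std2"},
}

# Hypothetical atlas records: (prefecture, question, form given by one respondent).
RECORDS = [
    ("pref_A", "q1", "std1"),
    ("pref_A", "q1", "std1_variant"),
    ("pref_A", "q2", "dialect_x"),
    ("pref_B", "q1", "dialect_y"),
    ("pref_B", "q1", "std1"),
    ("pref_B", "q2", "std2"),
]


def standard_ratio(records, standard):
    """Ratio of standard (or merged near-standard) answers per (prefecture, question)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [standard answers, all answers]
    for pref, question, form in records:
        cell = counts[(pref, question)]
        cell[1] += 1
        if form in standard[question]:  # criteria 1 and 2: standard or merged variant
            cell[0] += 1
    # Criterion 3: results are aggregated per prefecture, not per locality.
    return {key: std / total for key, (std, total) in counts.items()}


if __name__ == "__main__":
    for key, ratio in sorted(standard_ratio(RECORDS, STANDARD).items()):
        print(key, ratio)
```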

At this stage, the data are considerably more condensed than the original. With these data, the distribution of the standard forms can be analyzed roughly. The following is a table of the average ratio of respondents who use the standard forms across the 82 questions, by prefecture.
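The prefecture averages can be sketched in Python as follows, assuming per-question ratios keyed by (prefecture, question) as a hypothetical input; the names and values are placeholders, not figures from Kasai's table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-question ratios of standard-form users: (prefecture, question) -> ratio.
RATIOS = {
    ("pref_A", "q1"): 1.0,
    ("pref_A", "q2"): 0.0,
    ("pref_B", "q1"): 0.5,
    ("pref_B", "q2"): 1.0,
}


def average_by_prefecture(ratios):
    """Average ratio over all questions for each prefecture."""
    per_pref = defaultdict(list)
    for (pref, _question), value in ratios.items():
        per_pref[pref].append(value)
    return {pref: mean(values) for pref, values in per_pref.items()}


def rank_prefectures(ratios):
    """Prefectures sorted from highest to lowest average ratio."""
    return sorted(average_by_prefecture(ratios).items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for pref, avg in rank_prefectures(RATIOS):
        print(f"{pref}: {avg:.2f}")
```

Averaging over questions trades detail for a single comparable figure per prefecture, which is why the word-by-word relations have to be recovered by a further analysis.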

The table suggests that the standard forms are based on the Kanto and Kansai varieties of Japanese. However, as mentioned above, it is difficult to recognize the relationships among them just by looking at the data for 82 questions. Inoue & Kasai (1982) therefore extended the analysis with multivariate analysis to uncover the factors behind the data. In that paper, the matrix of 82 words × 48 districts (47 prefectures and one insular part of Tokyo) was analyzed by factor analysis, and the standard forms were classified into several categories by region and word (see fig. here).

Kasai's data use only part of the LAJ, but simple digitization under fixed criteria made them suitable for multivariate analysis. In this way, the complex LAJ data can be summarized from the viewpoint of standard forms.

**5. Conclusion**

Summarizing survey data quantitatively enables more thoughtful analysis. However, an uncritical attitude toward quantitative analysis carries great risk, mainly for the following reasons:

- The numerical data used in the calculations derive from the researchers' own experience and knowledge (numeric values do not exist from the beginning).
- The results of quantitative analysis have, in themselves, no meaning for the academic discipline (numeric values tell us nothing by themselves; consideration, interpretation, and explanation are indispensable).

Analyzing data without understanding the problems of statistics can therefore lead to mistakes. This applies especially to multivariate analysis: because its formulas are complex, most researchers in the humanities use it without understanding the underlying statistical calculations. It is therefore important to understand the principles of simple statistical analysis to some extent. Even if learners find it difficult to understand them completely, they should still grasp the principles in outline with the help of guidebooks.

