What is Data?
Data is numerical information that represents measurements from the real world.
A datum is a single measurement, while data refers to a collection of such measurements.
There is a vast amount of data available today, but deriving logical conclusions from raw data can be challenging.
To be useful, the measured information should be algorithmically derived, logically deduced, or statistically calculated from multiple data points.
Information is defined as a meaningful answer to a query or a meaningful stimulus that can lead to further queries.
======
Need of Data
Maps and tables are essential tools in geography to study phenomena and their growth.
Many variables influencing interactions over the earth’s surface can be better understood in quantitative terms.
Statistical analysis of these variables is now necessary for in-depth geographical analysis.
For instance, studying the cropping pattern of an area requires statistical information about cropped area, crop yield, production, irrigated area, rainfall, and inputs like fertilizers, insecticides, and pesticides.
Similarly, understanding the growth of a city necessitates data related to total population, density, number of migrants, occupations, salaries, industries, transportation, and communication.
Therefore, data has a crucial role in geographical analysis.
======
Presentation of the Data
The story of the person, his wife, and child drowning while crossing a river illustrates the concept of statistical fallacy.
Data presentation is crucial as it helps in understanding the facts and figures, and avoiding fallacies.
Statistical methods are significant in data analysis, presentation, and drawing conclusions across various disciplines, including geography.
The concentration of phenomena like population, forest, or transportation networks can be explained quantitatively over space and time.
Precise quantitative techniques are essential for collecting, compiling, organizing, ordering, and analyzing data to derive accurate conclusions.
======
Sources of Data
Data can be collected from two main sources: Primary and Secondary.
Primary sources involve data collected for the first time by individuals, groups, or institutions.
Secondary sources consist of data obtained from already published or unpublished sources.
The methods of data collection are depicted in Fig. 1.1.
Primary sources include surveys, interviews, and observations.
Secondary sources include books, articles, and reports.
======
Sources of Primary Data
Primary data is the information gathered for the first time and related to specific objectives of the research.
There are two main sources of primary data: observations and surveys.
Observations involve watching and recording the events as they occur in their natural setting.
Surveys involve collecting data through interviews, questionnaires, or schedules from the respondents.
Other sources of primary data include experiments, case studies, and focus groups.
The choice of primary data collection method depends on the research objectives, time, cost, and skills available.
======
1. Personal Observations
Personal observations refer to the collection of information through direct observations in the field.
This method involves gathering data on relief features, drainage patterns, soil types, natural vegetation, population structure, sex ratio, literacy, means of transport and communication, urban and rural settlements, etc.
A field survey is often conducted to carry out personal observations.
The person(s) involved must have theoretical knowledge of the subject and a scientific attitude for unbiased evaluation.
Equipment and scientific techniques may be used to enhance the accuracy of the observations.
======
2. Interview
The interview method involves gathering direct information through dialogues and conversations.
The interviewer should take certain precautions at each stage of the interview.
Preparation: be clear about the survey’s objective and prepare a precise list of items on which information is to be gathered.
Interaction: build trust with respondents, create a congenial atmosphere, use simple and polite language, and avoid questions that may hurt them.
Information gathering: ask for any additional information the respondent can offer, and thank the respondent at the end.
======
3. Questionnaire/Schedule
In the questionnaire method, respondents either tick one of the possible answers listed against each simple question or give their opinions in response to structured questions.
The objectives of the survey should be clearly stated in the questionnaire.
This method is useful for larger areas and can be mailed to distant places, but only literate and educated people can be approached.
Similar to the questionnaire is the schedule, which contains questions about the investigation matter.
The difference is that the enumerator fills up the schedule by asking questions, allowing information collection from both literate and illiterate respondents.
======
4. Other Methods
Soil and water properties are measured in the field using a soil kit and water quality kit.
Data about crop and vegetation health is collected using transducers.
Field scientists are responsible for collecting this data.
Measurements of soil, water, crops, and vegetation are all part of assessing the overall health of the environment.
Fig. 1.2 illustrates this data collection process.
======
Secondary Source of Data
Secondary sources of data are derived from published and unpublished records.
They include government publications, documents, and reports.
These sources are not primary, as they are not firsthand accounts or original research.
Secondary sources can still provide valuable information, especially when they synthesize or analyze primary source data.
They are important for researchers, as they can offer context, background, and alternative perspectives.
======
Published Sources
Government Publications: Important sources of secondary information from the Government of India, state governments, and District Bulletins. Examples include Census of India, National Sample Survey, Weather Reports, and Statistical Abstracts.
Semi/Quasi-government Publications: Publications and reports of Urban Development Authorities, Municipal Corporations, Zila Parishads, etc.
International Publications: Yearbooks, reports, and monographs published by United Nations agencies such as UNESCO, UNDP, WHO, FAO. Examples include Demographic Year Book, Statistical Year Book, and Human Development Report.
Private Publications: Yearbooks, surveys, research reports, and monographs published by newspapers and private organizations.
Newspapers and Magazines: Easily accessible sources of secondary data.
Electronic Media: The internet is a major source of secondary data.
======
Unpublished Sources
Government documents: reports and records prepared and maintained at various levels of governance, such as revenue records at the village level, which are an important source of village-level information.
Quasi-government records: periodical reports and development plans prepared by Municipal Corporations, District Councils, and Civil Services departments.
Private documents: unpublished reports and records of companies, trade unions, political and non-political organizations, and residents’ welfare associations.
======
Tabulation and Classification of Data
Data collected from primary or secondary sources is initially in a raw, unorganized form.
Tabulation and classification are required to make the data useful and draw meaningful inferences.
A statistical table is a simple device for summarizing and presenting data in columns and rows.
The purpose of statistical tables is to simplify presentation and facilitate comparisons, enabling readers to locate information quickly.
These tables allow analysts to present large amounts of data in an orderly manner within minimal space.
======
Data Compilation and Presentation
The Index of Industrial Production (IIP) for India is given below for selected years from 1970-71 to 2000-01.
The IIP for the year 1970-71 was 32.5.
The IIP for the year 1980-81 was 42.2.
The IIP for the year 1990-91 was 53.7.
The IIP for the year 2000-01 was 67.4.
The IIP for the year 2000-01 is about 207% of the IIP for the year 1970-71 (67.4 / 32.5 ≈ 2.07).
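The comparison with the base year can be checked with a few lines of Python. This is only a sketch: the helper name `as_percent_of_base` is made up for illustration, and the values are the four IIP figures quoted above.

```python
# IIP values quoted above, keyed by year.
iip = {"1970-71": 32.5, "1980-81": 42.2, "1990-91": 53.7, "2000-01": 67.4}

def as_percent_of_base(values, base_year):
    """Express every value as a percentage of the base-year value."""
    base = values[base_year]
    return {year: round(value / base * 100, 1) for year, value in values.items()}

# Hypothetical helper applied with 1970-71 as the base year.
relative = as_percent_of_base(iip, "1970-71")
for year, pct in relative.items():
    print(f"{year}: {pct}% of the 1970-71 value")
```

Expressing each year relative to a common base makes growth over the period easier to compare than the raw index values alone.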
======
Processing of Data
The raw data for this section is a table of numbers, which is not reproduced here.
In this raw form the numbers are difficult to interpret; they are processed by grouping them into classes, as described in the sections that follow.
======
Grouping of Data
Grouping places each raw value in a class and records it with a tally mark.
The table for this section lists each group with its tally marks and frequency, but the tally marks and group limits were lost in extraction.
The frequencies that can still be read are 4, 5, 7, 6, 10, and 8.
The last row gives the total of the frequencies, $\sum f = N = 60$.
======
Process of Classification
The process of classification involves determining the number of groups and the class interval for each group.
Raw data is then classified using the Four and Cross Method, also called tally marks.
In this method, one tally mark is recorded for each individual in the group to which it belongs; every fifth mark is drawn across the previous four so that the marks can be counted in fives.
For example, the first numerical value in the raw data (47) falls in the group of 40-50, so one tally mark is recorded in column 3 of Table 1.5.
The tally marks are used to visually represent the data and make it easier to understand.
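The tallying step can be sketched in Python as follows. The scores are hypothetical (only the first value, 47, comes from the example above), the class width of 10 is assumed from the 40-50 group, and the function names `classify` and `tally_string` are illustrative.

```python
def tally_string(count):
    """Render a count as tally marks, writing each full group of five as '||||/'."""
    fives, rest = divmod(count, 5)
    return " ".join(["||||/"] * fives + (["|" * rest] if rest else []))

def classify(values, width=10):
    """Count how many values fall in each group running from lower up to, but not including, lower + width."""
    counts = {}
    for v in values:
        lower = (v // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical raw scores; only the first value (47) is taken from the text.
raw = [47, 12, 45, 58, 33, 41, 49, 52, 44, 40, 19]
groups = classify(raw)
for lower, count in groups.items():
    print(f"{lower}-{lower + 10}: {tally_string(count):<12} {count}")
```

Each value adds one mark to exactly one group, so the counts always add up to the number of raw values.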
======
Frequency Distribution
Frequency distribution is a way of classifying and grouping raw data of a quantitative variable.
It illustrates how different values of a variable are distributed in different classes, as shown in Table 1.5.
The number of individuals in each class is referred to as frequency.
Frequencies can be classified as simple and cumulative frequencies.
Simple frequencies represent the number of individuals in each class, while cumulative frequencies represent the total number of individuals up to and including that class.
======
Simple Frequencies
Simple frequency, written $f$, is the number of individuals in each group.
The table has 10 groups, and the frequency of a group ranges from 4 to 10.
The sum of the simple frequencies equals the total number of individuals: $\sum f = N = 60$.
======
Cumulative Frequencies
Cumulative frequency is expressed as $Cf$ and is obtained by adding the simple frequency of each group to the sum of the frequencies of all the groups before it.
It shows how many individuals lie below a given value, for example how many scored less than a certain mark.
The last cumulative frequency is equal to $N$, that is, $\sum f$.
Each simple frequency is associated with its group or class, formed using exclusive or inclusive methods.
For example, in Table 1.6, the first simple frequency is 4, and adding the next frequency of 5 gives a total of 9, which is the next cumulative frequency. This process is repeated until the last cumulative frequency of 60 is obtained.
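The running sum can be sketched in Python with `itertools.accumulate`. The frequency list below is an assumption made for illustration: it starts with the 4 and 5 mentioned above and is chosen to total 60, but it is not claimed to reproduce Table 1.6.

```python
from itertools import accumulate

# Assumed simple frequencies, for illustration only (not the actual table).
simple_f = [4, 5, 5, 7, 6, 10, 8, 7, 4, 4]

# Each cumulative frequency is the previous running sum plus the current frequency.
cumulative_f = list(accumulate(simple_f))

for f, cf in zip(simple_f, cumulative_f):
    print(f"f = {f:2d}   Cf = {cf:2d}")
```

Whatever the individual frequencies are, the last cumulative value must equal their total, which gives a quick check on the arithmetic.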
======
Exclusive Method
The Exclusive Method is a grouping method where the upper limit of one group is the same as the lower limit of the next group. An observation equal to such a shared limit is included in the group where that value is the lower limit and excluded from the group where it is the upper limit; for example, 20 belongs to the group 20-30, not to 10-20.
In this method, the groups are interpreted as follows: 0 and under 10, 10 and under 20, 20 and under 30, and so on.
Each group extends over ten units, for example, the group 20 and under 30 includes the numbers 20, 21, 22, 23, 24, 25, 26, 27, 28, and 29.
This method is called exclusive because each group excludes its own upper limit.
The marginal values of Table 1.4 will be placed according to this exclusive method of grouping.
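A minimal Python sketch of this rule, assuming whole-number values and a class width of 10 as in the examples above; the function name `exclusive_group` is illustrative.

```python
def exclusive_group(value, width=10):
    """Return (lower, upper) for the group holding value, where lower <= value < upper."""
    lower = (value // width) * width
    return lower, lower + width

# Values at or next to a shared limit show where the marginal cases go.
for v in (19, 20, 29, 30):
    lower, upper = exclusive_group(v)
    print(f"{v} -> {lower} and under {upper}")
```

Integer floor division places a value equal to a shared limit in the group that starts at that limit, which is exactly the exclusive rule.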
======
Inclusive Method
In the inclusive method, a value equal to the upper limit of a group is included in that same group, so consecutive groups do not share a limit.
The groups are therefore written as 50-59, 60-69, 90-99, and so on.
The table for this method has $N = 60$ values in total.
The group with the highest frequency (10) is 50-59.
The lowest frequency, 4, occurs in more than one group, including 90-99.
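Inclusive groups such as 50-59 and 60-69 can be sketched in Python as follows. The sketch assumes whole-number values, a class width of 10, and the usual convention that both limits of a group belong to it; the scores and the function name `inclusive_label` are hypothetical.

```python
from collections import Counter

def inclusive_label(value, width=10):
    """Label of the inclusive group holding an integer value, e.g. 59 -> '50-59'."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical scores, for illustration only.
scores = [50, 59, 60, 69, 55, 90, 99]
frequency = Counter(inclusive_label(s) for s in scores)
for label, f in frequency.items():
    print(label, f)
```

Because neighbouring groups share no limit, a value such as 59 or 60 can belong to only one group.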
======
Frequency Polygon
A frequency polygon is a graph of a frequency distribution.
It is used to compare two or more frequency distributions.
The frequencies are represented using a bar diagram and a line graph.
The line graph connects the midpoints of the tops of the bars in the bar diagram.
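The points joined by the line graph can be computed as in the sketch below. The class limits and frequencies are hypothetical, equal class widths are assumed, and `polygon_points` is an illustrative name; drawing the graph itself is left to any plotting tool.

```python
def polygon_points(lower_limits, frequencies, width):
    """Return (class midpoint, frequency) pairs; joining them in order traces the polygon."""
    return [(lower + width / 2, f) for lower, f in zip(lower_limits, frequencies)]

# Hypothetical distribution with classes 0-10, 10-20, 20-30, 30-40.
lowers = [0, 10, 20, 30]
freqs = [4, 9, 6, 2]
points = polygon_points(lowers, freqs, width=10)
print(points)
```

The midpoint of the top of each bar lies above the class midpoint at the height of the class frequency, so these pairs are the vertices of the polygon. Two distributions computed this way can be drawn on the same axes for comparison.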
======
Ogive
An ogive is the curve obtained by plotting the cumulative frequencies of a frequency distribution.
There are two types of cumulative frequency distribution: less than and more than.
In the less than distribution, each cumulative frequency is recorded against the upper limit of its class interval (the lower limit plus the class size), and the frequencies are added from the first class onward, which gives a rising curve.
In the more than distribution, each cumulative frequency is recorded against the lower limit of its class interval, starting from the total $N$ and subtracting the frequency of each class in turn, which gives a declining curve.
The two distributions are plotted as the less than ogive and the more than ogive.
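The two cumulative tables behind the ogives can be sketched in Python. The convention assumed here is the common one: the less than table pairs each class's upper limit with the count of values below it, and the more than table pairs each class's lower limit with the count of values at or above it. The class limits, frequencies, and function names are hypothetical.

```python
from itertools import accumulate

def less_than_table(lower_limits, frequencies, width):
    """Pairs of (upper limit, number of values below that limit)."""
    uppers = [lower + width for lower in lower_limits]
    return list(zip(uppers, accumulate(frequencies)))

def more_than_table(lower_limits, frequencies):
    """Pairs of (lower limit, number of values at or above that limit)."""
    total = sum(frequencies)
    # Number of values falling in the classes before each class.
    before = [0] + list(accumulate(frequencies))[:-1]
    return [(lower, total - b) for lower, b in zip(lower_limits, before)]

# Hypothetical distribution with classes 0-10, 10-20, 20-30, 30-40.
lowers = [0, 10, 20, 30]
freqs = [4, 9, 6, 2]
less_than = less_than_table(lowers, freqs, width=10)
more_than = more_than_table(lowers, freqs)
print(less_than)
print(more_than)
```

Plotting the first list gives a rising curve that ends at the total, and plotting the second gives a falling curve that starts at the total.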
======