Statistics:
Statistics is that science
which enables us to draw conclusions about various phenomena on the basis of real
data collected on a sample basis. OR Statistics is a science of facts and
figures.
INFERENTIAL STATISTICS: That
branch of Statistics which enables us to draw conclusions or inferences about
various phenomena on the basis of real data collected on sample basis.
Data: The facts and figures
(observations or measurements) collected for analysis and interpretation are
known as data.
Qualitative data: Data that
are labels or names used to identify an attribute of each element. Qualitative
data may be nonnumeric or numeric.
Qualitative variable: A
variable with qualitative data.
Quantitative data: Data that
indicate how much or how many of something. Quantitative data are always
numeric.
Variable: A measurable
quantity which can vary from one individual or object to another is called a
variable.
Nominal Scale: The
classification or grouping of observations into mutually exclusive qualitative
categories is said to constitute a nominal scale, e.g. students are classified
as male or female.
Ordinal Scale: It includes the
characteristic of a nominal scale and in addition has the property of ordering
or ranking of measurements e.g. the performance of students can be rated as
excellent, good or poor.
Interval Scale: A measurement
scale possessing a constant interval size but no true zero point is called an
Interval Scale.
Ratio Scale: It is a special
kind of an interval scale in which the scale of measurement has a true zero
point as its origin.
Biased Errors: An error is
said to be biased when the observed value is consistently and constantly higher
or lower than the true value.
Unbiased Errors or Random
Errors: An error, on the other hand, is said to be unbiased when the
deviations, i.e. the excesses and defects, from the true value tend to occur
equally often.
Primary Data: The data
published or used by the organization which originally collected them are called
primary data. Thus the primary data are the first-hand information collected,
compiled, and published by an organization for a certain purpose.
Secondary Data: The data
published or used by an organization other than the one which originally
collected them are known as secondary data.
DIRECT PERSONAL INVESTIGATION:
In this method, an investigator collects the information personally from the
individuals concerned. Since he interviews the informants himself, the
information collected is generally considered quite accurate and complete.
INDIRECT INVESTIGATION:
Sometimes the direct sources do not exist or the informants hesitate to respond
for some reason or other. In such a case, third parties or witnesses having
information are interviewed.
Population: The collection of
all individuals, items or data under consideration in a statistical study is
called Population.
Sample: A sample is a group of
units selected from a larger group (the population). By studying the sample it
is hoped to draw valid conclusions about the larger group. OR Sample is that
part of the Population from which information is collected.
SAMPLING FRAME: A sampling
frame is a complete list of all the elements in the population.
Sampling Error: The sampling
error is the difference between the sample statistic and the
population parameter.
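The definition of sampling error can be illustrated with a small sketch. The population values, the sample size of 4 and the choice of the mean as the statistic are all assumptions made for illustration only.

```python
import random
import statistics

# Hypothetical population of ten measurements.
population = [52, 47, 60, 55, 49, 58, 51, 63, 45, 50]

rng = random.Random(0)                 # fixed seed for a repeatable draw
sample = rng.sample(population, k=4)   # sample of size 4, without replacement

population_mean = statistics.mean(population)   # population parameter
sample_mean = statistics.mean(sample)           # sample statistic

# Sampling error = sample statistic - population parameter.
sampling_error = sample_mean - population_mean
print(sample, sample_mean, population_mean, sampling_error)
```

A different sample would generally give a different sampling error, which is why the error is attributed to the act of sampling.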
Non-Sampling Error: Errors
which are not attributable to sampling but arise in the process of data
collection, and which occur even if a complete count is carried out.
Sampling with replacement:
Once an element has been included in the sample, it is returned to the
population. A previously selected element can be selected again and therefore
may appear in the sample more than once.
Sampling without replacement:
Once an element has been included in the sample, it is removed from the
population and cannot be selected a second time.
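The two sampling schemes can be sketched with Python's standard random module. The population of ten numbered units and the sample size of 5 are hypothetical.

```python
import random

# Hypothetical population of ten numbered units.
population = list(range(1, 11))
rng = random.Random(42)  # fixed seed so the run is repeatable

# With replacement: every draw is made from the full population,
# so the same unit may appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected unit cannot be drawn again,
# so all units in the sample are distinct.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```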
Standard error: The standard
deviation of a point estimator. In regression, the degree of scatter of the
observed values about the regression line is measured by the standard deviation
of regression, also called the standard error of estimate.
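As one common instance of a standard error, the standard error of the sample mean is usually estimated as s / sqrt(n), where s is the sample standard deviation. The sketch below assumes that estimator and uses made-up data; it is not the only standard error one may need.

```python
import math
import statistics

# Hypothetical sample of eight observations.
sample = [12, 15, 11, 14, 13, 17, 16, 14]
n = len(sample)

# Sample standard deviation (divisor n - 1).
s = statistics.stdev(sample)

# Estimated standard error of the sample mean.
standard_error = s / math.sqrt(n)
print(s, standard_error)
```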
Sampling Unit: The units
selected for sampling. A sampling unit may include several elements.
NON-RANDOM SAMPLING: Nonrandom
sampling implies that kind of sampling in which the population units are drawn
into the sample by using one’s personal judgment. This type of sampling is also
known as purposive sampling.
Quota Sampling: Quota sampling
is a method of sampling widely used in opinion polling and market research.
Interviewers are each given a quota of subjects of a specified type to attempt to
recruit. For example, an interviewer might be told to go out and select 20 adult
men, 20 adult women, 10 teenage girls and 10 teenage boys in order to
interview them about their television viewing.
RANDOM SAMPLING: The theory of
statistical sampling rests on the assumption that the selection of the sample
units has been carried out in a random manner. By random sampling we mean
sampling that has been done by adopting the lottery method.
Simple random sampling: For a finite
population, a sample selected such that each possible sample of size n has the
same probability of being selected. For an infinite population, a sample
selected such that each element comes from the same population and the elements
are selected independently.
Pie Chart: Pie Chart consists
of a circle which is divided into two or more parts in accordance with the
number of distinct classes that we have in our data.
SIMPLE BAR CHART: A simple bar
chart consists of horizontal or vertical bars of equal width and lengths
proportional to values they represent.
MULTIPLE BAR CHARTS: This kind
of a chart consists of a set of grouped bars, the lengths of which are
proportionate to the values of our variables, and each of which is shaded or
colored differently in order to aid identification.
CLASS BOUNDARIES: The true
class limits of a class are known as its class boundaries.
HISTOGRAM: A histogram
consists of a set of adjacent rectangles whose bases are marked off by class
boundaries along the X-axis, and whose heights are proportional to the
frequencies associated with the respective classes.
FREQUENCY POLYGON: A frequency
polygon is obtained by plotting the class frequencies against the mid-points of
the classes, and connecting the points so obtained by straight line segments.
FREQUENCY CURVE: When the frequency
polygon is smoothed, we obtain what may be called the frequency curve.
CUMULATIVE FREQUENCY
DISTRIBUTION: As in the case of the frequency distribution of a discrete
variable, if we start adding the frequencies of our frequency table
column-wise, we obtain the column of cumulative frequencies.
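The running addition described above can be sketched with itertools.accumulate. The class limits and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequency table.
class_limits = ["10-19", "20-29", "30-39", "40-49"]
frequencies = [4, 9, 12, 5]

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for limits, f, cf in zip(class_limits, frequencies, cumulative):
    print(f"{limits:>7} {f:>4} {cf:>4}")
```

The last cumulative frequency always equals the total number of observations.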
AVERAGES (I.E. MEASURES OF
CENTRAL TENDENCY): A single value which is intended to represent a distribution or
a set of data as a whole is called an average. It is more or less a central
value around which the observations tend to cluster, so it is called a measure of
central tendency. Since a measure of central tendency indicates the location of
the distribution on the X-axis, it is also called a measure of location.
The Arithmetic, Geometric and
Harmonic means are averages that are mathematical in character, and give an
indication of the magnitude of the observed values.
The Median indicates the
middle position, while the mode provides information about the most frequent
value in the distribution or the set of data.
THE MODE: The Mode is defined
as that value which occurs most frequently in a set of data i.e. it indicates
the most common result.
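The averages defined above can be computed with Python's standard statistics module (geometric_mean needs Python 3.8 or later). The data set is hypothetical.

```python
import statistics

# Hypothetical data set of positive values.
data = [2, 4, 4, 8, 16]

arithmetic = statistics.mean(data)            # sum of values / number of values
geometric = statistics.geometric_mean(data)   # n-th root of the product
harmonic = statistics.harmonic_mean(data)     # n / sum of reciprocals
median = statistics.median(data)              # middle value of the sorted data
mode = statistics.mode(data)                  # most frequent value

print(arithmetic, geometric, harmonic, median, mode)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.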
DOT PLOT: The horizontal axis
of a dot plot contains a scale for the quantitative variable that we want to
represent. The numerical value of each measurement in the data set is located
on the horizontal scale by a dot.
GROUPING ERROR: “Grouping
error” refers to the error that is introduced by the assumption that all the
values falling in a class are equal to the mid-point of the class interval.
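A small sketch may make the grouping error concrete: the mean of the raw values is compared with the mean computed from class mid-points. The observations and the classes 10-19, 20-29, 30-39 are hypothetical.

```python
# Hypothetical raw observations and class limits.
raw = [11, 14, 18, 22, 23, 27, 29, 31, 36, 38]
classes = [(10, 19), (20, 29), (30, 39)]

# Frequency and mid-point of each class.
frequencies = [sum(lo <= x <= hi for x in raw) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Mean of the raw data.
raw_mean = sum(raw) / len(raw)

# Grouped mean: every value in a class is treated as the class mid-point.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / sum(frequencies)

# Error introduced by the mid-point assumption.
grouping_error = grouped_mean - raw_mean
print(raw_mean, grouped_mean, grouping_error)
```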
Ogive: A graph of a cumulative
distribution.
Dispersion: The variability
that exists among the values of a data set.
Range: The range is defined as
the difference between the maximum and minimum values of a data set.
The coefficient of variation:
The coefficient of variation expresses the standard deviation as a percentage
of the arithmetic mean.
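The range and the coefficient of variation can be sketched as follows. The data are hypothetical, and the standard deviation here uses the divisor n; a version based on the sample standard deviation (divisor n - 1) is also common.

```python
import statistics

# Hypothetical data set.
data = [12, 15, 11, 14, 13, 17, 16, 14]

# Range: maximum value minus minimum value.
data_range = max(data) - min(data)

# Coefficient of variation: standard deviation as a percentage of the mean.
mean = statistics.mean(data)
sd = statistics.pstdev(data)          # divisor n
cv_percent = 100 * sd / mean

print(data_range, round(cv_percent, 2))
```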
Quartiles: Quartiles are those
three quantities that divide the distribution into four equal parts.
Quartile Deviation: The
quartile deviation is defined as half of the difference between the first and
third quartiles.
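The quartiles and the quartile deviation can be sketched with statistics.quantiles (Python 3.8 or later). Textbooks and software use several slightly different rules for locating quartiles; this sketch assumes the module's default "exclusive" method, and the data are hypothetical.

```python
import statistics

# Hypothetical data set of eleven values.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 23, 27]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Quartile deviation: half the difference between Q3 and Q1.
quartile_deviation = (q3 - q1) / 2
print(q1, q2, q3, quartile_deviation)
```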
Quantiles: Collectively the
quartiles, the deciles, percentiles and other values obtained by equal
sub-division of the data are called quantiles.
Percentiles: Percentiles are
those ninety-nine quantities that divide the distribution into a hundred equal
parts.
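The counts in these definitions (three quartiles, nine deciles, ninety-nine percentiles) can be checked with statistics.quantiles. The data set of the integers 1 to 200 is hypothetical, and the default "exclusive" method of the module is assumed.

```python
import statistics

# Hypothetical data: the integers 1 to 200.
data = list(range(1, 201))

quartiles = statistics.quantiles(data, n=4)      # 3 cut points
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(len(quartiles), len(deciles), len(percentiles))

# The 50th percentile, the 5th decile and the 2nd quartile are all the median.
print(percentiles[49], deciles[4], quartiles[1])
```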
Absolute measure of
dispersion: An absolute measure of dispersion is one that measures the
dispersion in the same units as the data, or in the square of those units.
Relative measure of
dispersion: A relative measure of dispersion is one that is expressed in the form
of a ratio, coefficient or percentage and is independent of the units of
measurement.
COEFFICIENT OF QUARTILE
DEVIATION: The Coefficient of Quartile Deviation is a pure number and is used
for comparing the variation in two or more sets of data.
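The usual formula for the coefficient of quartile deviation is (Q3 - Q1) / (Q3 + Q1). The sketch below assumes that formula and the default quartile rule of statistics.quantiles; the helper name and both data sets are hypothetical.

```python
import statistics

def coefficient_of_quartile_deviation(values):
    """Return (Q3 - Q1) / (Q3 + Q1), a unit-free measure of spread."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / (q3 + q1)

# Hypothetical data sets measured in different units.
heights_cm = [150, 155, 160, 165, 170, 175, 180]
weights_kg = [45, 50, 60, 65, 70, 85, 95]

coef_heights = coefficient_of_quartile_deviation(heights_cm)
coef_weights = coefficient_of_quartile_deviation(weights_kg)
print(round(coef_heights, 3), round(coef_weights, 3))
```

Because the units cancel, the two coefficients can be compared directly; here the weights show more relative variation than the heights.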
Mean Deviation: The mean
deviation is defined as the arithmetic mean of the deviations measured either
from the mean or from the median, all deviations being counted as positive.
Variance: Variance is defined
as the square of the standard deviation.
Standard Deviation: Standard
Deviation is defined as the positive square root of the mean of the squared
deviations of the values from their mean.
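The three measures above can be computed directly from their definitions. The data are hypothetical, deviations for the mean deviation are taken from the mean, and the divisor n follows the definitions given here (the sample versions divide by n - 1 instead).

```python
import math

# Hypothetical data set.
data = [12, 15, 11, 14, 13, 17, 16, 14]
n = len(data)
mean = sum(data) / n

# Mean deviation: mean of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Variance: mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / n

# Standard deviation: positive square root of the variance.
standard_deviation = math.sqrt(variance)

print(mean_deviation, variance, standard_deviation)
```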
Five-number summary: An
exploratory data analysis technique that uses the following five numbers to
summarize the data set: smallest value, first quartile, median, third quartile,
and largest value.
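A five-number summary can be sketched in a few lines. The data are hypothetical, and the quartiles depend on the rule used; this sketch assumes the default method of statistics.quantiles.

```python
import statistics

# Hypothetical data set of eleven values.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 23, 27]

q1, median, q3 = statistics.quantiles(data, n=4)

# Smallest value, first quartile, median, third quartile, largest value.
five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary)
```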
Moments: Moments are the
arithmetic means of the deviations of the values raised to a given power.
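As a sketch, the r-th moment about the mean is the average of the deviations from the mean raised to the power r. Taking the deviations from the mean (rather than from an arbitrary origin) is an assumption here; the helper name and the data are hypothetical.

```python
def moment_about_mean(values, r):
    """Return the r-th moment about the mean: mean of (x - mean) ** r."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)

# Hypothetical data set.
data = [12, 15, 11, 14, 13, 17, 16, 14]

m1 = moment_about_mean(data, 1)   # always 0 about the mean
m2 = moment_about_mean(data, 2)   # equals the variance (divisor n)
m3 = moment_about_mean(data, 3)   # 0 for this symmetric data set
m4 = moment_about_mean(data, 4)
print(m1, m2, m3, m4)
```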
Correlation: Correlation is a
measure of the strength or the degree of relationship between two random
variables. OR Interdependence of two variables is called correlation. OR
Correlation is a technique which measures the strength of association between
two variables.
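One widely used measure of linear correlation is Pearson's coefficient r. The definition above does not name a specific coefficient, so using Pearson's r is an assumption; the function name and the paired data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))
```

The value of r always lies between -1 and +1; values near the ends indicate a strong linear relationship and values near 0 a weak one.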
RANDOM EXPERIMENT: An
experiment which produces different results even though it is repeated a large
number of times under essentially similar conditions is called a Random
Experiment.
SAMPLE SPACE: A set consisting
of all possible outcomes that can result from a random experiment (real or
conceptual) is defined as the sample space for the experiment and is
denoted by the letter S. Each possible outcome is a member of the sample space,
and is called a sample point in that space.
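As an illustration, the sample space for tossing a coin twice can be listed with itertools.product. The experiment itself is a hypothetical example.

```python
from itertools import product

# Each toss results in H (head) or T (tail); the experiment has two tosses.
sample_space = list(product("HT", repeat=2))

# Each tuple is one sample point of the sample space S.
for point in sample_space:
    print(point)
```

Here S has four sample points: HH, HT, TH and TT.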