INTRODUCTION TO BIOSTATISTICS
 In the information age, this blog is about information:
 how it is obtained,
 how it is analyzed,
 how it is interpreted.
 To begin with, raw information that is of concern to us is labelled as data.
 Data is available to us in the form of numbers.
 These numbers need to be processed to yield insights we call refined data or information.
 The objectives of this blog are:
 To teach the reader the skill of organising and summarizing data.
 Descriptive statistics.
 To enable the reader to reach decisions about a large body of data by analysing only a part of it.
 Inferential statistics.
Biostatistics – Concepts and terminology
 It is easy to get confused in statistics because it has its own vocabulary. Even familiar terms have a different meaning in statistics compared with their day-to-day usage.
 Hence an initial familiarization with statistical terms is essential.
IMPORTANT TERMS
Data
 It is the raw material in the world of statistics.
 Data is understood to be things known or assumed as facts, which provide the basis of reasoning or calculation.
 The two kinds of numbers that we use in statistics are numbers
 that result from the taking of any measurement,
 and those that result from the process of keeping a count.
 Each of the recorded numbers is a datum (singular).
 All records taken together are data (plural).
Statistics
 It is the practice or science of collecting and analysing numerical data in large quantities in order to infer the properties of a whole from those of a representative sample taken from it.
 Hence, statistics is a field of study concerned with
 Data collection, organization, summarization, and analysis
 Drawing inferences about a body of data from a representative sample of data.
 Simply put,
 data are recorded numbers,
 these numbers contain information,
 statistics investigates and evaluates recorded data to reveal this information and its meaning in the context of the study population.
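 The two activities named above, summarizing data and drawing inferences from a sample, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the blood pressure population below is simulated and hypothetical, not data from any study.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 people.
population = [random.gauss(120, 15) for _ in range(10_000)]

# A sample of 100 people drawn at random from that population.
sample = random.sample(population, 100)

# Summarization: describe the sample itself.
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
sample_sd = statistics.stdev(sample)

# Inference: use the sample mean as an estimate of the population mean.
population_mean = statistics.mean(population)
estimate_error = abs(sample_mean - population_mean)

print(f"sample mean = {sample_mean:.1f}, median = {sample_median:.1f}, SD = {sample_sd:.1f}")
print(f"population mean = {population_mean:.1f}, estimate is off by {estimate_error:.1f}")
```

 In real work the population mean is unknown, which is exactly why the sample is used to estimate it.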
Trends of Data
 Data Volume Creation and Consumption in the Future (IDC & Statista, 2020)
 The year 2022 – 94 zettabytes
 The year 2023 – 118 zettabytes
 The year 2024 – 149 zettabytes
 (1 zettabyte = 1,000,000,000,000 GB)
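 The unit conversion and the growth implied by the figures above can be checked with a short calculation. The sketch assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 zettabyte is 10^21 bytes; the variable names are only illustrative.

```python
BYTES_PER_GB = 10**9   # decimal (SI) gigabyte
BYTES_PER_ZB = 10**21  # decimal (SI) zettabyte

gb_per_zb = BYTES_PER_ZB // BYTES_PER_GB  # gigabytes in one zettabyte

# Figures quoted above, in zettabytes per year.
volume_zb = {2022: 94, 2023: 118, 2024: 149}

for year, zb in volume_zb.items():
    print(f"{year}: {zb} ZB = {zb * gb_per_zb:.2e} GB")

# Year-over-year growth implied by those figures, in percent.
growth_2023 = (volume_zb[2023] - volume_zb[2022]) / volume_zb[2022] * 100
growth_2024 = (volume_zb[2024] - volume_zb[2023]) / volume_zb[2023] * 100

print(f"growth 2022 to 2023: {growth_2023:.1f}%")
print(f"growth 2023 to 2024: {growth_2024:.1f}%")
```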
 As of 2020, the average data consumption per user per month across 3G and 4G networks in India was 13,462 megabytes. (Statista)
 Trends for big data generation are
 In comparison, conventional sources of data for medical use have been restricted to
 Routinely kept organisational records.
 Patient medical records – These are becoming increasingly important with the advent of electronic health records (EHR).
 Accounting records – More often than not, these are the best-kept records to be found.
 Survey records – for an objective assessment of subjective opinions and/or practices.
 Experimental observations – These are specifically important when the required data is peculiar to a given situation.
 External sources – These include:
 Published reports,
 Commercially available data banks,
 Research literature
Biostatistics
 Simply put, it is the application of statistical tools and concepts in the field of biological sciences.
 Here the data are derived from biological sciences like medicine.
Variable
 A characteristic under observation that takes different values under different circumstances.
 Quantitative Variables
 Measurements made on quantitative variables convey information regarding the amount.
 Qualitative Variables
 Measurements made on qualitative variables convey information regarding attributes or frequencies of counts.
Random Variable
 Here the values obtained for a variable arise by chance, so they cannot be predicted exactly in advance.
 Types
 Discrete Random Variable
 It is characterized by gaps or interruptions in the values which the variable can assume.
 Continuous Random Variable
 It does not possess any gaps or interruptions characteristic of a discrete random variable.
 However, owing to the limitations of measuring instruments, observations on variables that are inherently continuous are recorded as if they were discrete.
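 The difference between the two types, and the effect of a measuring instrument, can be illustrated with simulated values. The variables chosen here (number of children, height in centimetres) and their ranges are hypothetical examples, not taken from any dataset.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Discrete random variable: number of children in a family (hypothetical).
# Only whole numbers are possible, so there are gaps between the values.
children = [random.randint(0, 5) for _ in range(5)]

# Continuous random variable: height in centimetres (hypothetical range).
# In principle any value inside the range can occur.
true_heights = [random.uniform(150.0, 190.0) for _ in range(5)]

# An instrument that reads to the nearest 0.1 cm records discrete values.
recorded_heights = [round(h, 1) for h in true_heights]

print("children:        ", children)
print("true heights:    ", true_heights)
print("recorded heights:", recorded_heights)
```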
Population
 Defined as the largest collection of entities of interest at a particular time.
 Alternatively, defined as the largest collection of values of a random variable of interest at a given time.
 A population is determined by the investigator's sphere of interest.
 Populations may either be finite or infinite.
Sample
 Defined simply as a part of a population.
 It may be
 Representative sample
 Non-representative sample
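 One way to picture the two kinds of sample is to compare them against the population they came from. The sketch below assumes a hypothetical finite population of patient ages. It treats a random draw as the representative case and "the 50 youngest patients" as a deliberately non-representative one.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical finite population: ages of 1,000 clinic patients.
population = [random.randint(18, 90) for _ in range(1000)]

# A randomly drawn sample tends to resemble the population.
random_sample = random.sample(population, 50)

# Picking only the 50 youngest patients does not.
youngest_sample = sorted(population)[:50]

print("population mean age:     ", round(statistics.mean(population), 1))
print("random sample mean age:  ", round(statistics.mean(random_sample), 1))
print("youngest sample mean age:", round(statistics.mean(youngest_sample), 1))
```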
Research study
 It is a scientific study of a phenomenon of interest.
 Research studies entail
 designing sampling protocols,
 collecting and analyzing data,
 and providing valid conclusions based on the results of the analyses.
Experiments
 These are a special type of research study.
 Here, observations are made after specific conditions have been deliberately manipulated.
 They provide the foundation for scientific research.
MEASUREMENT AND MEASUREMENT SCALES
Measurement
 Defined as a system of assignment of numbers to objects or events using a predetermined set of rules.
Types of measurement scales
The Nominal Scale
 This is considered the simplest and lowest level of measurement scale.
 It entails “naming” observations.
 It works by organising measurements into mutually exclusive and collectively exhaustive categories.
The Ordinal Scale
 Here, in addition to being placed in different categories, observations are ranked according to predetermined criteria.
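 The nominal and ordinal scales can be contrasted with a small example. The blood groups and pain severity labels below are hypothetical data chosen for illustration. On the nominal scale only counting per category makes sense, while on the ordinal scale the categories can also be sorted by rank.

```python
from collections import Counter

# Nominal scale: categories are names only and have no order (hypothetical data).
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]
counts = Counter(blood_groups)  # counting per category is the meaningful operation

# Ordinal scale: categories have a rank order, but the distance between ranks is unknown.
severity_order = ["mild", "moderate", "severe"]
rank = {label: position for position, label in enumerate(severity_order)}

pain_reports = ["severe", "mild", "moderate", "mild"]
sorted_reports = sorted(pain_reports, key=lambda label: rank[label])

print(counts)
print(sorted_reports)
```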
The Interval Scale
 It is considered to be a more sophisticated scale than the nominal or ordinal scale.
 Here the distance between any two measurements is known.
 This requires the use of an arbitrarily determined
 unit distance
 zero point
 This means that the selected zero point is not necessarily a true zero (total absence of the quantity being measured).
The Ratio Scale
 It is widely considered to be the highest level of measurement.
 The fundamental prerequisite for a ratio scale is a true zero point.
 This scale is characterized by the fact that it can determine
 equality of ratios
 equality of intervals
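 Temperature is a common way to see why a true zero matters. It is used here as an assumed example and is not taken from the text above. Celsius has an arbitrary zero (interval scale), while Kelvin has a true zero (ratio scale). Differences agree on both scales, but ratios are only meaningful on the Kelvin scale.

```python
# Interval scale: the Celsius zero is arbitrary (freezing point of water).
c_low, c_high = 10.0, 20.0

# Ratio scale: the Kelvin zero is a true zero.
k_low, k_high = c_low + 273.15, c_high + 273.15

# Equality of intervals holds on both scales.
celsius_interval = c_high - c_low
kelvin_interval = k_high - k_low

# Ratios differ: 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
celsius_ratio = c_high / c_low
kelvin_ratio = k_high / k_low

print(f"intervals: {celsius_interval:.2f} C vs {kelvin_interval:.2f} K")
print(f"ratios:    {celsius_ratio:.3f} (Celsius) vs {kelvin_ratio:.3f} (Kelvin)")
```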
SAMPLING AND STATISTICAL INFERENCE
Statistical inference
 It is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample that has been drawn from that population.
 To make a valid inference about a population, scientific samples need to be drawn from the population.
 There are also many kinds of scientific samples – the simplest of these is the simple random sample.
Simple random sample
 If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample is called a simple random sample.
 To ensure true randomness of the selection, some objective procedure needs to be followed.
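 The definition can be made concrete with a tiny population. The sketch below assumes a hypothetical population of N = 5 labelled members and a sample size of n = 2. It lists every possible sample, shows the chance each one has, and then lets Python's `random.sample` act as the objective procedure that makes the draw.

```python
import itertools
import random

random.seed(3)  # fixed seed so the sketch is reproducible

population = ["A", "B", "C", "D", "E"]  # N = 5 (hypothetical members)
n = 2                                   # sample size

# Every possible sample of size n, ignoring order.
all_samples = list(itertools.combinations(population, n))

# In a simple random sample each of these has the same chance of selection.
chance = 1 / len(all_samples)

# An objective procedure for the draw: the pseudo-random generator.
drawn = random.sample(population, n)

print(f"{len(all_samples)} possible samples, each with chance {chance}")
print("sample drawn:", drawn)
```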

Types of procedures for simple random sampling:
 Sample with replacement: Here every member of the population is available at each draw.
 Sample without replacement: Here sampled members are removed from the population before subsequent draws, so observations can be recorded from them only once.

In practice, sampling is usually done without replacement.
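 The two procedures can be compared directly in Python. This is a minimal sketch that assumes a hypothetical population of ten numbered members. `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

random.seed(5)  # fixed seed so the sketch is reproducible

population = list(range(1, 11))  # ten hypothetical members, numbered 1 to 10

# Without replacement: a member can be selected at most once.
without_replacement = random.sample(population, 5)

# With replacement: every member is available at each draw.
# Drawing 15 times from 10 members guarantees at least one repeat.
with_replacement = random.choices(population, k=15)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

 Because each member is returned to the population after a draw, a sample taken with replacement can be larger than the population itself, which is not possible without replacement.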