Biostatistics – An Introduction




  • In the information age. this blog is about information :—
    • how it is obtained,
    • how it is analyzed,
    • how it is interpreted.
  • To begin with, raw information that is of concern to us is labelled as data.
  • Data is available to us in the form of numbers.
  • These numbers need to be processed to yield insights we call refined data or information.
  • The objectives of this blog are:
    1. To teach the reader the skill of organising and summarizing data.
      • Descriptive statistics.
    2. To enable the reader in reaching decisions about any amount of data by analysing only a part of it.
      • Inferential statistics.

Biostatistics – Concepts and terminology

  • It is easy to get confused in statistics as it has its own vocabulary. Even familiar terms have a different meaning in statistics as compared to its day to day usage.
  • Hence an initial familiarization with statistical terms becomes mandatory



  • It is the raw material in the world of statistics.
  • Data is understood to be things known or assumed as facts, which provide the basis of reasoning or calculation.
  • The two kinds of numbers that we use in statistics are numbers
    1. that result from the taking of any measurement,
    2. and those that result from the process of keeping a count.
  • Each of the recorded numbers is a datum(singular).
  • All records taken together are data (plural).


  • It is the practice or science of collection and analysis of numerical data in significant numbers to infer the proportions of a whole from which the representative sample is taken.
  • Hence, statistics is a field of study concerned with
  • Data collection, organization, summarization, and analysis
  • Drawing inferences about a body of data from a representative sample of data.
  • Simply put,
    • data are recorded numbers,
    • these numbers contain information,
    • statistics investigate and evaluate recorded data to reveal this information and its meaning in the context of the study population.
  • Data Volume Creation and Consumption in the Future (IDC & Statista, 2020)
    • The year 2022 – 94 zettabytes
    • The year 2023 – 118 zettabytes
    • The year 2024 – 149 zettabytes
      • (1 zettabyte = 1000,000,000,000 GB)
  • As of 2020, the average data consumption per user per month across 3G and 4G networks in India were 13462 megabytes. (Statista)
  • Trends for big data generation are
  • In comparison, conventional sources of data for medical use have been restricted to
    • Routinely organisational records.
    • Patient medical records – These are becoming increasingly important with the advent of digital electronic health records (EHR).
    • Accounting records – More often than not, these are the best-kept records to be found.
    • Survey records – for an objective assessment of subjective opinions and/or practices.
    • Experimental observations – These are specifically important when the required data is peculiar to a given situation.
    • External sources – These include:-
      • Published reports,
      • Commercially available data banks,
      • Research literature


  • Simply put, it is the application of statistical tools and concepts in the field of biological sciences.
  • Here the data are derived from biological sciences like medicine.


  • a characteristic under observation that adopts different values under different circumstances
    • Quantitative Variables
      • Measurements made on quantitative variables convey information regarding the amount.
    • Qualitative Variables
      • Measurements made on qualitative variables convey information regarding
        attribute or frequencies of counts.

Random Variable

  • Here the values obtained for a variable are a result of chance because of which they cannot be exactly predicted in advance.
  • Types
    • Discrete Random Variable
      • It is characterized by gaps or interruptions in the values which the variable can assume.
    • Continuous Random Variable
      • It does not possess any gaps or interruptions characteristic of a discrete random variable.
  • However, consequent to the limitations of measuring instruments, observations about variables that are inherently continuous are recorded as discrete.


  • defined as the largest collection of entities of interest at a particular time.
  • Alternatively, defined as the largest collection of values of a random variable of interest at a given time.
  • Populations are determined by a subjective sphere of interest
  • Populations may either be finite or infinite.


  • defined simply as a part of a population.
  • It may be
    • Representative sample
    • Non-Representative sample

Research study

  • It is a scientific study of a phenomenon of interest.
  • Research studies entail
    • designing sampling protocols,
    • collecting and analyzing data,
    • and providing valid conclusions based on the results of the analyses.


  • These are a special type of research study
  • The observations are made after specific manipulations of conditions have been carried out
  • They provide the foundation for scientific research.



  • Defined as a system of
    assignment of numbers
    to objects or events
    using a predetermined set of rules.

Types of measurement scales

The Nominal Scale

  • This is considered the simplest and least scalable form of a measurement scale
  • It entails “naming” observations
  • It works by organising measurements into mutually exclusive and collectively exhaustive categories.

The Ordinal Scale

  • Here in addition to recording observations in different categories, they are ranked according to predetermined criteria.

The Interval Scale

  • It is considered to be a more sophisticated scale than the nominal or ordinal scale.

  • Here the distance between any two measurements is known

  • this requires the use of an arbitrarily determined

    • unit distance
    • zero point
  • This means that the selected zero point is not necessarily a true zero (total absence of the quantity being measured).

The Ratio Scale

  • It is widely considered to be the highest level of any measurement scale.
  • Fundamental prerequisite for a ratio scale is a true zero point.
  • This scale is characterized by the fact that it can determine
    1. equality of ratios
    2. equality of intervals


Statistical inference
: It is the procedure by which we reach a conclusion about a population on the basis of the information drawn from a sample that has been drawn from that population.

  • To make a valid inference about a population, scientific samples need to be drawn from the population.
  • There are also many kinds of scientific samples – the simplest of these is the simple random sample.

    Simple random sample :
    If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample is called a simple random sample.

  • To ensure true randomness of the selection, some objective procedure needs to be followed.
  • Types of procedures for simple random sampling:

    • Sample with replacement: Here every member of the population is available at
      each draw.
    • Sample without replacement : As the sampled members are removed from the population for subsequent sampling, observations could be recorded from them only once.
  • In practice, sampling is always done without replacement.

Leave a Reply

Your email address will not be published. Required fields are marked *