The Undergrad Edition - Epidemiology

Friday, November 22, 2013

Chapter 5 - Public Health Surveillance

Surveillance means observing in order to give direction in return. It is the collection and interpretation of observations to prevent and control disease. Everything in Public Health sprouts from surveillance data and the inferences drawn from it, so Public Health Surveillance needs to end with a suggested direction supported by the data and those inferences.

Your surveillance project needs an objective, a purpose, a use for the data. Discover objectives by asking...
-What is the health event?
-What is the case definition for surveillance (this could be different from the clinical and outbreak definitions)?
-What are the planned uses of this data?
-Where will this surveillance data overlap previous data?
-Who is the population?
-When should the data be acquired and how frequently?
-How will the data be gathered?
-How will the data be stored?
-How will analysis be done and by whom?
-How will the data be distributed once finished and to whom?

Major aspects of a Public Health Surveillance system...
Acceptability. Will the target population participate in your data collection?
Flexibility. Changes happen; can the surveillance adapt without extra cost?
Predictive Value Positive. How many reported cases really are cases?
Quality. How complete is the data, i.e. how many blanks do you have?
Representativeness. How well does your sample depict the population?
Sensitivity. What share of the true cases does the surveillance pick up, including those who are sick and hard to reach?
Simplicity. The easier the surveillance is to run the better.
Stability. How reliable and available are the resources needed to conduct and finish the surveillance?
Timeliness. Can the surveillance be completed fast enough for action to take place that will make a difference? (Try syndromic surveillance for faster disease findings.)
Validity. Is the surveillance detecting the outbreaks it should, without flagging non-outbreaks? This is closely related to sensitivity and predictive value positive.
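Sensitivity and predictive value positive are both simple proportions, so a small sketch may help keep them apart. The function names and all of the counts below are made up for illustration; they do not come from the textbook.

```python
def sensitivity(true_cases_detected: int, true_cases_missed: int) -> float:
    """Share of all true cases that the surveillance system picked up."""
    return true_cases_detected / (true_cases_detected + true_cases_missed)


def predictive_value_positive(true_cases_detected: int, false_reports: int) -> float:
    """Share of reported cases that really are cases."""
    return true_cases_detected / (true_cases_detected + false_reports)


# Hypothetical counts, for illustration only.
detected = 80        # true cases the system reported
missed = 20          # true cases the system never heard about
false_reports = 10   # reports that turned out not to be cases

print(f"Sensitivity: {sensitivity(detected, missed):.2f}")
print(f"Predictive value positive: {predictive_value_positive(detected, false_reports):.2f}")
```

Notice that the two measures share a numerator but use different denominators: sensitivity divides by all true cases, predictive value positive divides by all reports.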

TIP: Collecting data takes a large time commitment and consumes large amounts of resources. The quality and representativeness of the data will also become the bottleneck of any surveillance, so make sure the problem and objectives truly are significant. Things to consider when selecting surveillance projects include...
-incidence
-prevalence
-sequelae
-severity (mortality)
-DALYs
-socioeconomic impact (burden of disease)
-communicability
-outbreak potential
-public concern
-international requirements
-preventability
-control measures (including treatment)
-health system capacity to act speedily
-availability of resources to act
-economics
-ease of surveillance

NOTE: Communicable diseases are the health events most often under surveillance.

CDC and CSTE (the Council of State and Territorial Epidemiologists) are the U.S. bodies that determine how national surveillance is done.

Saturday, November 2, 2013

Chapter 4 - illustrate data summaries

CHAPTER SUMMARY

After data surveillance is complete it is time to publish your findings. How do you do this? No one is going to read a long spreadsheet! You need to use small tables, line graphs, epidemic curves, population pyramids, frequency polygons, etc. This chapter introduces these different ways of showing your powerful data so readers can... well... feel your data's power.

CHAPTER CONTENT

Review Data Types. Nominal data is qualitative data where the order of categories does not make a difference, for example male and female. Ordinal data is also qualitative, but here order does matter to an extent, for example low, medium, and high, or stage 1, 2, 3, and 4. Discrete data is quantitative data in the form of integers, for example the number of ill cases. Continuous data is quantitative data that can take any numerical value, for example time of symptom onset.

Table Creation: how to make your table stand alone. First, include person, place, and time in the title of your table, preceded by the table number. You can't judge a book by its cover, but you should be able to understand a table or graph by its title. Second, label units in column and row headings. Third, sometimes you should make a column or row for totals, like when your table includes percents. Fourth, provide disclaimers for any missing data and excluded trials, either in your table or in a footnote. Fifth, define all codes and abbreviations in a footnote. Sixth, beneath your table provide the source of your data and mention rounding error if totals add up to less than or more than 100%.

The Facts About Tables. First, the smaller your table the more time readers spend looking at it. Second, readers compare columns to other columns; for example, data on males and data on females should be written in columns if you want them compared. Third, by using percents in a table you are showing the relative burden of illness or other outcome.

Types of Tables. One Variable Tables describe occurrence based on one variable such as age, gender, etc. Two Variable Tables describe occurrence based on two variables. Three Variable Tables describe occurrence based on three variables. Both Two and Three Variable Tables are also contingency tables, which means the table is being used to expose the association between variables. One type of Two Variable contingency table is the Two-by-Two, which compares two variables that each have two categories. An example explains it better. Imagine the variables Disease and Exposure. Disease has two categories, with the disease and without the disease. Exposure also has two categories, with exposure and without exposure. Each cell of the table holds the number of people in one combination of categories.

Composite Tables create a variable called 'characteristics'. Under this variable many other variables can be written, like age (with intervals), gender, demographics, risk factors, etc. Even though many variables are present, no association between variables is being shown. Because of this, Composite Tables operate like a bunch of One Variable Tables crammed into one table.

Usually data tables are pre-made in what is called a Table Shell. These include titles, headings, and categories; really all the information is there except the numbers. Many Table Shells also include more variables than will be seen in the final table. This is done so the table can be simplified after the correct variable associations are discovered.
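A two-by-two table can be tallied straight from a list of records. The sketch below is only an illustration: the records are made up, and the a/b/c/d cell names are a common labeling convention that these notes do not otherwise use.

```python
from collections import Counter

# Hypothetical records, one (exposed?, ill?) pair per person. Made-up data.
records = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]

counts = Counter(records)

# Rows are exposure, columns are disease status.
a = counts[(True, True)]    # exposed, ill
b = counts[(True, False)]   # exposed, not ill
c = counts[(False, True)]   # unexposed, ill
d = counts[(False, False)]  # unexposed, not ill

print(f"{'':<12}{'Ill':>6}{'Not ill':>10}{'Total':>8}")
print(f"{'Exposed':<12}{a:>6}{b:>10}{a + b:>8}")
print(f"{'Unexposed':<12}{c:>6}{d:>10}{c + d:>8}")
print(f"{'Total':<12}{a + c:>6}{b + d:>10}{a + b + c + d:>8}")
```

The totals row and column are there because the table will usually be read with percents or rates in mind.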

Strategies For Grouping Data. Try making groups of equal sizes. Or make groups based on mean, standard deviation and range. Another way is to make groups based on the range divided by the number of desired groups. Remember, class intervals should be mutually exclusive and exhaustive. This means that a person should only fit into one interval and that intervals should include every case.
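The "range divided by the number of desired groups" strategy can be sketched as below. The function names and the ages are hypothetical. Treating each interval as including its lower bound but not its upper bound (except the last interval) is one way, assumed here, to keep the intervals mutually exclusive and exhaustive.

```python
def equal_width_intervals(values, n_groups):
    """Return (lower, upper) bounds for n_groups equal-width class intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_groups
    edges = [low + i * width for i in range(n_groups)] + [high]
    return list(zip(edges[:-1], edges[1:]))


def assign_interval(value, intervals):
    """Return the index of the one interval that contains value.

    Every interval is [lower, upper) except the last, which also includes
    its upper bound, so each value fits exactly one interval.
    """
    last = len(intervals) - 1
    for i, (lower, upper) in enumerate(intervals):
        if lower <= value < upper or (i == last and value == upper):
            return i
    raise ValueError(f"{value} is outside the intervals")


ages = [2, 7, 15, 21, 34, 42]  # hypothetical ages
intervals = equal_width_intervals(ages, 4)
for age in ages:
    lower, upper = intervals[assign_interval(age, intervals)]
    print(f"age {age} falls in the interval {lower:g} to {upper:g}")
```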

Graphs: display numerical data in visual form. For graph creation follow the same rules covered under the heading "Table Creation: how to make your table stand alone". In addition to those rules, limit the number of lines you use. Every line you draw should maximize its meaning for the space it takes up. Remember that frequency is usually put on the vertical axis and categories on the horizontal axis. When making a graph, make the horizontal axis longer than the vertical axis; this creates a landscape picture and looks nicer.

Types of Graphs. Arithmetic-Scale Line Graphs are simple graphs with an x-axis and a y-axis, and points connected by lines. This graph is used to show long series of data, to compare several series of data, and to show rates over time. Semilogarithmic Line Graphs are very cool! They are used when the data ranges from very high numbers to very low numbers. This graph uses a logarithmic scale (orders of magnitude) for the y-axis. Imagine the y-axis labeled as 0.1, 1, 10, 100, 1000; a log scale has no zero. Equal distances on this axis mean equal relative change, so the graph is used to show the rate of change. Histograms are very common but are still powerful. One example is the Epidemic Curve Histogram, which plots the number of cases against time of onset.

When drawing an epidemic curve use time intervals between 1/8 and 1/3 of the incubation period. If you are comparing genders, use a Population Pyramid, which places two horizontal histograms back to back, one per sex, with age groups on the vertical axis.
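The interval rule for epidemic curves is just arithmetic, and the curve itself is a count of cases per time interval. Here is a minimal sketch, assuming a made-up 24-hour incubation period and made-up onset times; the function name is hypothetical.

```python
from collections import Counter


def epi_curve_interval_range(incubation_period):
    """Return the (shortest, longest) suggested bar width for an epidemic curve."""
    return incubation_period / 8, incubation_period / 3


incubation_hours = 24  # hypothetical incubation period
shortest, longest = epi_curve_interval_range(incubation_hours)
print(f"Bars should be between {shortest:g} and {longest:g} hours wide")

bar_width = 4  # any width inside that range would do
onset_hours = [1, 2, 5, 6, 7, 9, 13]  # made-up hours of symptom onset

# Count cases per interval, keyed by the hour each interval starts at.
bars = Counter((hour // bar_width) * bar_width for hour in onset_hours)
for start in sorted(bars):
    print(f"{start:>2}-{start + bar_width - 1:<2} h | " + "#" * bars[start])
```

The text bars are only a stand-in for a real histogram; the point is the choice of bar width.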

Frequency Polygon Graphs start from a histogram and place a point at the top middle of each bar. The points are then connected, creating a line graph. Imagine turning the epidemic curve histogram into a frequency polygon. It would look pretty cool. A Cumulative Frequency Graph plots cumulative data on the y-axis, usually as percentages, which makes it easy for readers to read off the median, quartiles, etc. Survival Curves are cumulative frequency graphs that start at 100% and go down instead of starting at 0% and going up.
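The numbers behind a cumulative frequency graph are running totals turned into percentages. The sketch below uses made-up case counts by age group and shows how the median group can be read off as the first group where the cumulative percentage reaches 50%.

```python
from itertools import accumulate

# Hypothetical frequency distribution: cases by age group (made-up numbers).
age_groups = ["0-9", "10-19", "20-29", "30-39"]
cases = [5, 15, 20, 10]

total = sum(cases)
cumulative_pct = [100 * running / total for running in accumulate(cases)]

# The median falls in the first group whose cumulative percentage reaches 50%.
median_group = next(
    group for group, pct in zip(age_groups, cumulative_pct) if pct >= 50
)

# A survival-style curve shows the percentage remaining instead.
remaining_pct = [100 - pct for pct in cumulative_pct]

for group, pct, left in zip(age_groups, cumulative_pct, remaining_pct):
    print(f"{group:>6}: cumulative {pct:5.1f}%   remaining {left:5.1f}%")
print("Median falls in age group", median_group)
```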

Not as Common Graphs: Scatter Diagrams and Bar Charts. In Bar Charts the length of the text labels determines whether the bars are vertical or horizontal (horizontal for longer labels), and bars show discrete data where histograms show continuous data. Grouped Bar Charts show multiple variables, often subgroups of a variable. Stacked Bar Charts better show the comparative value of the first variable. A 100% Bar Chart is a stacked bar chart where one axis is in percent and all bars go to 100%, which makes it easier to compare subgroups. A Deviation Bar Chart shows both negative and positive values on either side of a line at '0' in the center of the graph.

Pie Charts are great for comparing groups to each other and to the overall sum of the groups. Pie Charts should label the values of segments, start at 12 o'clock, and be ordered from largest to smallest slice, with an 'other' category put last. Often multiple pie charts will be used instead of a 100% bar chart.

Dot Plots and Box and Whisker Plots are used to show a continuous variable across the categories of a categorical variable. Box and Whisker Plots show quartiles: the 1st quartile, median, and 3rd quartile create the box, and the whiskers display the extent of the range. These plots are used to compare skewness (seen when the median is not centered inside the box) and the inner 50% of the data between groups.

Forest Plots are cool. You use them to compare results from different studies of the same question. The plot shows a line for each study's confidence interval and a point for its point estimate. If the points line up then the studies agree with each other. A vertical line is also drawn at the no-effect value (0 for differences, 1 for ratios); a confidence interval that does not cross this line indicates a significant result (that the variable has been found to make a difference).

A Phylogenetic Tree shows the genetic lineage of organisms involved in an outbreak. Decision Trees map out choices, outcomes and probabilities. Under each outcome is the outcome's probability, which ranges from 0 to 1.
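The five numbers behind a box and whisker plot can be computed with Python's standard `statistics` module. This is a sketch with made-up data; note that different software uses slightly different quartile rules, and the default rule of `statistics.quantiles` is simply assumed here. The `box_summary` helper is hypothetical.

```python
import statistics


def box_summary(values):
    """Return the five numbers a box and whisker plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),   # end of the lower whisker
        "q1": q1,             # bottom of the box
        "median": median,     # line inside the box
        "q3": q3,             # top of the box
        "max": max(values),   # end of the upper whisker
    }


incubation_days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # made-up data
summary = box_summary(incubation_days)
print(summary)

# If the median sits closer to one edge of the box, the inner 50% is skewed.
lower_half = summary["median"] - summary["q1"]
upper_half = summary["q3"] - summary["median"]
print("Median centered in box:", lower_half == upper_half)
```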

Maps. Spot Maps show a dot for every reported case. They are good for showing where cases occur, but because they do not take the size of the underlying population into account they show counts, not risk. See Atlas of United States Mortality, Aids, An Historical Geography of Human Viral Disease. Choropleth Maps, which are area maps, use different shades to show different values.

A geographic information system is a computer system for the input, editing, storage, retrieval, analysis, synthesis, and output of location-based information.(22) In public health, GIS may use geographic distribution of cases or risk factors, health service availability or utilization, presence of insect vectors, environmental factors, and other location-based variables. GIS can be particularly effective when layers of information or different types of information about place are combined to identify or clarify geographic relationships.

Friday, October 11, 2013

Second half of Chapter 2

Review: mode, median and mean are all good descriptive measures, but the mean is the one mostly used for further statistical analysis.

To calculate the standard deviation, first take the mean. Subtract the mean from every data point. Square each result and sum the squares. Divide this sum by (n-1). Now you have the variance. To get the standard deviation, take the square root of the variance.
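The standard deviation steps translate into code almost line for line. The data below are made up, and the result is compared against `statistics.stdev` from Python's standard library, which also divides by (n-1).

```python
import math
import statistics


def sample_variance(values):
    """Variance by hand: squared deviations from the mean, divided by n-1."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (len(values) - 1)


def sample_standard_deviation(values):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
print("variance:", sample_variance(data))
print("standard deviation:", sample_standard_deviation(data))
print("library check:", statistics.stdev(data))
```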

To calculate the standard error, divide the standard deviation by the square root of n.

To calculate a 95% confidence interval, use the multiplier 1.96, because 95% corresponds to 1.96 standard errors on either side of the mean. Multiply the standard error by 1.96. The 95% confidence interval is the mean plus and minus this number.
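Putting the standard error and the 1.96 multiplier together gives the sketch below. The data are made up and the function name is hypothetical. The 1.96 multiplier is a normal approximation; for small samples a t-distribution multiplier is normally used instead, which this sketch does not attempt.

```python
import math
import statistics


def mean_confidence_interval(values, multiplier=1.96):
    """Return (lower, upper) limits: mean plus and minus multiplier * SE."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    margin = multiplier * standard_error
    return mean - margin, mean + margin


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
lower, upper = mean_confidence_interval(data)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```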

If the mean is greater than the median then the data is skewed right, because the long right tail pulls the mean up. And vice versa.
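As a rough rule of thumb, comparing the mean with the median hints at the direction of the tail: a mean above the median suggests a tail on the right, a mean below it suggests a tail on the left. The sketch below uses made-up data and a hypothetical function name, and it is only a quick check, not a formal skewness statistic.

```python
import statistics


def skew_direction(values):
    """Guess the side of the tail by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"  # a few large values pull the mean up
    if mean < median:
        return "left"   # a few small values pull the mean down
    return "none"


print(skew_direction([1, 2, 3, 4, 20]))     # one very large value
print(skew_direction([1, 17, 18, 19, 20]))  # one very small value
```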

Friday, October 4, 2013

Chapter 2

A Line Listing is equivalent to a spreadsheet. Rows are called observations or records, aka the patient or case, and columns are the variables. Instead of making line listings in Excel, you can use Epi Info, free software published by the CDC. To make a line listing in Epi Info, click create survey and enter all the data.

There are 4 variable types, seen in the columns of line listings. First, the nominal-scale variable. This variable is all about placing an observation or record in or out of a group; gender, vaccination status, etc. are examples. Second, the ordinal-scale variable. This variable groups using a ranking system, where each rank is not necessarily an equal step above the other; an example is stage of cancer. Third, the interval-scale variable. This is like the ordinal variable but has intervals that are evenly spaced apart; an example would be date of birth. Fourth, the ratio-scale variable. This variable is like an interval variable, but it has a true zero, so magnitudes can be compared as ratios; an example is duration of illness.

If a line listing is a spreadsheet of all observations/records matched with their individual data, then a frequency distribution is a table of a variable's outcomes matched with the number of corresponding records/observations. Doing so reduces the size of the data and concentrates the information, which makes understanding the bigger picture easier. To make a frequency distribution in Epi Info, click frequency, then the variable.
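Outside Epi Info, the same step from line listing to frequency distribution can be sketched in a few lines of Python. The records, field names, and function name below are made up for illustration.

```python
from collections import Counter

# Hypothetical line listing: one record (row) per case.
line_listing = [
    {"case_id": 1, "sex": "F", "vaccinated": "yes"},
    {"case_id": 2, "sex": "M", "vaccinated": "no"},
    {"case_id": 3, "sex": "F", "vaccinated": "yes"},
    {"case_id": 4, "sex": "M", "vaccinated": "unknown"},
    {"case_id": 5, "sex": "F", "vaccinated": "no"},
]


def frequency_distribution(records, variable):
    """Count how many records fall under each outcome of one variable."""
    return Counter(record[variable] for record in records)


for outcome, count in frequency_distribution(line_listing, "vaccinated").items():
    print(f"{outcome:<8} {count}")
```

Five rows shrink to three lines here; with thousands of records the saving is the whole point.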

The central location/central tendency of a frequency distribution is the area where most of the data points sit. The spread of a frequency distribution is described using the range and standard deviation. Skewness describes the shape of a frequency distribution curve. It is named for the tail, not the hump, so left skewed means the tail is on the left side. The normal distribution, or normal bell curve, is also called the Gaussian distribution; it is symmetric, so the mean, median and mode are equal.

Don't forget about the mode. Although it appears insignificant when compared to the mean, it is not. Think about a vaccination clinic looking at data showing the days people attended; they would want to know which day people come most often. Or what is the most common number of DPT vaccine doses children have had in your community? The average is not what matters here, the mode is. In Epi Info, to find the mode, just create a histogram and see which bar is highest.
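For the clinic example, the mode can also be found with Python's standard `statistics` module. The attendance data below are made up. `multimode` is used because it returns every value tied for most common rather than just one.

```python
import statistics

# Hypothetical clinic attendance, one entry per visit. Made-up data.
clinic_visits = ["Mon", "Tue", "Tue", "Wed", "Tue", "Fri", "Mon"]

busiest_days = statistics.multimode(clinic_visits)
print("Most common day(s):", busiest_days)

# Hypothetical number of DPT doses recorded for each child.
dpt_doses = [3, 3, 2, 4, 3, 1, 4, 3]
print("Most common number of doses:", statistics.multimode(dpt_doses))
```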

In Epi Info, to find the median (which is also the 50th percentile) and mean, click the means tab, in the "means of" field click the variable of interest, then ok. The median is used when dealing with skewed data.

The centering property of the mean is interesting. Take the mean and subtract it from all values. Add all these numbers up and you get zero.
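The centering property is easy to check numerically. The values below are made up; with floating point numbers the sum can come out as a tiny number rather than exactly zero, so the check allows for that.

```python
values = [3, 7, 8, 10]  # made-up observations

mean = sum(values) / len(values)
deviations = [x - mean for x in values]
total_deviation = sum(deviations)

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations is zero:", abs(total_deviation) < 1e-9)
```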

The midrange is an interesting tool as well. You take the lowest observation, add it to the highest observation, and divide by 2. It is interesting because when calculating the midrange for age, you add the lowest observation, the highest observation, and 1 before you divide by 2. The extra 1 compensates for ages being recorded in completed years, so the oldest person could be nearly a year older than recorded.
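Both versions of the midrange fit in one small function. The function name, the flag name, and the ages are made up for this sketch.

```python
def midrange(values, age_in_completed_years=False):
    """Halfway point between the lowest and highest observation.

    For ages recorded in completed years, 1 is added before dividing.
    """
    extra = 1 if age_in_completed_years else 0
    return (min(values) + max(values) + extra) / 2


ages = [17, 22, 35, 40]  # hypothetical ages in completed years
print("plain midrange:", midrange(ages))
print("midrange for age:", midrange(ages, age_in_completed_years=True))
```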

The geometric mean is different. To calculate it, multiply all the values together and then take the nth root, where n equals the number of observations. Or take the log of every value, get the mean of the logs, then compute the antilog of that mean. This is used for data that follows a logarithmic pattern such as 1, .1, .01, .001, .0001, etc. The geometric mean is never greater than the arithmetic mean and is used for data points that differ greatly from each other or have exponents.
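Both routes to the geometric mean can be sketched and compared. The values are made up, the function names are hypothetical, and the values must all be positive for the log route to work.

```python
import math


def geometric_mean_by_root(values):
    """Multiply everything together, then take the nth root."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_by_logs(values):
    """Average the base-10 logs, then take the antilog of that mean."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


values = [10, 100, 1000]  # made-up values spread over orders of magnitude
arithmetic_mean = sum(values) / len(values)

print("by root:", geometric_mean_by_root(values))
print("by logs:", geometric_mean_by_logs(values))
print("arithmetic mean:", arithmetic_mean)
```

The two routes agree up to floating point rounding, and both land well below the arithmetic mean for data this spread out.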

Sunday, September 22, 2013

Continuing chapter one

Surveillance = "monitoring the pulse of our community"

Field Investigation = collecting data yourself

Analytic Study = looking at the data to find answers

Evaluation = thinking about how well it went

Linkage = networking

Policy Creation = giving your recommendation on what should be done

Time, Place, and Person

Our textbook begins: CDC's Principles of Epidemiology in Public Health Practice Chapter 1

Our textbook is free. This is because it is a government publication, and government publications don't have copyrights or something like that. Anyway, it's the CDC's textbook, written for government health workers as their official book on understanding epidemiology. Glad to be reading something this official. Also, the CDC wrote it (in collaboration with other governmental departments); for me this is a huge plus.

So here we go, chapter one.

Epidemiology answers the questions that we were taught to ask in elementary school - the who, what, where, when, how and why - for questions of the health of a population. For example, what diagnosis or health event? Which persons? What place? What time and timing? What caused this, what are the contributing risk factors, what is the mode of transmission? This is detective work.

To find these answers through the epidemiologic method requires a background in probability, statistics, and research methods, and applying them to hypotheses in the fields of biology, physics, behavioral sciences, and ergonomics (google define ergonomics: the study of people's efficiency in their working environment).

These answers give public health the science to work from and the data to act on. They are like the dispatch center calling in crimes, giving information from which action can follow.

Think like an epidemiologist. "Epidemiologists assume
that illness does not occur randomly in a population, but happens
only when the right accumulation of risk factors or determinants
exists in an individual." - textbook

• What are the actual and potential health problems in the
community?
• Where are they occurring?
• Which populations are at increased risk?
• Which problems have declined over time?
• Which ones are increasing or have the potential to increase?
• How do these patterns relate to the level and distribution of
public health services available?

John Snow is called the father of epidemiology because in 1854 he figured out the source of a cholera outbreak in London. The source was a water pump. The handle was removed and the cholera epidemic was halted. He used a spot map to prove his findings.

Doll and Hill used epidemiology to link lung cancer to smoking.

Smallpox eradication was achieved through applied epidemiology.