QIIME creates plots of alpha diversity vs. simulated sequencing effort, known as rarefaction plots, using the script make_rarefaction_plots.py. This script takes a mapping file and any number of rarefaction files generated by collate_alpha.py and creates rarefaction curves. Each curve represents a sample and can be colored by the sample metadata supplied in the mapping file.
How To Plot A Rarefaction Curve In Excel
This step generates wf_arare/alpha_rarefaction_plots/rarefaction_plots.html, which can be opened with a web browser, along with several other files. The wf_arare/alpha_rarefaction_plots/average_tables/ folder contains the rarefaction averages for each diversity metric, so the user can optionally plot the rarefaction curves in another application, such as MS Excel. The wf_arare/alpha_rarefaction_plots/average_plots/ folder contains the average plots for each metric and category, and the wf_arare/alpha_rarefaction_plots/html_plots/ folder contains all the images used in the generated HTML page.
To view the rarefaction plots, open the file wf_arare/alpha_rarefaction_plots/rarefaction_plots.html in a web browser, typically by double-clicking on it. Once the browser window is open, select the metric PD_whole_tree and the category Treatment to reveal a plot like the figure below. You can turn lines in the plot on or off by (un)checking the box next to each label in the legend, or click the triangle next to a label to see all the samples that contribute to that category. Below each plot is a table displaying the average values of each alpha diversity measure for each group of samples in the specified category.
Species accumulation curves show the rate at which new species are found within a community and can be extrapolated to provide an estimate of species richness. The simplest type of species accumulation curve is the Collector Curve, which plots the cumulative number of species recorded as a function of sampling effort (i.e., the number of individuals collected or the cumulative number of samples). The order in which samples are added to a species accumulation curve influences its overall shape. A smooth accumulation curve can be produced by repeatedly adding the samples in random order and then plotting the mean of these permutations. This menu can produce these types of species accumulation curves and can also plot a Coleman curve of the expected number of species.
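The difference between a single Collector Curve and the smoothed, permutation-averaged curve described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind any particular program: the function names, the toy samples, and the choice of 200 permutations are all assumptions made for the example.

```python
import random

def collector_curve(samples):
    """Cumulative number of distinct species after each sample, in the given order."""
    seen = set()
    curve = []
    for sample in samples:
        seen.update(sample)
        curve.append(len(seen))
    return curve

def mean_accumulation_curve(samples, n_permutations=200, seed=0):
    """Average the collector curve over random orderings of the samples."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    totals = [0] * len(samples)
    for _ in range(n_permutations):
        order = samples[:]             # copy, so the caller's list is untouched
        rng.shuffle(order)
        for level, richness in enumerate(collector_curve(order)):
            totals[level] += richness
    return [t / n_permutations for t in totals]

# Hypothetical incidence data: each set lists the species seen in one sample.
samples = [{"A", "B"}, {"B", "C"}, {"A"}, {"D", "E", "F"}]
print(collector_curve(samples))          # [2, 3, 3, 6]
print(mean_accumulation_curve(samples))  # smoother curve, also ending at 6.0
```

The single-order curve depends on where the species-rich fourth sample happens to fall, while the averaged curve does not. Both end at the same total because every ordering eventually includes all samples.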
Specifies the method for the plot. For the Collector and Rarefaction curves you can select between Individual- and Sample-based methods.
CURSAT ver. 2.1 is an open-source program written in QB64 BASIC, compilable into an executable file, that produces n pseudoreplicates of an empirical data set. Resampling both with and without replacement is supported, and the number (n) of pseudoreplicates is set by the user. Pseudoreplicates can be exported to a file that can be opened in a spreadsheet, so they are permanently stored and available for calculating statistics of interest and their associated variance. The software also uses the n pseudoreplicate data sets to reconstruct n accumulation matrices, which are appended to an output file. Accumulation is applicable wherever repeated sample-based data must be evaluated for exhaustiveness, and many situations involve repeated sampling from the same set of observations. For example, if the data consist of species occurrences, the software can be used for biodiversity estimation by a wide spectrum of specialists such as ecologists, zoologists, botanists, biogeographers, and conservationists. The software performs accumulation regardless of whether the input data set contains abundance (quantitative) or incidence (binary) data. Accumulation matrices can be imported into statistical packages to estimate distributions of successive pooling of samples and to depict accumulation and rarefaction curves with their associated variance.
Regardless of whether it is based on abundance or incidence, the estimation of species richness depends on sample size, owing to both sampling effects and intrinsic factors such as seasonality and species turnover. Consequently, repeated sampling is expected to increase the cumulative number of species observed in a given area. The successive pooling of samples from a single location thus produces a species accumulation, and the pattern of the species accumulation and rarefaction curves may be described in several model-based ways [6, 7, 8, 9].
Resampling provides an alternative to model-based estimation of accumulation and rarefaction curves. Resampling with or without replacement has become one of the most widely used measures of statistical support in several disciplines, especially when the variance of the statistics of interest cannot be estimated analytically. Resampling without replacement is usually preferred, because resampling with replacement (bootstrap) may underestimate relative to resampling without replacement. However, the variance among randomizations estimated via resampling without replacement approaches zero at the last accumulation level, whereas the variance estimated via bootstrap does not suffer from this effect [10].
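The behaviour of the variance at the last accumulation level can be seen in a small simulation. The sketch below is not CURSAT code. It is a hypothetical Python illustration in which the function names, the toy data, and the 200 pseudoreplicates are assumptions chosen for the example.

```python
import random
from statistics import pvariance

def accumulate(samples):
    """Cumulative species count after pooling each successive sample."""
    seen, curve = set(), []
    for sample in samples:
        seen.update(sample)
        curve.append(len(seen))
    return curve

def resampled_curves(samples, n_reps, replace, seed=0):
    """Build n_reps accumulation curves from pseudoreplicates of the samples."""
    rng = random.Random(seed)
    k = len(samples)
    curves = []
    for _ in range(n_reps):
        if replace:
            # Bootstrap: the same sample may be drawn more than once.
            draw = [rng.choice(samples) for _ in range(k)]
        else:
            # Without replacement: a random reordering of all samples.
            draw = rng.sample(samples, k)
        curves.append(accumulate(draw))
    return curves

def variance_per_level(curves):
    """Variance among pseudoreplicates at each accumulation level."""
    return [pvariance(level) for level in zip(*curves)]

# Hypothetical incidence data, one set of species per sampling event.
samples = [{"A", "B"}, {"B", "C"}, {"D"}, {"A", "E"}, {"F", "G"}]
without = variance_per_level(resampled_curves(samples, 200, replace=False))
boot = variance_per_level(resampled_curves(samples, 200, replace=True))
print(without[-1])  # zero: every reordering ends at the full species count
print(boot[-1])     # bootstrap replicates can miss samples, so they still vary
```

Without replacement, every pseudoreplicate contains all the samples, so the final level always equals the observed richness and its variance is zero. A bootstrap replicate may omit some samples, which is why its final-level richness can fall below the observed total and keeps a non-zero variance.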
CURSAT ver. 2.1 is a simple open-source program that generates exportable pseudoreplicate data sets and, in combination with commonly used statistical packages, allows the construction of accumulation and rarefaction curves and their associated sample variance using a rarefaction-by-resampling approach. Such an approach is not novel and has already been applied in ecology and genetics (e.g., [12, 13]). In fact, several R packages and programs implement resampling-based rarefaction, such as the specaccum function in the vegan R package [14], mothur [15], the EstimateS program [16], the iNEXT R package [17], and RTK [18].
Similar information is reported in the first column of each of the accumulation matrices generated. In these matrices sampling events and associated cumulative number of objects (species in the example) are reported in the second and third columns, respectively. Such a file can be imported in statistical packages to estimate distributions of successive pooling of samples and depict accumulation and rarefaction curves. The graphs in Figure 4 were obtained using Statistica ver. 8 (StatSoft, Inc).
The software can be used by ecologists, biogeographers, zoologists, botanists, conservationists but it also has general applicability in all cases in which data matrices from repeated sampling have to be evaluated by a resampling/rarefaction approach. For example, CURSAT ver. 2.1 has been recently used to account for a possible bias in the number of haplotypes caused by unequal sample size, in a genetic study of hemoparasites of Galápagos iguanas (Fulvo et al., submitted).
Create a file in code/ that is called plot_rarefaction_curves.R that contains the code needed to generate the 490 rarefaction curves colored by diagnosis. Draw a vertical gray line behind the curves to indicate where the 10,530 sequence threshold was. Restart R and run source("code/plot_rarefaction_curves.R") to make sure it runs as intended.
The rarefaction curve of annotated species richness is a plot (see Figure 5.11) of the total number of distinct species annotations as a function of the number of sequences sampled. The slope of the right-hand part of the curve is related to the fraction of sampled species that are rare. On the left, a steep slope indicates that a large fraction of the species diversity remains to be discovered. If the curve becomes flatter to the right, a reasonable number of individuals is sampled: more intensive sampling is likely to yield only few additional species. Sampling curves generally rise quickly at first and then level off toward an asymptote as fewer new species are found per unit of individuals collected.
The rarefaction curve is derived from the protein taxonomic annotations and is subject to problems stemming from technical artifacts. These artifacts can be similar to the ones affecting amplicon sequencing (Reeder and Knight 2009), but the process of inferring species from protein similarities may introduce additional uncertainty.
The rarefaction view is available only for taxonomic data. The rarefaction curve of annotated species richness is a plot (see Figure 5.18) of the total number of distinct species annotations as a function of the number of sequences sampled. As shown in Figure 5.18, multiple data sets can be included.
The slope of the right-hand part of the curve is related to the fraction of sampled species that are rare. When the rarefaction curve is flat, more intensive sampling is likely to yield only a few additional species. The rarefaction curve is derived from the protein taxonomic annotations and is subject to problems stemming from technical artifacts. These artifacts can be similar to the ones affecting amplicon sequencing (Reeder and Knight 2009), but the process of inferring species from protein similarities may introduce additional uncertainty.
Sampling curves generally rise very quickly at first and then level off toward an asymptote as fewer new species are found per unit of individuals collected. These rarefaction curves are calculated from the table of species abundance. The curves represent the average number of different species annotations for subsamples of the complete dataset.
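One common way to obtain the average number of species in a subsample directly from an abundance table is the classical hypergeometric rarefaction formula, E[S_n] = sum over species i of 1 - C(N - N_i, n) / C(N, n), where N is the total number of sequences and N_i is the abundance of species i. The text does not say which method the tool itself uses, so the Python sketch below is an assumption for illustration, with a hypothetical function name and toy abundances.

```python
from math import comb

def expected_richness(abundances, n):
    """Expected number of species in a random subsample of n individuals.

    Uses the hypergeometric formula: a species with abundance a is missed
    with probability C(N - a, n) / C(N, n), where N is the total count.
    """
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(total - a, n) is 0 when n > total - a, i.e. the species cannot be missed.
    return sum(1 - comb(total - a, n) / denom for a in abundances if a > 0)

# Hypothetical abundance table: 4 species, 10 sequences in total.
abundances = [5, 3, 1, 1]
curve = [expected_richness(abundances, n) for n in range(0, 11)]
print(curve[0])    # 0.0 species expected in an empty subsample
print(curve[-1])   # 4.0 species when every sequence is included
```

Plotting `curve` against the subsample size gives the rarefaction curve for this table: it rises quickly at first and flattens as it approaches the observed richness.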
In this talk, two types of standardization methods are reviewed: (1) Sample-size-based rarefaction and extrapolation methods aim to compare diversity estimates for equally large samples determined by samplers. (2) Coverage-based rarefaction and extrapolation methods aim to compare diversity estimates for equally complete samples; the sample completeness in this method is measured by sample coverage (the proportion of the total number of individuals that belong to the species detected in the sample), a concept originally developed by Alan Turing and I. J. Good in their cryptographic analysis during World War II. Contrary to intuition, sample coverage for the observed sample, rarefied samples, and extrapolated samples can be accurately estimated from the observed data themselves. These two types of standardization methods allow researchers to use all available data efficiently to make robust and detailed inferences about the sampled assemblages, and also to make objective comparisons among multiple assemblages. Hypothetical and real examples are presented to illustrate the use of the online software iNEXT (iNterpolation/EXTrapolation) to compute and plot seamless rarefaction/extrapolation sampling curves based on several diversity measures.
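To show how sample coverage can be estimated from the observed data alone, the Python sketch below computes two estimates from a vector of species abundances: the Turing and Good estimate, 1 - f1/n, and the refinement published by Chao and Jost (2012), 1 - (f1/n) * [(n - 1) f1 / ((n - 1) f1 + 2 f2)], where n is the sample size and f1 and f2 are the numbers of species observed exactly once and twice. The function name and the example abundances are hypothetical, and this sketch makes no claim about how iNEXT implements the calculation.

```python
from collections import Counter

def sample_coverage(abundances):
    """Return (Good-Turing, Chao-Jost) coverage estimates for one sample."""
    counts = [a for a in abundances if a > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("the sample contains no individuals")
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]   # singletons and doubletons
    good_turing = 1 - f1 / n
    denom = (n - 1) * f1 + 2 * f2
    if denom == 0:
        # No singletons or doubletons (or n == 1): fall back to Good-Turing.
        chao_jost = good_turing
    else:
        chao_jost = 1 - (f1 / n) * ((n - 1) * f1 / denom)
    return good_turing, chao_jost

# Hypothetical sample: 20 individuals, 3 singletons, 1 doubleton.
gt, cj = sample_coverage([10, 5, 2, 1, 1, 1])
print(round(gt, 4))  # 0.85
print(round(cj, 4))  # 0.8551
```

In this example, an estimated 85% or so of the individuals in the assemblage belong to species already detected, even though the number of undetected species is unknown. Coverage-based comparisons standardize samples to equal values of this quantity rather than to equal size.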