Title: | Statistical Analysis of Mixed Ploidy Populations |
---|---|
Description: | Allows users to calculate pairwise Nei's Genetic Distances (Nei 1972), pairwise Fixation Indexes (Fst) (Weir & Cockerham 1984) and Genomic Relationship matrices following Yang et al. (2010) in mixed and single ploidy populations. Bootstrapping across loci is implemented during Fst calculation to generate confidence intervals and p-values around pairwise Fst values. StAMPP utilises SNP genotype data of any ploidy level (with the ability to handle missing data) and is coded to utilise multithreading where available to allow efficient analysis of large datasets. StAMPP is able to handle genotype data from genlight objects, allowing integration with other packages such as adegenet. Please refer to LW Pembleton, NOI Cogan & JW Forster, 2013, Molecular Ecology Resources, 13(5), 946-952. <doi:10.1111/1755-0998.12129> for the appropriate citation and user manual. Thank you in advance. |
Authors: | LW Pembleton |
Maintainer: | LW Pembleton <[email protected]> |
License: | GPL-3 |
Version: | 1.6.3.9000 |
Built: | 2025-01-25 03:04:00 UTC |
Source: | https://github.com/lpembleton/stampp |
A data frame containing Solcap potato genotype data in tetraploid and diploid format as a small example of the input format required by StAMPP.
data(potato)
A data frame with 30 rows and 48 variables:
Sample names
Population name
Ploidy level
Format of genotype data
Genotype data (44 columns, one per marker)
The example genotype data is a subset of the publicly available Solcap potato dataset, which was re-scored in GenomeStudio in diploid and tetraploid formats.
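As a rough illustration of the layout described above (four descriptor columns followed by genotype columns), the sketch below builds a toy data frame by hand. The column names, the format code and the genotype values are all hypothetical placeholders, not the actual contents of the potato data; the user manual should be consulted for the exact codes StAMPP expects.

```r
# Toy data frame laid out like the example data: four descriptor columns
# followed by one column per marker. All names and codes are hypothetical.
toy <- data.frame(
  Sample = c("ind1", "ind2", "ind3", "ind4"),
  Pop    = c("popA", "popA", "popB", "popB"),
  Ploidy = c(4, 4, 2, 2),
  Format = rep("AB", 4),                      # placeholder format code
  snp1   = c("AABB", "AAAB", "AB", "BB"),
  snp2   = c("ABBB", "BBBB", "AA", "AB"),
  stringsAsFactors = FALSE
)

# Check that each genotype string has one letter per chromosome copy
marker_cols <- setdiff(names(toy), c("Sample", "Pop", "Ploidy", "Format"))
lengths_ok <- sapply(toy[marker_cols], function(g) all(nchar(g) == toy$Ploidy))
print(lengths_ok)
```

Mixing tetraploid and diploid rows in one table is the point of the layout: the ploidy column tells downstream code how many allele copies each genotype string represents.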
A data frame containing Solcap potato genotype data in tetraploid and diploid format as a small example of the input format required by StAMPP.
data(potato.mini)
A data frame with 6 rows and 48 variables:
Sample names
Population name
Ploidy level
Format of genotype data
Genotype data (44 columns, one per marker)
The example genotype data is a subset of the publicly available Solcap potato dataset, which was re-scored in GenomeStudio in diploid and tetraploid formats.
Converts a StAMPP formatted allele frequency data frame generated by the stamppConvert function to a genlight object for use in other packages.
stampp2genlight(geno, pop = TRUE)
geno | a data frame containing allele frequency data generated from stamppConvert |
pop | logical. TRUE if population IDs are present in the StAMPP genotype data, FALSE if population IDs are absent. |
StAMPP only exports to genlight objects as they are able to handle mixed ploidy datasets, unlike genpop and genloci objects. The genlight object allows integration between StAMPP and other common R packages such as adegenet.
An object of class genlight which contains genotype data, individual IDs, population IDs (if present) and ploidy levels
Luke Pembleton <lpembleton at barenbrug.com>
# Import genotype data and convert to allele frequencies
data(potato.mini, package="StAMPP")
potato.freq <- stamppConvert(potato.mini, "r")
# Convert the StAMPP formatted allele frequency data frame to a genlight object
potato.genlight <- stampp2genlight(potato.freq, TRUE)
Calculates an AMOVA based on the genetic distance matrix from stamppNeisD(), using the amova() function from the pegas package to explore within- and between-population variation.
stamppAmova(dist.mat, geno, perm = 100)
dist.mat | the matrix of genetic distances between individuals generated from stamppNeisD() |
geno | a data frame containing allele frequency data generated from stamppConvert, or a genlight object containing genotype data, individual IDs, population IDs and ploidy levels |
perm | the number of permutations for the tests of hypotheses |
Uses the formula distance ~ populations to calculate an AMOVA for population differentiation and within- and between-population variation. This function uses the amova function from the pegas package.
An object of class "amova" which is a list containing a table of sum of square deviations (SSD), mean square deviations (MSD) and the number of degrees of freedom as well as the variance components
Luke Pembleton <lpembleton at barenbrug.com>
Paradis E (2010) pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics 26, 419-420. <doi:10.1093/bioinformatics/btp696>
# Import genotype data and convert to allele frequencies
data(potato.mini, package="StAMPP")
potato.freq <- stamppConvert(potato.mini, "r")
# Calculate genetic distance between individuals
potato.D.ind <- stamppNeisD(potato.freq, FALSE, "standard")
# Calculate AMOVA
stamppAmova(potato.D.ind, potato.freq, 100)
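To give a feel for the sums of square deviations (SSD) reported in the AMOVA table, the following base R sketch partitions a toy distance matrix into within- and among-population parts. It assumes the classical formulation in which distances are squared before partitioning; the matrix, the population labels and the helper `ssd` are hypothetical, and the permutation test is not shown.

```r
# Partition the total sum of squared distances into within- and
# among-population parts, the quantities an AMOVA table is built from.
# Toy distances and labels, not taken from the potato data.
d <- matrix(c(0, 1, 4, 5,
              1, 0, 3, 4,
              4, 3, 0, 2,
              5, 4, 2, 0), nrow = 4, byrow = TRUE)
pop <- c("popA", "popA", "popB", "popB")

# Sum of squared pairwise distances divided by the number of individuals
ssd <- function(m) sum(m[upper.tri(m)]^2) / nrow(m)

ssd_total  <- ssd(d)
ssd_within <- sum(sapply(split(seq_along(pop), pop),
                         function(i) ssd(d[i, i, drop = FALSE])))
ssd_among  <- ssd_total - ssd_within
c(total = ssd_total, within = ssd_within, among = ssd_among)
```

In this toy case most of the total SSD falls among populations, which is the pattern an AMOVA would flag as population differentiation.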
Imports biallelic AB formatted or allele A frequency genotype data. If the data is imported in biallelic AB format, this function also converts it to allele frequencies.
stamppConvert(genotype.file, type = "csv")
genotype.file | the genotype input file. This should be an R matrix object or a file path for a csv file containing the genotype data in either biallelic AB format or allele 'A' frequency format, or a genlight object containing genotype data |
type | the type of file the genotype data is being imported from; "csv" = comma separated file, "r" = data frame in the R workspace, "genlight" = genlight object. |
An object of class data.frame which contains allele frequency data for use in other StAMPP functions
Luke Pembleton <lpembleton at barenbrug.com>
# Import example data into the R workspace
data(potato.mini, package="StAMPP")
# Convert to allele frequencies
potato.freq <- stamppConvert(potato.mini, "r")
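The conversion from biallelic AB format to allele A frequency can be pictured as counting the A alleles in each genotype string and dividing by the ploidy. The sketch below is a minimal base R illustration of that idea, not the package's implementation; `ab_to_freq` is a hypothetical helper, and treating `NA` as the missing-data marker is an assumption (StAMPP's own missing-data code may differ).

```r
# Convert AB-format genotype strings to the frequency of allele A.
# Assumption: NA marks a missing genotype.
ab_to_freq <- function(genotypes) {
  genotypes <- as.character(genotypes)
  freq <- rep(NA_real_, length(genotypes))
  ok <- !is.na(genotypes)
  n_a <- nchar(gsub("[^A]", "", genotypes[ok]))   # number of A alleles
  freq[ok] <- n_a / nchar(genotypes[ok])          # divide by ploidy
  freq
}

# Tetraploid and diploid genotypes can be mixed in one vector
ab_to_freq(c("AABB", "AAAB", "AB", "BB", NA))
```

Because the result is a frequency rather than an allele count, genotypes of different ploidy end up on the same 0 to 1 scale.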
This function calculates pairwise Fst values between populations, along with confidence intervals and p-values, according to the method proposed by Wright (1949) and updated by Weir and Cockerham (1984).
stamppFst(geno, nboots = 100, percent = 95, nclusters = 1)
geno | a data frame containing allele frequency data generated from stamppConvert, or a genlight object containing genotype data, individual IDs, population IDs and ploidy levels |
nboots | number of bootstraps to perform across loci to generate confidence intervals and p-values |
percent | the percentile to calculate the confidence interval around |
nclusters | number of processor threads or cores to use during calculations. |
If possible, using multiple processing threads or cores is recommended to assist in calculating Fst values over a large number of bootstraps.
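One way to choose a value for nclusters is to query the machine with the parallel package that ships with R. The snippet below is only a sketch of that idea; leaving one core free is a convention assumed here, not a StAMPP requirement.

```r
# Pick a thread count for the nclusters argument, leaving one core free.
# detectCores() can return NA on some platforms, so fall back to 1.
n_cores <- parallel::detectCores()
nclusters <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)
nclusters
```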
A list with the components:
Fsts
a matrix of pairwise Fst values between populations
Pvalues
a matrix of p-values for each of the pairwise Fst values contained in the 'Fsts' matrix
Bootstraps
a data frame of each Fst value generated during bootstrapping and the associated confidence intervals
If nboots<2, no bootstrapping is performed and therefore only a matrix of Fst values is returned.
Luke Pembleton <lpembleton at barenbrug.com>
Wright S (1949) The Genetical Structure of Populations. Annals of Human Genetics 15, 323-354. <doi:10.1111/j.1469-1809.1949.tb02451.x>
Weir BS, Cockerham CC (1984) Estimating F Statistics for the Analysis of Population Structure. Evolution 38, 1358-1370. <doi:10.2307/2408641>
# Import genotype data and convert to allele frequencies
data(potato.mini, package="StAMPP")
potato.freq <- stamppConvert(potato.mini, "r")
# Calculate pairwise Fst values between each population
potato.fst <- stamppFst(potato.freq, 100, 95, 1)
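The idea of bootstrapping across loci to obtain a percentile confidence interval can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's procedure: the per-locus values `num` and `den` are made-up stand-ins for the among-population and total variance components, and the Fst estimate is taken as a simple ratio of their sums.

```r
set.seed(1)
n_loci <- 200
# Hypothetical per-locus components: 'num' stands in for the among-population
# variance component and 'den' for the total variance at each locus.
num <- runif(n_loci, 0, 0.02)
den <- runif(n_loci, 0.1, 0.3)

fst_hat <- sum(num) / sum(den)   # ratio-of-sums estimate over all loci

nboots  <- 100
percent <- 95
boot_fst <- replicate(nboots, {
  idx <- sample.int(n_loci, n_loci, replace = TRUE)  # resample loci with replacement
  sum(num[idx]) / sum(den[idx])
})

# Percentile interval: for percent = 95 this takes the 2.5% and 97.5% quantiles
alpha <- (100 - percent) / 100
ci <- quantile(boot_fst, probs = c(alpha / 2, 1 - alpha / 2))
fst_hat
ci
```

More bootstrap replicates give a more stable interval at the cost of run time, which is where the nclusters argument becomes useful.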
This function calculates a genomic relationship matrix following the method described by Yang et al. (2010).
stamppGmatrix(geno)
geno | a data frame containing allele frequency data generated from stamppConvert, or a genlight object containing genotype data, individual IDs, population IDs and ploidy levels |
An object of class matrix which contains the genomic relationship values between each individual
Luke Pembleton <lpembleton at barenbrug.com>
Yang J, Benyamin B, McEvoy BP, et al (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42, 565-569. <doi:10.1038/ng.608>
# Import genotype data and convert to allele frequencies
data(potato.mini, package="StAMPP")
potato.freq <- stamppConvert(potato.mini, "r")
# Calculate genomic relationship values between each individual
potato.G <- stamppGmatrix(potato.freq)
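For orientation, the relationship estimator of Yang et al. (2010) can be written out for the diploid case in a few lines of base R. This is a simplified sketch with toy allele counts and does not reproduce how StAMPP handles mixed ploidy; the matrix `x` and its values are hypothetical.

```r
# Diploid sketch of the Yang et al. (2010) relationship estimator.
# x: individuals in rows, loci in columns; entries are counts of the
# reference allele (0, 1 or 2). Toy values, not from the potato data.
x <- matrix(c(0, 1, 2, 1,
              1, 1, 0, 2,
              2, 0, 1, 1,
              1, 2, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ind", 1:4), paste0("snp", 1:4)))

p      <- colMeans(x) / 2      # reference allele frequency per locus
het    <- 2 * p * (1 - p)      # expected heterozygosity per locus
n_loci <- ncol(x)

# Off-diagonal: mean over loci of (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i))
w <- sweep(x, 2, 2 * p) / rep(sqrt(het), each = nrow(x))
G <- tcrossprod(w) / n_loci

# Diagonal: 1 + mean over loci of (x^2 - (1 + 2p)x + 2p^2) / (2p(1 - p))
d <- x^2 - sweep(x, 2, 1 + 2 * p, "*") + rep(2 * p^2, each = nrow(x))
diag(G) <- 1 + rowMeans(d / rep(het, each = nrow(x)))
G
```

Loci that are fixed in the sample (frequency 0 or 1) would give a zero denominator and need to be removed before applying this formula.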
This function calculates Nei's genetic distance (Nei 1972) between populations or individuals.
stamppNeisD(geno, pop = TRUE, measure = "standard")
geno | a data frame containing allele frequency data generated from stamppConvert, or a genlight object containing genotype data, individual IDs, population IDs and ploidy levels |
pop | logical. TRUE if genetic distance should be calculated between populations, FALSE if it should be calculated between individuals |
measure | a character string defining the distance measure to use: "standard" for Nei's standard genetic distance (1972) or "DA" for Nei's DA distance (1983). |
An object of class matrix which contains the genetic distance between each population or individual
Luke Pembleton <lpembleton at barenbrug.com>
Nei M (1972) Genetic Distance between Populations. The American Naturalist 106, 283-292.
# Import genotype data and convert to allele frequencies
data(potato.mini, package="StAMPP")
potato.freq <- stamppConvert(potato.mini, "r")
# Calculate genetic distance between individuals
potato.D.ind <- stamppNeisD(potato.freq, FALSE, "standard")
# Calculate genetic distance between populations
potato.D.pop <- stamppNeisD(potato.freq, TRUE, "standard")
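Nei's (1972) standard distance is D = -ln(Jxy / sqrt(Jx * Jy)), where Jxy, Jx and Jy are sums over loci and alleles of the products of allele frequencies. For biallelic loci this reduces to a few lines of base R, sketched below with toy frequencies; `neis_d` is a hypothetical helper and not the package's implementation.

```r
# Nei's (1972) standard genetic distance for biallelic loci.
# px, py: allele A frequencies per locus in two populations (toy values).
neis_d <- function(px, py) {
  jxy <- sum(px * py + (1 - px) * (1 - py))   # shared identity between x and y
  jx  <- sum(px^2 + (1 - px)^2)               # identity within x
  jy  <- sum(py^2 + (1 - py)^2)               # identity within y
  -log(jxy / sqrt(jx * jy))
}

pop1 <- c(0.10, 0.50, 0.90, 0.25)
pop2 <- c(0.30, 0.40, 0.60, 0.75)
neis_d(pop1, pop2)
neis_d(pop1, pop1)   # identical populations give a distance of (numerically) zero
```

The same function applies to individuals if each individual's allele frequency vector is passed in place of a population's.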
Converts the genetic distance matrix generated with stamppNeisD into Phylip format and exports it as a text file.
stamppPhylip(distance.mat, file = "")
distance.mat | the matrix containing the genetic distances generated from stamppNeisD to be converted into Phylip format |
file | the file path and name to save the Phylip format matrix as |
The exported Phylip formatted text file can be easily imported into software packages such as DARWin (Perrier & Jacquemoud-Collet 2006) to generate neighbour joining trees.
Luke Pembleton <lpembleton at barenbrug.com>
Perrier X, Jacquemoud-Collet JP (2006) DARWin - Dissimilarity Analysis and Representation for Windows. Agricultural Research for Development
# Import genotype data and convert to allele frequencies
data(potato.mini, package="StAMPP")
potato.freq <- stamppConvert(potato.mini, "r")
# Calculate genetic distance between populations
potato.D.pop <- stamppNeisD(potato.freq, TRUE, "standard")
# Export the genetic distance matrix in Phylip format
## Not run:
stamppPhylip(potato.D.pop, file="potato_distance.txt")
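For readers unfamiliar with the Phylip distance format, the base R sketch below writes a small square matrix in that layout: the number of taxa on the first line, then one line per taxon with its name padded to ten characters followed by its distances. `write_phylip` is a hypothetical helper written for illustration, and the exact spacing and precision used by stamppPhylip may differ.

```r
# Write a square distance matrix in Phylip format.
# write_phylip is a hypothetical helper, not the StAMPP implementation.
write_phylip <- function(d, file) {
  d <- as.matrix(d)
  labels <- sprintf("%-10s", substr(rownames(d), 1, 10))  # pad names to 10 chars
  rows <- apply(d, 1, function(r) paste(sprintf("%.6f", r), collapse = " "))
  writeLines(c(as.character(nrow(d)), paste(labels, rows)), file)
}

# Toy distance matrix between three populations
pops <- c("popA", "popB", "popC")
d <- matrix(c(0.00, 0.12, 0.30,
              0.12, 0.00, 0.25,
              0.30, 0.25, 0.00),
            nrow = 3, byrow = TRUE, dimnames = list(pops, pops))

out <- tempfile(fileext = ".txt")
write_phylip(d, out)
phylip_lines <- readLines(out)
phylip_lines
```

Names longer than ten characters are truncated in this sketch, so distinct labels should differ within their first ten characters.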