Microarray Data Analysis in GeneSpring GX 11

Transcript:

Microarray Data Analysis in GeneSpring GX 11
Jean Jasinski, Ph.D.
Senior Application Scientist
jean@strandsi.com

Agenda
- New features in GeneSpring GX 11
- Guided Workflow
- Advanced Workflow
- Data Loading
- Experiment Setup
- Quality Control on Samples & Entities
- Statistical Analysis
- Updating Annotations

Where we've been
GeneSpring GX: Solution for RNA Expression Analysis
- GeneSpring has 8,000 references in Google Scholar and over 1,600 in peer-reviewed publications
- GeneSpring has a long history in RNA-based applications
  - mRNA expression analysis
  - microRNA analysis with biological contextualization using integrated TargetScan gene target information
  - Alternative splicing analysis using multivariate splicing ANOVA
- GeneSpring's strength lies in biological contextualization
  - Network building and pathway analysis using our species-specific interaction databases
  - Ability to build your own interaction database with the provided NLP
  - GO, GSEA, and GSA analysis
  - Automated biological entity translation across species or microarray platforms

New features in GeneSpring GX 11
GeneSpring GX 11 extends support to DNA applications
- Genome-wide association study (GWAS)
  - Test individual SNPs or haplotypes for association with qualitative or quantitative traits
- Copy number variation analysis
  - Identify statistically significant regions of variation
  - Filter for regions of copy-neutral LOH
  - Identify allele-specific copy number variations

Flexible and User-friendly Genome Browser
[Figure: genome browser showing Scatter Plot, Histogram, Profile Plot, and Annotation Tracks]

GeneSpring GX 11 Genome Browser
- Multiple samples or conditions can be displayed as individual tracks or merged in the same track
- Data from different experiment types can be displayed in the same browser and merged (figure callout: "Merge tracks 1 and 2")
- Plot raw and normalized intensity values, copy number, LOD, and other list-associated values
- Select multiple annotation tracks to be displayed (e.g., miRNA, CpG islands, CNVs from DGV)

Tabbed Visualization Windows
- Tabbed windows allow easy switching between different visualizations and plots to facilitate interrogation and comparison of data

Easier way of selecting multiple Entity Lists for Venn Diagram
Drag-and-drop Entity Lists OR select Entity Lists from window
- The Entity List Selection window for the Venn Diagram opens automatically and displays all Entity Lists for all open experiments
- Multiple Entity Lists can be selected from the window at once (Ctrl-click) to display in the Venn Diagram
- Entity Lists can also be dragged and dropped into the Venn Diagram

Find Entity In View
Ctrl + F    Ctrl + I

Support for Affymetrix Text and Pivot Files
- Affymetrix text and pivot files can now be imported into standard Affymetrix technologies that support .cel and .chp files
- There is no longer any need to create Generic Data for Affymetrix text and pivot files, because the data file format is recognized automatically

Gene-level and Probe-level Expression Analysis
[Figure: probe-level experiment and gene-level experiment]
- Expression data can be analyzed at gene level or probe level
- Signal intensity values are summarized using Entrez ID

GeneSpring GX Key Features
- Guided Workflows: pre-determined steps
- Project-based organization & translation-on-the-fly: compare platforms, applications, species
- Biological Contextualization: Pathway Analysis, GSEA, GSA, GO, link to Ingenuity IPA
- Customization: scripting in Jython, R, XML
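The gene-level summarization mentioned above can be pictured with a small sketch. The slide only says that signal intensities are summarized using Entrez ID; it does not say which statistic GeneSpring uses, so the per-sample mean below is an assumption, and the probe names and Entrez IDs are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def summarize_by_gene(probe_values, probe_to_entrez):
    """Collapse probe-level rows into one row per Entrez ID.

    probe_values: {probe_id: [value per sample]}
    probe_to_entrez: {probe_id: entrez_id}; probes without a mapping are dropped.
    The per-sample mean is an assumed summary statistic, not GeneSpring's documented one.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_entrez.get(probe)
        if gene is not None:
            grouped[gene].append(values)
    # zip(*rows) walks the rows sample by sample.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

# Hypothetical probe-level data for two samples.
probe_values = {
    "probe_a": [2.0, 4.0],
    "probe_b": [4.0, 8.0],
    "probe_c": [5.0, 5.0],
    "probe_d": [1.0, 1.0],  # no Entrez mapping, so it is dropped
}
probe_to_entrez = {"probe_a": "1001", "probe_b": "1001", "probe_c": "2002"}

gene_level = summarize_by_gene(probe_values, probe_to_entrez)
```

Two probes mapped to the same Entrez ID end up as a single gene-level entity, which is why a gene-level experiment has fewer entities than its probe-level source.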

GeneSpring GX Expression Data Formats
Continue the tradition of support for multiple vendors
- Agilent
  - FE V 8.5 and newer (1 and 2 color)
- Affymetrix
  - 3' Expression Arrays: Command Console, GCOS (.CEL, .CHP)
  - Exon and Gene 1.0 ST Arrays: Command Console, GCOS
- Illumina
  - BeadStudio and GenomeStudio
- GenePix
  - GenePix Pro 3.0/ results format V1.4 and newer
- ABI
  - SDS, RQ Manager (for QPCR)
- Custom formats
  - Text files (1 and 2 color), except Imagene

GeneSpring GX 11: New Technologies
- Affymetrix
  - 100K (50K Xba, 50K Hind)
  - 500K (250K Nsp, 250K Sty)
  - SNP v5.0
  - SNP v6.0
  - The SNP v5.0 and SNP v6.0 arrays contain both CN and SNP probes, while the 100K and 500K arrays contain only SNP probes
- Illumina (GenomeStudio outputs)
  - HumanHap550
  - Human610-Quad BeadChip
  - Human1M-Duo
  - HumanOmni1-Quad
  - HumanCytoSNP-12
  - HumanCNV370-Quad
  - HumanCVD

GeneSpring GX 11 Vocabulary
- Project: primary workspace which contains a collection of experiments
- Experiment: collection of samples that are analyzed as a set
- Parameter: variable in an experiment (Time, Treatment, Gender, etc.)
- Condition: one or more samples that represent a common biological state (e.g., Time 14h)
- Interpretation: samples that are grouped together based on conditions
- Entity: a discrete feature measured by microarray analysis, such as a probe or probeset
- Technology: a file package containing information on array design and biological information (annotation) for all the entities on the array
- Biological Genome: a collective set of all major annotations (NCBI) for any organism; essential for Generic/Custom arrays lacking annotations

GeneSpring GX 10/11: Interface

General Microarray Analysis Workflow
- Define Biological Question
- Design Experiment
- Select Array Technology
- Select Labelling Technology
- Perform Array Study
- Load array data
- Pre-process Raw Data
- Normalise processed data
- QC samples
- QC entities
- Perform statistical tests on relevant questions
- Clustering
- Annotation
- Assess biological context
- Independent Validation of statistically derived predictions
(Diagram label: Performed in GeneSpring GX)

Find Differentially Expressed Genes
Affymetrix Files

Background of Case Study
- Congestive heart failure (CHF) is a degenerative condition in which the heart no longer functions effectively as a pump.
- The most common cause of CHF is damage to the heart muscle from insufficient oxygen. This is usually due to narrowing of the coronary arteries, which take blood to the heart.
- Idiopathic cardiomyopathy results in a weakened heart due to an unknown cause.
- Ischemic cardiomyopathy is caused by a lack of oxygen to the heart due to coronary artery disease.

Experimental Goal
To identify the molecular mechanisms underlying congestive heart failure, gene expression profiles were compared between male and female patients with idiopathic, ischemic, or non-failing heart conditions.

Experiment: Collection of Samples Analyzed as a Set
- 2 experimental parameters: Gender and CHF Etiology
- 1-color platform
- 12 total samples (2 biological replicates per Gender/CHF Etiology condition)
- Technology: Affymetrix HG U133 Plus 2
- Data files are CEL files generated by Affymetrix GeneChip Operating Software (GCOS)

Experimental Setup in GeneSpring

SAMPLE  GENDER  CHF ETIOLOGY
1       Female  Idiopathic
2       Female  Idiopathic
3       Male    Idiopathic
4       Male    Idiopathic
5       Female  Ischemic
6       Female  Ischemic
7       Male    Ischemic
8       Male    Ischemic
9       Female  Non-failing
10      Female  Non-failing
11      Male    Non-failing
12      Male    Non-failing

Gender Interpretation
- Condition 1: Female (Samples 1, 2, 5, 6, 9, 10)
- Condition 2: Male (Samples 3, 4, 7, 8, 11, 12)

CHF Etiology Interpretation
- Condition 1: Idiopathic (Samples 1, 2, 3, 4)
- Condition 2: Ischemic (Samples 5, 6, 7, 8)
- Condition 3: Non-failing (Samples 9, 10, 11, 12)

Gender/CHF Etiology Interpretation
- Condition 1: Female/Idiopathic (Samples 1, 2)
- Condition 2: Male/Idiopathic (Samples 3, 4)
- Condition 3: Female/Ischemic (Samples 5, 6)
- Condition 4: Male/Ischemic (Samples 7, 8)
- Condition 5: Female/Non-failing (Samples 9, 10)
- Condition 6: Male/Non-failing (Samples 11, 12)
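The relationship between parameters, conditions, and interpretations in this setup can be reproduced with a short sketch. The sample annotations are taken from the slide; the function and variable names are illustrative and are not part of GeneSpring.

```python
from collections import defaultdict

# (sample number, gender, CHF etiology), as listed in the experimental setup.
SAMPLES = [
    (1, "Female", "Idiopathic"), (2, "Female", "Idiopathic"),
    (3, "Male", "Idiopathic"), (4, "Male", "Idiopathic"),
    (5, "Female", "Ischemic"), (6, "Female", "Ischemic"),
    (7, "Male", "Ischemic"), (8, "Male", "Ischemic"),
    (9, "Female", "Non-failing"), (10, "Female", "Non-failing"),
    (11, "Male", "Non-failing"), (12, "Male", "Non-failing"),
]

# Column index of each experimental parameter inside a SAMPLES row.
PARAMETERS = {"Gender": 1, "CHF Etiology": 2}

def build_interpretation(parameters):
    """Group sample numbers into conditions defined by the chosen parameters."""
    conditions = defaultdict(list)
    for row in SAMPLES:
        key = "/".join(row[PARAMETERS[name]] for name in parameters)
        conditions[key].append(row[0])
    return dict(conditions)

gender = build_interpretation(["Gender"])
etiology = build_interpretation(["CHF Etiology"])
combined = build_interpretation(["Gender", "CHF Etiology"])
```

An interpretation over one parameter pools replicates across the other parameter (six samples per gender), while the combined interpretation keeps only the two biological replicates per condition.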

Class-only: Creating technology file
- Normally, if a technology file is not found, GeneSpring asks whether you would like to download it from the Agilent server.
- The process works well if you are connected to the Internet, but not so well in a classroom setting.

Automatic Download of Technology
- An experiment comprises samples which all belong to the same technology.
- A technology is the array design and the associated biological annotations, such as Affymetrix.GeneChip.HG-U133_Plus_2.
- A technology must initially be installed for each new array type to be analyzed.
- For standard arrays from Affymetrix, Agilent, and Illumina, technologies can be downloaded automatically from the Agilent server.
- For custom and catalogue arrays from Agilent, technologies can be created automatically from eArray.

Getting Started
How do you begin in GX 11?
- Create or open a project

Create a project

Getting Started
Within a project, create an experiment and specify the data format.

Once you select the type of data you have, you can proceed via 2 options for Workflow Type:
1) Guided Workflow: analysis steps are pre-determined and specific to the selected data type
2) Advanced Analysis: analysis steps and settings are selected by the user

Choose the data associated with the experiment
- Create experiment containing samples created from data files
- Create experiment from samples already in GeneSpring GX

Baseline Transformation Options

Advanced Analysis Workflow Options
1) Experiment Setup - Specify parameters & interpretations
2) Quality Control - Sample & Entity Level QC
3) Analysis - Statistics and Fold Change - Additional Tools
4) Results Interpretation - Biological Contextualization
5) Utilities - Guided Workflow

Advanced Workflow: Experiment Setup
- Quick Start Guide
- Experiment Grouping
- Create Interpretation
- Create New Gene-level Experiment

Experiment Grouping
- The experimental parameters are added in this window.
- For each array, the particular parameter value (condition) is also specified.
- Values can be added manually or loaded from a saved file.

Grouping and Interpretation
For this experiment, three interpretations are created: CHF Etiology only, Gender only, and CHF Etiology x Gender.

Interpretation Associated with Experiment

Advanced Analysis Workflow: Quality Control
- Quality Control on Samples
- Quality Control on Entities

Quality Control on Samples: Affymetrix data
- The QC on Samples tool utilizes vendor-specific quality control metrics
- In the Guided Workflow, the following tools are available to evaluate the quality of arrays:
  1) 3'/5' ratio
  2) Hybridization control plots
  3) Principal Components Analysis on Samples

Quality Control on Samples
- All displays within the window are linked: selecting a sample in one selects the same sample in all other displays
- A selected sample can be removed from the experiment by clicking the Add/Remove button
- If a sample is removed, the remaining samples are renormalized

Quality Control on Samples
Internal Controls: 3'/5' ratios
Premise:
- This is a measure of the efficiency of the cDNA synthesis reaction.
- All Affymetrix arrays contain probes for the regions corresponding to the 3' end, middle, and 5' end of housekeeping genes such as GAPDH and β-actin.
- The ratio of signal intensity for the 3' probesets to that for the 5' probesets provides a measure of the number of cDNA synthesis reactions that went to completion (i.e., full-length cDNA was synthesized).

Quality Control on Samples
Internal Controls: 3'/5' ratios
Interpretation of Results:
- The expectation is that the ratio for the probe sets is close to 1.
- A ratio > 3 indicates that either the starting RNA was degraded or there was a problem with the cDNA synthesis reaction.
- In GeneSpring, ratio values greater than 3 are colored red.
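The 3'/5' check described above comes down to a division and a threshold. The sketch below assumes linear-scale (not log-transformed) signal values and uses invented numbers; only the cut-off of 3 comes from the slide.

```python
RATIO_THRESHOLD = 3.0  # cut-off quoted on the slide

def three_five_ratio(signal_3prime, signal_5prime):
    """Ratio of the 3' probeset signal to the 5' probeset signal (linear scale assumed)."""
    if signal_5prime <= 0:
        raise ValueError("5' signal must be positive")
    return signal_3prime / signal_5prime

def flag_samples(signals):
    """Map each sample to (ratio, flagged), where flagged means ratio > 3."""
    result = {}
    for sample, (s3, s5) in signals.items():
        ratio = three_five_ratio(s3, s5)
        result[sample] = (ratio, ratio > RATIO_THRESHOLD)
    return result

# Made-up housekeeping-gene signals per sample: (3' probeset, 5' probeset).
example = {
    "sample_1": (1200.0, 1000.0),  # close to 1: synthesis went to completion
    "sample_2": (2000.0, 500.0),   # well above 3: possible degradation
}
flags = flag_samples(example)
```

If the values were log2-transformed, the ratio would instead be computed as 2 raised to the difference of the two signals.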

Quality Control on Samples
Hybridization Control Plots
Premise:
- Pre-mixed hybridization control transcripts in known staggered concentrations are added to the hyb mix.
- Hybridization controls are composed of a mixture of biotin-labelled cRNA transcripts of bioB, bioC, bioD, and cre prepared in staggered concentrations.
- These controls allow you to monitor the hybridization and washing process.
- The signal intensity of these controls should increase with concentration.
- Deviations from the expected intensity profile of these controls indicate a potential problem with the hyb or washing process.

Quality Control on Samples
Hybridization Control Plots
Interpretation of Results:
- Each profile represents the signal intensities of the hybridization control probes in each sample.
- We want to see that the profiles across all samples are similar and that, within each sample, the profiles reflect the variable concentrations of the probes.
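The expectation that control signal rises with concentration can be checked programmatically. This is a minimal sketch with invented intensities; it assumes the controls are listed on the slide in order of increasing concentration (bioB, bioC, bioD, cre).

```python
# Assumed order of increasing spike-in concentration.
CONTROL_ORDER = ["bioB", "bioC", "bioD", "cre"]

def controls_increase(profile):
    """True if the signal rises strictly from each control to the next."""
    values = [profile[name] for name in CONTROL_ORDER]
    return all(low < high for low, high in zip(values, values[1:]))

def failing_samples(profiles):
    """Names of samples whose control profile does not increase as expected."""
    return [name for name, profile in profiles.items() if not controls_increase(profile)]

# Made-up log2 signal intensities of the hybridization controls.
profiles = {
    "sample_1": {"bioB": 8.1, "bioC": 9.7, "bioD": 11.9, "cre": 13.2},
    "sample_2": {"bioB": 8.3, "bioC": 9.5, "bioD": 12.1, "cre": 13.0},
    "sample_3": {"bioB": 9.9, "bioC": 9.1, "bioD": 11.0, "cre": 12.8},  # bioB > bioC
}
suspect = failing_samples(profiles)
```

A check like this only covers the within-sample part of the interpretation; comparing profile shapes across samples is still easiest by eye in the plot.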

Quality Control on Samples
Principal Components Analysis
- By default, each sample is plotted according to its values for the first three Principal Components.
- Principal Components are vectors that capture the most variance in the data.
- Assumption: samples within an experimental condition should be more similar to each other than to those from different conditions.
- Expect samples from the same experimental condition to group closer to each other than to samples of a different condition.

PCA Is a Variable Reduction Method
[Figure: scatter plot with axes PC 1 and PC 2]
- An eigenvalue-eigenvector decomposition is performed on the covariance matrix of the gene expression values, centered around zero.
- The eigenvector corresponding to the largest eigenvalue is called the first principal component.
- Successive principal components are the eigenvectors corresponding to each successively smaller eigenvalue.
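The decomposition described above can be sketched without any numerical library by using power iteration, which converges to the eigenvector of the largest eigenvalue. This is an illustration of the idea only, with made-up expression values; it is not how GeneSpring computes its PCA, and it extracts just the first component.

```python
import math

def center_columns(rows):
    """Subtract each gene's mean so every column is centered at zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Gene-by-gene sample covariance of already centered data."""
    n = len(centered)
    dims = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]

def first_principal_component(cov, iterations=200):
    """Power iteration towards the eigenvector of the largest eigenvalue."""
    vec = [1.0 / (i + 1) for i in range(len(cov))]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("covariance matrix has no variance to explain")
        vec = [x / norm for x in nxt]
    return vec

# Made-up log2 expression values: rows are samples, columns are genes.
samples = ["A_rep1", "A_rep2", "B_rep1", "B_rep2"]
expression = [
    [2.0, 8.0, 5.0],
    [2.2, 7.8, 5.1],
    [6.0, 4.0, 5.0],
    [6.1, 4.2, 4.9],
]
centered = center_columns(expression)
pc1 = first_principal_component(covariance_matrix(centered))
# Score of each sample on the first principal component.
scores = [sum(v * w for v, w in zip(row, pc1)) for row in centered]
```

In this toy data the two replicates of condition A get scores of one sign and the replicates of condition B get the other, which is the grouping the QC plot is meant to reveal. The sign of an eigenvector is arbitrary, so only the relative positions matter.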

Quality Control on Entities
Filter Probesets by Expression
- Entities can be removed from the experiment based on their signal intensity values.

Quality Control on Entities
Filter Probesets by Flags
- By default, the Entity List currently selected in the Navigator is used as input for the analysis.
- Users can adjust the stringency of the filter by specifying the type of flag call and the number of samples.
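Both entity filters follow the same pattern: count the samples in which an entity meets a criterion and keep it if the count is high enough. The sketch below assumes that pattern; the thresholds, the flag letters (P, M, A for present, marginal, absent calls), and the probe names are illustrative choices, not GeneSpring defaults.

```python
def filter_by_expression(data, lower, upper, min_samples):
    """Keep entities whose value lies in [lower, upper] in at least min_samples samples."""
    return [entity for entity, values in data.items()
            if sum(lower <= v <= upper for v in values) >= min_samples]

def filter_by_flags(flags, accepted, min_samples):
    """Keep entities with an accepted flag call in at least min_samples samples."""
    return [entity for entity, calls in flags.items()
            if sum(call in accepted for call in calls) >= min_samples]

# Made-up raw signal values for three probes across four samples.
signals = {
    "probe_1": [250.0, 310.0, 280.0, 295.0],
    "probe_2": [12.0, 9.0, 15.0, 11.0],     # near background in every sample
    "probe_3": [20.0, 400.0, 380.0, 18.0],  # expressed in two samples only
}
# Made-up flag calls for the same probes and samples.
calls = {
    "probe_1": ["P", "P", "P", "P"],
    "probe_2": ["A", "A", "M", "A"],
    "probe_3": ["A", "P", "P", "A"],
}

kept_expression = filter_by_expression(signals, lower=100.0, upper=10000.0, min_samples=2)
kept_flags = filter_by_flags(calls, accepted={"P", "M"}, min_samples=3)
```

Raising `min_samples` or narrowing the accepted flag calls makes the filter more stringent, which mirrors the two settings the slide says users can adjust.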

Navigator Hierarchy
- Within an experiment, there is an Analysis folder containing all data objects created for the experiment.
- Data objects (lists, trees, classifications) within an experiment are saved under the input Entity List used for the analysis.

Analysis
- Statistical analysis (how significant are the differences) and fold-change (how much up- or down-regulated) are independent tests.
- Statistical tests are provided in a pull-down list. Only tests valid for the interpretation are listed.
- The appropriateness of a test is determined by the experiment setup: number of parameters, number of conditions, and number of replicates.
- Statistics requires replicates; fold-change may be calculated without replicates.
- Fold-change is calculated in pairs. Condition 2 is the baseline condition.

Significance Analysis (Gender x Etiology)
- For this experiment, two parameters, gender and CHF etiology, are part of the design. Thus, GeneSpring automatically applies the 2-way ANOVA, which tests for the effects of 2 parameters.
- The 2-way ANOVA performs 3 separate tests:
  - Generate p-value for effect of etiology
  - Generate p-value for effect of gender
  - Generate p-value for effect of interaction between etiology and gender (change in expression influenced by both parameters)
- The 3 resulting entity lists are displayed in a Venn Diagram.
- You are asked about pairs of conditions for fold-change calculations in Step 7/9 because fold changes are calculated automatically if there are replicates.

Significance (Gender x Etiology)
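The three tests of a 2-way ANOVA correspond to three F statistics computed from one gene's values. The sketch below is the textbook calculation for a balanced design (equal replicates in every cell), offered as an illustration rather than as GeneSpring's implementation. It stops at the F statistics because converting them to p-values needs the F distribution, which the Python standard library does not provide. The expression values are made up.

```python
from statistics import mean

def two_way_anova_f(cells):
    """F statistics for a balanced two-way design.

    cells maps (level_a, level_b) to a list of replicate values.
    Returns {"A": F, "B": F, "AxB": F}.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    reps = len(next(iter(cells.values())))
    if len(cells) != len(levels_a) * len(levels_b):
        raise ValueError("every combination of levels needs data")
    if reps < 2 or any(len(v) != reps for v in cells.values()):
        raise ValueError("balanced design with at least 2 replicates required")

    grand = mean([x for v in cells.values() for x in v])
    mean_a = {a: mean([x for (ca, _), v in cells.items() if ca == a for x in v])
              for a in levels_a}
    mean_b = {b: mean([x for (_, cb), v in cells.items() if cb == b for x in v])
              for b in levels_b}

    # Sums of squares for the two main effects, the interaction, and the error.
    ss_a = len(levels_b) * reps * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = len(levels_a) * reps * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_cells = reps * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_err = len(levels_a) * len(levels_b) * (reps - 1)
    ms_err = ss_err / df_err
    if ms_err == 0:
        raise ValueError("no within-cell variation; F is undefined")
    return {
        "A": (ss_a / df_a) / ms_err,
        "B": (ss_b / df_b) / ms_err,
        "AxB": (ss_ab / (df_a * df_b)) / ms_err,
    }

# Made-up log2 values for one gene: an etiology effect but no gender effect.
gene = {
    ("Female", "Idiopathic"): [10.0, 12.0], ("Male", "Idiopathic"): [10.0, 12.0],
    ("Female", "Ischemic"): [14.0, 16.0], ("Male", "Ischemic"): [14.0, 16.0],
    ("Female", "Non-failing"): [6.0, 8.0], ("Male", "Non-failing"): [6.0, 8.0],
}
f_stats = two_way_anova_f(gene)  # "A" is gender, "B" is etiology
```

For this toy gene the etiology F statistic is large while the gender and interaction statistics are zero, so the gene would land only in the etiology entity list of the Venn Diagram.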

Significance Analysis (Gender only)
- For this interpretation, there is one parameter and two conditions, so a t-test is appropriate and is the default statistical analysis.
- The result of a significance analysis is a volcano plot that displays p-value vs. fold change.
- As you saw in the previous result (2-way ANOVA), gender has no effect, so this volcano plot is from another experiment.

Significance Analysis (Etiology)
- With one parameter and three conditions, the ANOVA test is selected.
- Without a post-hoc test, results are displayed as a table. With a post-hoc test, results are displayed as a chart with selectable cells.
- Step 7/9 asks for pairs for fold-change calculation if replicates exist.
- Post-hoc test: cells can be combined by union and intersection using the boxes below the chart.

Filter on Volcano Plot
- Used to compare two groups (like the t-test).
- P-value and fold-change cut-offs may be changed independently of each other.
- Output is a volcano plot with two green lines that show the p-value and fold-change filters.
[Figure labels: P-value cut-off, Fold-change cut-off]

Fold Change
- Independent of significance (p-value) analysis.
- Calculated for pairs of conditions.
- Output shown in table and graphical format.
- Absolute values of fold change are shown (with up or down).
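The two cut-offs of the volcano filter and the "absolute value with direction" convention for fold change can be sketched as follows. The sketch assumes log2-normalized values, treats condition 2 as the baseline, and takes p-values as given inputs; the default cut-offs of 0.05 and 2.0 and all data values are illustrative assumptions.

```python
from statistics import mean

def fold_change(cond1, cond2):
    """Absolute fold change and direction of condition 1 relative to baseline condition 2.

    Inputs are assumed to be log2 values, so the log ratio is a difference of means.
    A difference of exactly zero is reported as "up" with a fold change of 1.
    """
    diff = mean(cond1) - mean(cond2)
    return 2.0 ** abs(diff), ("up" if diff >= 0 else "down")

def volcano_filter(entities, p_cutoff=0.05, fc_cutoff=2.0):
    """Keep entities that pass both the p-value and the fold-change cut-off."""
    passed = []
    for name, (p_value, cond1, cond2) in entities.items():
        fc, direction = fold_change(cond1, cond2)
        if p_value <= p_cutoff and fc >= fc_cutoff:
            passed.append((name, fc, direction))
    return passed

# Made-up entities: (p-value, condition 1 replicates, condition 2 replicates).
entities = {
    "gene_up": (0.01, [8.0, 9.0], [7.0, 8.0]),      # 2-fold up, significant
    "gene_down": (0.001, [5.0, 5.0], [7.0, 7.0]),   # 4-fold down, significant
    "gene_small": (0.001, [7.2, 7.2], [7.0, 7.0]),  # significant but small change
    "gene_noisy": (0.40, [10.0, 10.0], [7.0, 7.0]), # large change, not significant
}
selected = volcano_filter(entities)
```

Because the two criteria are independent, an entity can fail either one alone: `gene_small` is rejected by the fold-change line and `gene_noisy` by the p-value line.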

Updating Annotations in GeneSpring GX
Annotations are required for the biological contextualization tools and the genome browser.

Updating Annotations
- Option 1: Update from Agilent Server
- Option 2: Update from Agilent eArray

Updating Annotations
- Option 3: Update from file
- Option 4: Update from Biological Genome

What is a biological genome in GeneSpring GX?
- Think of it as a "super technology" that contains annotations for the genes of a particular organism.
- Annotations are from NCBI and thus are neither vendor-specific nor chip-dependent.
- Annotations include common name, gene symbol, gene product description, GO IDs, chromosomal locations, exon information, miRNA information, and many more.
- Updating a technology from the Biological Genome may bring in more annotation than is provided in the original technology.
- This allows us to relate an entity to any other entity (Agilent probe to Affymetrix probe, miRNA to its target genes).

Thank you!