CASE STUDY OF DATA WAREHOUSE DEVELOPMENT FOR MONITORING OF ENERGY CONSUMPTION IN PUBLIC BUILDINGS


UNIVERZA V MARIBORU
FAKULTETA ZA ELEKTROTEHNIKO, RAČUNALNIŠTVO IN INFORMATIKO

Goran Kvačić

CASE STUDY OF DATA WAREHOUSE DEVELOPMENT FOR MONITORING OF ENERGY CONSUMPTION IN PUBLIC BUILDINGS

Master's thesis

Maribor, June 2016


CASE STUDY OF DATA WAREHOUSE DEVELOPMENT FOR MONITORING OF ENERGY CONSUMPTION IN PUBLIC BUILDINGS
Študija primera razvoja podatkovnega skladišča za spremljanje porabe energije v javnih stavbah

Master's thesis

Student: Goran Kvačić
Study programme: 2nd Bologna degree, Master's Programme Informatics and Technologies of Communication
Mentor: Associate Professor Dr. Boštjan Brumen
Editing: Martina Mavrek, professor of English Language

ACKNOWLEDGEMENTS

I would like to thank my mentor, Dr. Boštjan Brumen, for his guidance and help in writing this thesis. I also thank my colleagues Dr. Vladimir Špišić and Kornelije Petak for their advice and support. Special thanks go to my family for their constant support and patience during all the years of my studies.

Študija primera razvoja podatkovnega skladišča za spremljanje porabe energije v javnih stavbah (Case study of data warehouse development for monitoring of energy consumption in public buildings)

Keywords: data warehouse, dimensional model, star schema, snowflake schema, ETL process
UDK: 004.658(043.2)

Summary

An integral part of the Energy Information System (EIS), which was developed for a customer, is an extensive and dynamic reporting system. Generating complex reports has a large impact on the performance of the operational (transactional) database. To avoid placing this load on it, introducing a data warehouse was a logical step. A data warehouse is a type of database created with the goal of preparing data for analysis and reporting.

After reviewing the literature on data warehousing, the next step in implementing the data warehouse was to assess the needs and determine which development approach to choose. After assessing the requirements, the characteristics of the system, and the technologies and personnel at our disposal, we decided to use the approach advocated by Ralph Kimball. The arguments for this decision are given in the thesis.

Kimball's approach results in a dimensional model of the data warehouse, which can be based on a star schema or a snowflake schema. After consulting the documentation and taking into account the opinions of our team members, we decided to use the star schema. However, engineering curiosity gave us the idea of carrying out a case study comparing the execution of the ETL (Extract-Transform-Load) process for data warehouse implementations based on both schema models. Within the case study we implemented two versions of the data warehouse, one based on the star schema model and the other on the snowflake schema model.

The goal of this research was to determine which of the two data warehouse implementations yields better results with respect to the execution of the ETL process. ETL is the process of extracting, transforming and loading data from various sources into the data warehouse. After the implementation, we carried out a series of tests to evaluate both versions of the data warehouse. We measured the duration of the ETL process and the size of the data warehouse database for different sizes of the operational database. Statistical analysis of the collected data allowed us to answer our research questions:

Research question 1: Is there a difference in the duration of the ETL process for different operational database sizes when comparing the star and snowflake data warehouse models?
Research question 2: Is there a difference in the size of the data warehouse between the implementations based on the star and snowflake schemas?
Research question 3: How does the duration of the ETL process depend on the amount of data in the operational database?
Research question 4: How does the size of the data warehouse depend on the amount of data in the operational database?

The results of the statistical analysis showed that, for the observed scenarios, the implementation based on the snowflake schema performs better, with both a shorter ETL process duration and a smaller data warehouse size. In addition, we found that the dependency between the size of the operational database and the duration of the ETL process can, for both implementations, be described equally well by a linear and a power regression model.

Case study of data warehouse development for monitoring of energy consumption in public buildings

Key words: data warehouse, dimensional model, star schema, snowflake schema, ETL process
UDK: 004.658(043.2)

Abstract

The goal of this case study was to examine which implementation of the data warehouse would yield better results in the observed scenario: monitoring of energy consumption in public buildings. A data warehouse (DW) is a type of database created with the goal of preparing data for analysis and reporting. We implemented two versions of the DW, one based on the star and the other on the snowflake schema model. A series of tests was conducted to evaluate the implemented solutions. Statistical analysis showed that, for the observed scenarios, the implementation based on the snowflake schema performs better, with both a shorter ETL execution time and a smaller DW size.

TABLE OF CONTENTS

1 Introduction
1.1 Domain
1.2 The purpose, goals and basic arguments
1.2.1 Research questions and hypotheses
1.3 Assumptions and limitations of the research
1.4 Research methods
1.5 Requirements
1.6 Content description
2 Data warehouse
2.1 Data warehouse development approaches
2.2 Components of a data warehouse
2.2.1 Data sources
2.2.2 Data staging area
2.2.3 Data presentation
2.2.4 Data access
2.3 Dimensional data model
3 Case study
3.1 Choosing the approach
3.2 Data warehouse development
3.2.1 Source database
3.2.2 Dimensional model design
3.2.3 Dimensional model implementation
3.2.4 ETL process implementation
3.3 Testing
3.4 Results
4 Hypotheses evaluation and conclusion
4.1 RQ1: Is there a difference in the duration of ETL for a specific operational database size when comparing star and snowflake schema model?
4.2 RQ2: Is there a difference in the size of DW between the star and snowflake schema model for specific operational database size?
4.3 RQ3: How does the duration of the ETL process depend on the amount of data in the operational database?
4.3.1 Regression analysis for the linear model
4.3.2 Regression analysis for the power model
4.3.3 Regression analysis for the exponential model
4.4 RQ4: How does DW size depend on the amount of data in the operational database?
5 Conclusions
Literature
Appendix A

LIST OF ABBREVIATIONS

DW: Data Warehouse
ETL: Extract-Transform-Load
OLAP: On-Line Analytical Processing
OLTP: On-Line Transaction Processing
SQL: Structured Query Language
EF: Entity Framework
POCO: Plain Old CLR Object

LIST OF IMAGES

Image 3.1: EIS operational database model
Image 3.2: Model of entity's consumption fact table with its dimensions in the star schema
Image 3.3: Model of customer's consumption fact table with its dimensions in the star schema
Image 3.4: Model of entity's consumption fact table with its dimensions in the snowflake schema
Image 3.5: Model of customer's consumption fact table with its dimensions in the snowflake schema
Image 3.6: DimensionDate POCO class in C# programming language
Image 3.7: DimensionDate table in DW
Image 4.1: The average duration of the ETL process for both implementations
Image 4.2: Size of DW

LIST OF TABLES

Table 2.1: Comparison of the essential features of Inmon's and Kimball's models
Table 3.1: Specific characteristics in favour of Inmon's or Kimball's model
Table 3.2: Attributes of dimension DimensionEntity for star schema
Table 3.3: Attributes of dimension DimensionCustomer for star schema
Table 3.4: Attributes of dimension DimensionEnergySource for star schema
Table 3.5: Attributes of dimension DimensionEnergySource for star schema
Table 3.6: Attributes of dimension DimensionEnergySourceConsumption for star schema
Table 3.7: Attributes of dimension DimensionMeasurementPoint for star schema
Table 3.8: Attributes of dimension DimensionServiceItem for star schema
Table 3.9: Attributes of dimension DimensionBill for star schema
Table 3.10: Attributes of dimension DimensionBill for snowflake schema
Table 3.11: Attributes of dimension DimensionCustomer for snowflake schema
Table 3.12: Attributes of dimension DimensionCustomerSubtype for snowflake schema
Table 3.13: Attributes of dimension DimensionCustomerType for snowflake schema
Table 3.14: Attributes of dimension DimensionDate for snowflake schema
Table 3.15: Attributes of dimension DimensionEnergyEfficiencyClass for snowflake schema
Table 3.16: Attributes of dimension DimensionEnergySource for snowflake schema
Table 3.17: Attributes of dimension DimensionEnergySourceConsumption for snowflake schema
Table 3.18: Attributes of dimension DimensionEntity for snowflake schema
Table 3.19: Attributes of dimension DimensionEntityDetails for snowflake schema
Table 3.20: Attributes of dimension DimensionEntityType for snowflake schema
Table 3.21: Attributes of dimension DimensionMeasurementPoint for snowflake schema
Table 3.22: Attributes of dimension DimensionServiceItem for snowflake schema
Table 3.23: Data sets used in tests
Table 3.24: Test result for the star schema model ETL process duration in seconds
Table 3.25: Test result for the snowflake schema model ETL process duration in seconds
Table 3.26: The size of star and snowflake schema DW
Table 4.1: Duration difference, in seconds, between both implementations
Table 4.2: Results of the Shapiro-Wilk test of normality for distribution of duration difference
Table 4.3: Results of the T-test on the process duration difference
Table 4.4: The average duration and difference of the ETL process between both implementations
Table 4.5: Results of the linear regression analysis
Table 4.6: Results of the power regression analysis
Table 4.7: Results of the exponential regression analysis
Table 4.8: The coefficient of determination for linear, power and exponential regression model
Table 4.9: Results of the regression analysis for dependency between operational database size and DW size

1 INTRODUCTION

Information is an asset to any organization. Today, almost every organization uses database management systems to increase the value of its data. Corporate decision-makers require access to all of the organization's data at any level, but as the amount of data increases, it becomes harder to access, because it may be in different formats, on different platforms, and reside in different structures. Organizations have to write and maintain several programs to consolidate data for analysis and reporting. This process is costly, inefficient and time-consuming.

Traditional database systems, called operational or transactional, do not satisfy the data analysis requirements of decision-making users. An operational database supports daily business operations, and its primary concern is to ensure concurrent access and recovery techniques that guarantee data consistency. Operational databases contain detailed data and often do not include historical data. Since they are usually highly normalized (that is, their attributes and tables are organized to minimize data redundancy, in compliance with the normal forms), they perform poorly for complex queries that need to join many relational tables or aggregate large volumes of data.

Data warehousing provides an excellent approach to transforming operational data into useful and reliable information that supports the decision-making process. It also provides the basis for data analysis techniques such as data mining and multidimensional analysis. According to W. H. Inmon, a data warehouse (DW) is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. The data warehousing process consists of extracting data from heterogeneous data sources, cleaning, filtering and transforming the data into a common structure, and storing it in a structure that can be easily accessed and used for reporting and analysis purposes [8]. A more detailed discussion of DW follows in chapter 2.

1.1 Domain

Growing energy use raises concerns over supply difficulties, exhaustion of energy resources and environmental impact. Energy consumption is usually split into three main sectors: industry, transport and other, the last of which includes agriculture, the service sector and residential use. Although this split makes it considerably difficult to gather information about the energy consumed by buildings, it is estimated that they account for 20-40% of total energy consumption [9]. To manage energy requirements efficiently and reduce building energy consumption, quality insight into energy consumption is required.

To improve energy consumption management and reduce both the environmental impact and the financial cost of public building expenditures, the customer decided to invest in a solution for monitoring energy consumption. Their request was to be able to monitor and analyse energy consumption by bringing together the data acquired from the automated energy consumption measurement infrastructure and the data received from energy suppliers. The goal of the software solution and its data monitoring and analysis capabilities is to provide quality data on which decisions about energy-saving measures can be based.

1.2 The purpose, goals and basic arguments

An integral part of the energy consumption monitoring solution (EIS, from the Croatian "Energetski informacijski sustav", energy information system) is a data warehouse. A data warehouse provides an excellent approach to transforming operational data into useful and reliable information that supports the decision-making process. It also provides the basis for data analysis techniques such as data mining and multidimensional analysis.

The goal of this thesis is to examine which design schema yields better results in real-world scenarios with respect to the performance of importing data into the DW (performing the Extract-Transform-Load process). We can break down the main goal into these sub-goals:

- Design and implementation of data warehouses based on two different design schemas (star model and snowflake model)
- Execution of performance tests
- Evaluation of the results

1.2.1 Research questions and hypotheses

The following research questions and hypotheses were formulated; the hypotheses will be accepted or rejected depending on the test results:

RQ1: Is there a difference in the duration of the ETL process for a specific operational database size when comparing the star and snowflake schema models?

H1_0: There is no difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 100,000 bill items.
H1_a: There is a difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 100,000 bill items.
H2_0: There is no difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 200,000 bill items.
H2_a: There is a difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 200,000 bill items.
H3_0: There is no difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 400,000 bill items.
H3_a: There is a difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 400,000 bill items.
H4_0: There is no difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 800,000 bill items.
H4_a: There is a difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 800,000 bill items.

H5_0: There is no difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 1,600,000 bill items.
H5_a: There is a difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 1,600,000 bill items.
H6_0: There is no difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 3,200,000 bill items.
H6_a: There is a difference in the duration of the ETL process between the star and snowflake schema models for the operational database containing 3,200,000 bill items.

RQ2: Is there a difference in the size of the DW between the star and snowflake schema models for a specific operational database size?

RQ3: How does the duration of the ETL process depend on the amount of data in the operational database?

H7_0: The duration of the ETL process for the DW based on the star schema model is linearly dependent on the amount of data in the operational database.
H7_a: The duration of the ETL process for the DW based on the star schema model is not linearly dependent on the amount of data in the operational database.
H8_0: The duration of the ETL process for the DW based on the snowflake schema model is linearly dependent on the amount of data in the operational database.
H8_a: The duration of the ETL process for the DW based on the snowflake schema model is not linearly dependent on the amount of data in the operational database.

H9_0: The duration of the ETL process for the DW based on the star schema model is power dependent on the amount of data in the operational database.
H9_a: The duration of the ETL process for the DW based on the star schema model is not power dependent on the amount of data in the operational database.
H10_0: The duration of the ETL process for the DW based on the snowflake schema model is power dependent on the amount of data in the operational database.
H10_a: The duration of the ETL process for the DW based on the snowflake schema model is not power dependent on the amount of data in the operational database.
H11_0: The duration of the ETL process for the DW based on the star schema model is exponentially dependent on the amount of data in the operational database.
H11_a: The duration of the ETL process for the DW based on the star schema model is not exponentially dependent on the amount of data in the operational database.
H12_0: The duration of the ETL process for the DW based on the snowflake schema model is exponentially dependent on the amount of data in the operational database.
H12_a: The duration of the ETL process for the DW based on the snowflake schema model is not exponentially dependent on the amount of data in the operational database.

RQ4: How does DW size depend on the amount of data in the operational database?

H10_0: Data warehouse size is linearly dependent on the amount of data in the operational database.
H10_a: Data warehouse size is not linearly dependent on the amount of data in the operational database.

H11_0: Data warehouse size is exponentially dependent on the amount of data in the operational database.
H11_a: Data warehouse size is not exponentially dependent on the amount of data in the operational database.
H12_0: Data warehouse size is power dependent on the amount of data in the operational database.
H12_a: Data warehouse size is not power dependent on the amount of data in the operational database.

1.3 Assumptions and limitations of the research

The research is based on the following assumption:

1. The operational database of EIS is and will remain the only source database.

Limitations of the research:

1. The research was done without any previous experience in data warehousing.
2. The data warehouses for both schema models were developed using the same technology and software frameworks. The research ignored the influence of those technologies and frameworks on performance.
3. The data warehouse was implemented using the following technologies:
   a. C#/.NET programming language
   b. Entity Framework object-relational mapping framework
   c. MS SQL database
4. The data model on which the DW is based was defined in advance.
5. The test data made available for this study contained bills and related data for a timespan of 6 years, from 2010 to 2015, comprising 173,914 bills with 3,216,855 bill items.

1.4 Research methods

To evaluate the performance of the DW Extract-Transform-Load process, a case study was conducted. The data acquired in an experiment measuring ETL process execution time was evaluated using statistical methods. The hypotheses were tested using the following statistical methods:

1. RQ1: The Shapiro-Wilk test for normality, followed by the T-test or the Wilcoxon test, depending on the normality test results.
2. RQ2: Basic statistics.
3. RQ3: Regression analysis.
4. RQ4: Regression analysis.

1.5 Requirements

The customer ordered a software solution that allows them to monitor the consumption of energy in public buildings and that suggests saving measures to improve energy efficiency in each building. They would like to monitor the consumption of electrical and thermal energy, natural gas, light fuel oil, water and the like. Within this domain, water is also considered an energy source.

The customer wants to be able to perform a series of analyses and produce reports on energy consumption. For performance reasons, the reporting system needs to be able to access the data required for a report without slowing down the operational database. The operational database holds the live data on which users perform actions through the client-side web application. The reporting system should therefore use a different database, so that the operational database is not slowed down by the time- and resource-demanding queries executed by the reporting system. In response to this requirement, a DW emerged as the potential solution.

1.6 Content description

The first chapter introduces the domain and goals of this case study. We define the assumptions and limitations of the research, as well as its goals, research methods and requirements.

The second chapter summarizes the existing literature on DWs and introduces the DW development approaches.

The third chapter describes the conducted case study. It covers the implementation process and the test procedure and, at the end, presents the test results.

The fourth chapter contains the evaluation of the test results and presents the conclusions of this thesis. We conclude by examining possible areas for further study.
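The regression analysis named in section 1.4 for RQ3 and RQ4 can be sketched as follows. This is a minimal illustration in Python, not the tooling used in the thesis: the function names are hypothetical and the durations are invented sample values, with only the bill-item counts taken from the hypotheses. The power and exponential models are fitted here by ordinary least squares after a logarithmic transformation, which is one common approach and an assumption on our part, so their coefficient of determination is computed in the transformed space.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

def fit_models(sizes, durations):
    """Fit the three model families named in the hypotheses.

    For the power and exponential models the returned intercept is ln(a)
    and R^2 refers to the log-transformed data.
    """
    log_x = [math.log(x) for x in sizes]
    log_y = [math.log(y) for y in durations]
    return {
        "linear": fit_line(sizes, durations),    # y = a + b*x
        "power": fit_line(log_x, log_y),         # ln y = ln a + b*ln x
        "exponential": fit_line(sizes, log_y),   # ln y = ln a + b*x
    }

# Hypothetical measurements: sizes follow the tested bill-item counts,
# durations (in seconds) are invented for illustration only.
sizes = [100_000, 200_000, 400_000, 800_000, 1_600_000, 3_200_000]
durations = [12.0, 24.5, 47.9, 97.0, 191.8, 385.1]

for name, (a, b, r2) in fit_models(sizes, durations).items():
    print(f"{name:12s} intercept={a:.4g} slope={b:.4g} R^2={r2:.4f}")
```

Comparing the three coefficients of determination is only a first indication of which model family describes the data best; it does not replace the hypothesis tests listed in section 1.4.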

2 DATA WAREHOUSE

Data warehouse technology was formed as a response to business management and analysis needs. After data is extracted from transactional systems into query-oriented databases, users are able to analyse the warehouse data in real time without affecting business operations [12].

A data warehouse is a type of database created with the goal of bringing together selected data from multiple heterogeneous databases and other information sources. A DW literally warehouses information about an organization or process and allows the extraction of meaningful, consistent and accurate data for analysis and decision making. Data is extracted from each of the sources, filtered and transformed as needed, merged with data from other sources and then loaded into the DW. The process of extracting data from the sources, transforming it according to the DW data model and loading it into the DW is called the Extract-Transform-Load (ETL) process.

A DW brings some advantages over the traditional approaches to the integration of multiple sources [13], which explains the growing interest of the industry in it:

- Queries can be answered without accessing the original information sources (usually operational databases). In that way, high query performance can be obtained for the complex aggregation queries needed for in-depth analysis, decision support and data mining.
- On-Line Analytical Processing (OLAP) is decoupled as much as possible from On-Line Transaction Processing (OLTP). Therefore, the data is highly available and OLAP does not interfere with local processing at the operational sources [11].

DW systems must structure data so that decision makers can intuitively and interactively analyse it by means of different techniques, such as OLAP or data mining, thus meeting their information needs. As in our case, one of the most common customer requests leading to DW development is the need for extensive report generation. When generating reports, it is of central importance for these systems to compute summaries of data in a simple and efficient way. To do this, DWs organize data according to the multidimensional model. In the multidimensional model, dimensions reflect the perspectives from which facts are viewed. Facts correspond to events, which are usually associated with numeric values known as measures and are referenced using the dimension elements [3].

Traditional database design methods define the design process as a series of steps during which a conceptual design phase is performed. The results of this phase are transformed into a logical model that serves as the basis of schema implementation. These methods have clearly defined goals, such as minimality of the resulting schemas, freedom from redundancy, and completeness with regard to coverage of the underlying application. Some of these requirements can be made precise; for example, relational dependency theory explains the reasons for redundancies in database relations. This theory also formalizes normal forms and normalization as a way to avoid such redundancies. There are even algorithmic approaches for constructing normalized schemas [4, 10].

Compared with this, a DW differs in at least two important ways [7]:

- First, a DW integrates the information provided by a number of pre-existing sources.
- Second, DWs are used for analysis and decision-making purposes, usually for OLAP, where complex queries frequently compute aggregate values over huge amounts of data, while users never initiate update transactions.

For those reasons, the data access pattern observed on DWs differs from that on operational databases, where short transactions are more common, providing high performance and short response times [7].

Lechtenbörger and Vossen conclude that, although it is generally agreed that DW design is a nontrivial task, hardly any formal guidelines exist to date for deriving a good schema from given data sources. As a result, there appears to be a considerable discrepancy between traditional database design as applied to operational databases and the design principles that apply to DWs [7].

According to R. Kimball, we can isolate the following DW requirements [6]:

1. The DW must make an organization's information easily accessible.
2. The DW must present the organization's information consistently.
3. The DW must be adaptive and resilient to change.
4. The DW must be a secure bastion that protects our information assets.
5. The DW must serve as the foundation for improved decision making.
6. The business community must accept the DW if it is to be deemed successful.

2.1 Data warehouse development approaches

Although DWs are widely used today, there is no single opinion regarding the best DW development approach. Pioneers of data warehousing, Bill Inmon and Ralph Kimball, introduced their own approaches, which are dominant today.

Inmon advocates building a central enterprise-wide DW that would provide an overall business intelligence system. This approach is also known as the top-down approach, and it adapts traditional relational development tools to the needs of DW. From this central enterprise-wide DW, individual department-based DWs are developed to serve the analytical needs of each department.

Kimball's approach recommends building business-process-based databases, known as data marts, which can later be integrated using an information bus. This approach is also known as bottom-up, because it first addresses individual business process needs and then sums them up to the enterprise-wide solution. Kimball's approach is unique to data warehousing as it introduces a dimensional data model, which organizes data in several fact and dimension tables. The dimensional data model is explained in more detail in chapter 2.3.

The differences between Inmon's and Kimball's approaches are many and deep; the most essential concern development methodologies, data modelling and DW architectures [2]. According to M. Breslin, these differences are represented in Table 2.1.

Table 2.1 Comparison of the essential features of Inmon's and Kimball's models

| Feature | Inmon | Kimball |
|---|---|---|
| Methodology and architecture | | |
| Overall approach | Top-down | Bottom-up |
| Architectural structure | Enterprise-wide (atomic) DW "feeds" departmental databases | Data marts model a single business process; enterprise consistency achieved through data bus and conformed dimensions |
| Complexity of the method | Quite complex | Fairly simple |
| Comparison with established development methodologies | Derived from the spiral methodology | Four-step process; a departure from RDBMS methods |
| Discussion of physical design | Fairly thorough | Fairly light |
| Data modelling | | |
| Data orientation | Subject- or data-driven | Process oriented |
| Tools | Traditional (ERDs, DISs) | Dimensional modelling; a departure from relational modelling |
| End-user accessibility | Low | High |
| Philosophy | | |
| Primary audience | IT professionals | End users |
| Place in the organization | Integral part of the Corporate Information Factory (CIF) | Transformer and retainer of operational data |
| Objective | Deliver a sound technical solution based on proven database methods and technologies | Deliver a solution that makes it easy for end users to directly query the data and still get reasonable response times |

2.2 Components of a data warehouse

According to R. Kimball there are four main components of a DW:

2.2.1 Data sources

Typically, the sources of the data for a DW are operational applications or, to be precise, their databases. These databases are referred to as operational databases. Operational databases are designed to manage dynamic data in real time, in short transactions, with the aim of providing the best performance and availability while creating and modifying the data.

Operational databases should be thought of as outside the DW because presumably we have little to no control over the content and format of the data stored in them. Queries against source systems are narrow, one-record-at-a-time queries that are part of the normal transaction flow and severely restricted in their demands on the operational system.

2.2.2 Data staging area

The data staging area of the DW is both a storage area and a set of processes commonly referred to as Extract-Transform-Load (ETL). It includes everything between the operational source systems and the presentation area. As Kimball illustrates, it is somewhat analogous to the kitchen of a restaurant, where raw food products are transformed into a fine meal. The key architectural requirement for the data staging area is that it is off-limits to business users and does not provide query and presentation services. Its only responsibility should be accessing the selected data from multiple heterogeneous databases and other information sources and transforming it into a shape fit for user query and consumption.

The process of getting data from operational sources to the DW consists of three steps. The first step is the extraction of data from various sources. Extracting means reading and understanding the source data and preparing it for further manipulation. Once the data is extracted to the staging area, there are numerous potential transformations, such as cleaning the data, combining data from multiple sources, deduplication of data, assigning warehouse keys, etc. These transformations are all precursors to the final step, loading the data into the DW.

2.2.3 Data presentation

Data presentation, as Kimball calls it, is where data is organized, stored and made available for querying by users, reports and other analytical applications. When the DW is mentioned, this is the component that is commonly being referred to. It holds the extracted and transformed data, stored in a way that is suitable for extensive querying while being understandable to the business users.
The data presentation area is organized as a series of data marts, each of which presents the data from a single business process. A single data mart consists of dimension and fact tables where the actual data is stored. The data model consisting of dimension and fact tables is called the dimensional model and is discussed in chapter 2.3.

2.2.4 Data access

This is the final component of the whole DW system. It represents a variety of techniques and tools that allow the user to access the data in the presentation area, including reporting systems, OLAP and data mining tools.

2.3 Dimensional data model

Databases in transactional (operational) systems comply with normal forms and are optimized for the best performance when handling individual transactions. A dimensional model is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It adheres to a discipline that uses the relational model with some important restrictions. Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table [5].

Dimensional models can be implemented in the following two ways:
- Star schema
- Snowflake schema

The difference between those two schemas is in the arrangement of the dimension tables and their relationships to the fact table. The fact table of the star schema is directly connected to all the dimension tables that describe it. Every dimension is in a one-to-many relationship with the fact table, and the resulting diagram resembles a star. The snowflake schema is an extension of the star schema which is also composed of a fact table and a set of related dimension tables, but the dimension tables are normalized into sub-dimension tables. Not all dimension tables are related to the fact table, because some dimension tables are related only to other dimensions.
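The structural difference between the two schemas can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module with an in-memory database; all table names, column names and rows below are hypothetical and chosen only for illustration, not taken from the system described in this study. The same aggregate is computed once over a star layout (one join) and once over a snowflake layout (two joins, because the type attribute lives in a sub-dimension).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star layout: one denormalized dimension joined directly to the fact table.
CREATE TABLE star_dim_customer (
    id INTEGER PRIMARY KEY,
    name TEXT,
    type_name TEXT
);
CREATE TABLE star_fact (
    customer_id INTEGER REFERENCES star_dim_customer(id),
    quantity REAL
);

-- Snowflake layout: the type attribute is normalized into a sub-dimension
-- that is related only to the customer dimension, not to the fact table.
CREATE TABLE snow_dim_customer_type (
    id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE snow_dim_customer (
    id INTEGER PRIMARY KEY,
    name TEXT,
    customer_type_id INTEGER REFERENCES snow_dim_customer_type(id)
);
CREATE TABLE snow_fact (
    customer_id INTEGER REFERENCES snow_dim_customer(id),
    quantity REAL
);

-- Made-up sample rows, identical in content for both layouts.
INSERT INTO star_dim_customer VALUES
    (1, 'School A', 'Education'), (2, 'Library B', 'Culture');
INSERT INTO star_fact VALUES (1, 100.0), (1, 50.0), (2, 30.0);

INSERT INTO snow_dim_customer_type VALUES (1, 'Education'), (2, 'Culture');
INSERT INTO snow_dim_customer VALUES (1, 'School A', 1), (2, 'Library B', 2);
INSERT INTO snow_fact VALUES (1, 100.0), (1, 50.0), (2, 30.0);
""")

# Star: the descriptive attribute is one join away from the fact table.
star_rows = conn.execute("""
    SELECT d.type_name, SUM(f.quantity)
    FROM star_fact f
    JOIN star_dim_customer d ON d.id = f.customer_id
    GROUP BY d.type_name
    ORDER BY d.type_name
""").fetchall()

# Snowflake: the same attribute is reached through the sub-dimension.
snow_rows = conn.execute("""
    SELECT t.name, SUM(f.quantity)
    FROM snow_fact f
    JOIN snow_dim_customer c ON c.id = f.customer_id
    JOIN snow_dim_customer_type t ON t.id = c.customer_type_id
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()

print(star_rows)
print(snow_rows)
```

Both queries return the same totals; the sketch only shows that the snowflake layout trades redundancy in the dimension for an additional join at query time.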

3 CASE STUDY

An integral part of the Energy information system (EIS) developed for the customer is an extensive and dynamic reporting system. Generating complex reports has a significant impact on the performance of the operational (transactional) database, so the logical step to avoid that load was to introduce a DW.

The first step in the process of implementing a DW was to evaluate the requirements and determine which development approach to choose. After evaluating the requirements, the characteristics of the system, and the technologies and personnel at our disposal, a decision was made to take Kimball's bottom-up approach. Arguments for this decision are provided in chapter 3.1.

After deciding that our DW would use the dimensional model, we still did not know which model schema would best suit our needs. After consulting the documentation and taking the opinions of team members into consideration, we decided to go with the star schema, but engineering curiosity gave us the idea to conduct a case study on ETL process performance that also covers a DW model implemented with the snowflake schema. The goal of this study was to determine which dimensional data model schema implementation yields better results concerning execution of the ETL process.

3.1 Choosing the approach

A decision on which development approach to take had to be made taking the following into consideration:
- The data warehouse needed to be implemented in a very short time frame
- Our team members did not have any practical experience with DW development
- The customer agreed to develop the DW in phases, where each development phase would address one specific area of the system
- The data source system was stable and developed by our team

Based on the theoretical knowledge of the team members, our decision was leaning toward Kimball's approach, and further research confirmed the decision. In her article "Data Warehousing Battle of the Giants: Comparing the Basics of the Kimball and Inmon Models", M. Breslin presented guidelines that helped us make a decision. She summarizes specific characteristics in favour of Inmon's or Kimball's model, which are presented in Table 3.1.

Table 3.1: Specific characteristics in favour of Inmon's or Kimball's model

| Characteristic | Favours Kimball | Favours Inmon |
|---|---|---|
| Nature of the organization's decision support requirements | Tactical | Strategic |
| Data integration requirements | Individual business areas | Enterprise-wide integration |
| Structure of data | Business metrics, performance measures, and scorecards | Non-metric data and data that will be applied to meet multiple and varied information needs |
| Scalability | Need to adapt to highly volatile needs within a limited scope | Growing scope and changing requirements are critical |
| Persistency of data | Source systems are relatively stable | High rate of change from source systems |
| Staffing and skills requirements | Small teams of generalists | Larger team(s) of specialists |
| Time to delivery | Need for the first DW application is urgent | Organization's requirements allow for longer start-up time |
| Cost to deploy | Lower start-up costs, with each subsequent project costing about the same | Higher start-up costs, with lower subsequent project development costs |

After consulting those guidelines, the following characteristics were in favour of Kimball's model:
- Data integration requirements: a solution was needed to address an individual business area, the monitoring of a building's energy consumption.
- Persistency of data: the data source system is the EIS operational database, which is stable and was developed by our team.
- Staffing and skills requirements: we had a small team at our disposal, which did not include a data warehousing specialist.
- Time to delivery: the first phase of the DW had a deadline of just 2 months.

Taking everything into account, we chose Kimball's DW approach.

3.2 Data warehouse development

3.2.1 Source database

The data source for this warehouse is the operational database of the EIS application. Having only one data source made development easier because we did not need to integrate and merge data from various sources. Image 3.1 presents the part of the model of the EIS operational database that is relevant for the DW.

[Image 3.1: EIS operational database model. Entity-relationship diagram; its tables include Customer, Bill, BillItem, ServiceItem, EnergySource, MeasurementPoint, Entity, Asset and WeatherData, among others.]

3.2.2 Dimensional model design

The data model of a DW depends on both input and output data. Data is acquired from the operational database, whose model is presented in Image 3.1. Because the software solution and all of its parts are protected intellectual property, the data model had to be obfuscated for the purpose of this case study. The obfuscation was done in such a way that the model kept its integrity and all the major parts necessary for credible results of this study. Also, only the part of the model that is relevant for this study is presented.

The data warehouse data model had to be designed in a way that allows storing the data necessary for analysis and report generation. According to Kimball, dimensional data model design consists of four steps:

3.2.2.1 Step 1

In the first step of dimensional model design the goal is to fully understand the business process and the needs of the future users. Requirements analysis and consultation with users established the values on which the reports will be based. This information was the foundation for defining the fact tables. After requirements analysis it was determined that we need the following eight fact tables:
- FactBillItemForEntity: Entity's^3 energy consumption^4 fact table
- FactBillItemForCustomer: Customer's^5 energy consumption fact table
- FactEntityUse: fact table that represents the use of Entities by Customers
- FactAssetDetailsValue: fact table that represents Assets^6 in Entities
- FactIndicatorElementValueForEntity: fact table of energy consumption indicators for Entities
- FactIndicatorElementValueForCustomer: fact table of energy consumption indicators for Customers
- FactMeasurementData: measurement data^7 fact table
- FactWeatherData: meteorological data fact table

^3 In EIS every room, building or complex that can have its own energy consumption (energy consumption measurement devices) is considered an Entity.
^4 Energy consumption refers to consumption stated on bills provided by suppliers.
^5 A Customer is someone who is using an Entity (e.g. school, public library, hospital) and generates energy consumption in that Entity through activities (e.g. heating, cooling, using the lights).
^6 An Asset is equipment in an Entity that can consume some energy source (e.g. heating source, air conditioner, light bulb).
^7 Measurement data is energy consumption data acquired directly from energy consumption measurement devices.

For the purpose of this case study, analysis was done only on the Entity energy consumption fact table (FactBillItemForEntity) and the Customer energy consumption fact table (FactBillItemForCustomer). Each schema has both of those fact tables, which results in four fact tables in total being developed in this case study.

In the EIS DW we need to track the consumption of energy sources and the cost associated with that consumption. Consumption and cost need to be tracked with regard to who consumed the energy, the time period of consumption and the purpose of that consumption.

3.2.2.2 Step 2

In this step the granularity of the fact tables needs to be defined. Kimball and Ross advise developing the dimensional model on the most atomic information captured by a business process [6].

Energy consumption and the financial cost of the energy being consumed are tracked on the basis of official data provided by the energy source suppliers, and are expressed in bills. For that reason, the smallest or most atomic information that can be tracked from the bill is the bill item expressed on that bill. The bill item with its time period is taken as the grain of both the Entity and Customer energy consumption fact tables. The same grain is defined for both the star and snowflake schemas.

3.2.2.3 Step 3

During the first two steps of DW model design there were no differences between the two schemas: in both cases the same business model and the same facts were considered. In the third step of the development process, however, the dimensions describing the facts need to be defined. The star and snowflake schemas use different approaches to arranging dimension tables, so the models differ.

The snowflake schema complies with the third normal form and is similar to the operational database model. Dimensions that describe a single fact have dependencies similar to those of the associated tables in the operational database, so not every dimension table is directly connected to the fact table.

On the other hand, the star schema is denormalized and thus differs significantly from the operational EIS database. There are no dependencies between dimensions, and the dimensions are directly connected to the facts.

According to Kimball and Ross, the basic dimensions for every fact table are derived from its granularity. Since the granularity for energy consumption based on the data derived from bills is the same for all the fact tables developed in this case study, they all have the same common basic dimensions. Those are:
- Date dimension (DimensionDate): all consumption data has to be placed in the time period in which it was made. To fulfil this requirement, every record in the fact table has to be described with two dates, so it has two connections to the date dimension table.
- Bill dimension (DimensionBill): fact tables represent bill items, so every fact table must have a connection to the bill dimension.
- Measurement point dimension (DimensionMeasurementPoint): every consumption is made at some measurement point. A measurement point is an object in the EIS model that connects consumption data from the bills with the measurement data acquired directly through measurements by measuring devices.
- Energy source dimension (DimensionEnergySource): consumption refers to one of the energy sources, so every fact table must have a connection to the energy source dimension.
- Energy source purpose dimension (DimensionEnergySourcePurpose): all energy consumption is made with some purpose. Through the energy source purpose dimension all of the consumption is associated with some purpose, like heating, cooling, cooking, preparation of domestic hot water, etc.

Dimensions that are shared across several fact tables in different data marts are called conformed dimensions. Other dimensions are not shared among all four fact tables but relate to only one fact table.

In the next two chapters the third design step is described for the models of both schemas, first for the star and then for the snowflake schema.
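The two connections between a fact record and the date dimension can be sketched as follows. This is a minimal illustration using Python's standard `sqlite3` module, not the actual EIS DW definition: the names DimensionDate, FactBillItemForEntity, FromDateId, ToDateId and QuantityInKWh come from the model described in this study, while the `FullDate` attribute, the column types and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimensionDate (
    Id INTEGER PRIMARY KEY,
    FullDate TEXT NOT NULL          -- assumed attribute, ISO 8601 date string
);

-- Only the columns needed for the illustration are declared here.
CREATE TABLE FactBillItemForEntity (
    FromDateId INTEGER NOT NULL REFERENCES DimensionDate(Id),
    ToDateId   INTEGER NOT NULL REFERENCES DimensionDate(Id),
    QuantityInKWh REAL
);

-- Made-up sample rows.
INSERT INTO DimensionDate VALUES
    (1, '2013-01-01'), (2, '2013-01-31'), (3, '2013-02-28');
INSERT INTO FactBillItemForEntity VALUES
    (1, 2, 1200.0),
    (2, 3, 950.0);
""")

# The date dimension is joined twice under different aliases,
# once for the start and once for the end of the billed period.
periods = conn.execute("""
    SELECT d_from.FullDate, d_to.FullDate, f.QuantityInKWh
    FROM FactBillItemForEntity f
    JOIN DimensionDate d_from ON d_from.Id = f.FromDateId
    JOIN DimensionDate d_to   ON d_to.Id   = f.ToDateId
    ORDER BY d_from.FullDate
""").fetchall()

for start, end, kwh in periods:
    print(start, end, kwh)
```

One physical date table serves both roles, so no second date dimension is needed to describe the period of a bill item.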

(a) Dimension definition of star schemas

The star schema is characterized by having dimensions connected directly to fact tables. Both of the energy consumption fact tables in the star schema rely on the same source data, so they share the majority of their dimensions. Apart from the common dimensions, the only difference is that the Entity energy consumption fact table (FactBillItemForEntity) is connected to the Entity dimension (DimensionEntity) and the Customer energy consumption fact table (FactBillItemForCustomer) is connected to the Customer dimension (DimensionCustomer). Image 3.2 and Image 3.3 represent the dimensions in the star schema model of the Entity's and Customer's consumption fact tables, respectively.

[Image 3.2: Model of entity's consumption fact table with its dimensions in star schema. Fact table FactBillItemForEntity with columns EntityId, EnergySourceId, EnergySourcePurposeId, FromDateId, ToDateId, MeasurementPointId, ServiceItemId, EnergySourceConsumptionId, BillOwnerId, BillId, Quantity, QuantityInKWh, AmountWithoutTax, AmountWithTax, ConfirmationUser, EntryUser. Dimensions shown: DimensionBill, DimensionEnergySourceConsumption, DimensionServiceItem, DimensionDate, DimensionEnergySource, DimensionEntity, DimensionCustomer, DimensionEnergySourcePurpose, DimensionMeasurementPoint.]

[Image 3.3: Model of customer's consumption fact table with its dimensions in star schema. Fact table FactBillItemForCustomer with columns CustomerId, EnergySourceId, EnergySourcePurposeId, FromDateId, ToDateId, MeasurementPointId, ServiceItemId, EnergySourceConsumptionId, BillOwnerId, BillId, Quantity, QuantityInKWh, AmountWithoutTax, AmountWithTax, ConfirmationUser, EntryUser. Dimensions shown: DimensionMeasurementPoint, DimensionEnergySource, DimensionServiceItem, DimensionDate, DimensionCustomer, DimensionBill, DimensionEnergySourceConsumption, DimensionEnergySourcePurpose.]

(b) Dimension definition of snowflake schemas

Snowflake schemas comply with the third normal form, so a model in that schema is a lot closer to the model in the source database. But because fact tables in the star and snowflake schemas are based on the same source data, they have to be described with the same information. For that reason, the star and snowflake models share the same dimensions but with different relations. While there are no relations between dimension tables in the star schema, dimensions are related to each other in the snowflake schema. Image 3.4 and Image 3.5 represent the fact tables with their dimensions in the snowflake schema.

[Image 3.4: Model of entity's consumption fact table with its dimensions in the snowflake schema. Fact table FactBillItemForEntity with columns DimensionServiceItemId, DimensionEntityDetailsId, DimensionEnergySourceId, DimensionDateFromId, DimensionDateToId, Quantity, AmountWithoutTax, AmountWithTax, EisBillItemId, QuantityInKWh, DimensionMeasurementPointId, DimensionBillId, DimensionEnergySourceConsumptionId. Dimensions shown: DimensionEnergySource, DimensionServiceItem, DimensionEntityType, DimensionEntity, DimensionEntityDetails, DimensionDate, DimensionEnergyEfficiencyClass, DimensionMeasurementPoint, DimensionCustomer, DimensionBill, DimensionEnergySourceConsumption, DimensionCustomerSubtype, DimensionEnergySourcePurpose, DimensionCustomerType.]

[Image 3.5: Model of customer's consumption fact table with its dimensions in the snowflake schema. Fact table FactBillItemForCustomer with columns DimensionCustomerId, DimensionServiceItemId, DimensionEnergySourceId, DimensionDateFromId, DimensionDateToId, Quantity, QuantityInKWh, AmountWithoutTax, AmountWithTax, EisBillItemId, DimensionBillId, DimensionMeasurementPointId, DimensionEnergySourceCons... Dimensions shown: DimensionEnergySourcePurpose, DimensionEnergySource, DimensionEnergySourceConsumption, DimensionCustomerType, DimensionCustomerSubtype, DimensionCustomer, DimensionMeasuremen, DimensionServiceItem, DimensionDate.]

3.2.2.4 Step 4

The fourth step of dimensional model design involves a more detailed analysis of what information has to be included in the fact tables. That information is:
- Energy consumption in the basic measurement unit stated in the bill (Quantity)
- Energy consumption stated in kWh (QuantityInKWh)
- Amount^8 with tax (AmountWithTax)
- Amount without tax (AmountWithoutTax)

^8 All amounts are expressed in the Croatian national currency, the kuna (HRK).
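To show how these measures might be used in a report, the following minimal sketch aggregates QuantityInKWh and AmountWithTax per energy source with Python's standard `sqlite3` module. The table and measure names follow the model above; the column types, the energy source names and every sample value are invented for illustration. The example sums QuantityInKWh rather than Quantity on the assumption that the kWh column exists precisely so that consumption billed in different units can be added up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimensionEnergySource (
    Id INTEGER PRIMARY KEY,
    Name TEXT
);

-- Only one dimension key and the four measures are declared here.
CREATE TABLE FactBillItemForEntity (
    EnergySourceId INTEGER REFERENCES DimensionEnergySource(Id),
    Quantity REAL,            -- in the unit stated in the bill
    QuantityInKWh REAL,       -- the same consumption converted to kWh
    AmountWithoutTax REAL,
    AmountWithTax REAL
);

-- Made-up sample rows.
INSERT INTO DimensionEnergySource VALUES (1, 'Electricity'), (2, 'Natural gas');
INSERT INTO FactBillItemForEntity VALUES
    (1, 500.0, 500.0, 400.0, 500.0),
    (1, 300.0, 300.0, 240.0, 300.0),
    (2, 100.0, 950.0, 320.0, 400.0);
""")

# Total consumption in kWh and total cost with tax per energy source.
totals = conn.execute("""
    SELECT s.Name, SUM(f.QuantityInKWh), SUM(f.AmountWithTax)
    FROM FactBillItemForEntity f
    JOIN DimensionEnergySource s ON s.Id = f.EnergySourceId
    GROUP BY s.Name
    ORDER BY s.Name
""").fetchall()

for name, kwh, amount in totals:
    print(name, kwh, amount)
```

The same pattern applies to any other dimension of the fact table: the grouping column changes, while the measures being summed stay the same.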

3.2.2.5 Defining attributes of dimension tables

At the end of the process of dimensional model design, the attributes of the dimension tables still had to be defined. Because dimensions describe values in fact tables, dimension tables must have all the information that can be used to describe fact tables. That is particularly important because the customer's request was to be able to create dynamic custom reports, so we had to predict which attributes they might need. Tables 3.2 to 3.22 present the attributes of all dimension tables for both schemas.

(a) Attributes of dimension tables in the star schema

Table 3.2: Attributes of dimension DimensionEntity for star schema

| Attribute name | Type |
|---|---|
| Id | |
| EisEntityId | |
| Name | String |
| TypeName | String |
| TypeDescription | String |
| Address | String |
| ConstructionDate | DateTime |
| LastRenovationDate | DateTime |
| Area | Double |
| Volume | Double |
| SurfaceArea | Double |
| HeatingArea | Double |
| CoolingArea | Double |
| HeatingVolume | Double |
| CoolingVolume | Double |
| HeatingSource | String |
| CoolingSource | String |
| EnergyEfficiencyClassName | String |
| BuildingShapeFactor, AnnualRequiredThermalEnergyForReferenceClimaticData, SpecificAnnualRequiredThermalEnergyForReferenceClimaticData | Double, Double (two types are listed for these three attributes) |
| Note | String |
| GisX | Double |
| GisY | Double |
| GisGpsLongitude | Double |
| GisGpsLatitude | Double |
| GisSymbol | String |

Table 3.3: Attributes of dimension DimensionCustomer for star schema

| Attribute name | Type |
|---|---|
| Id | |
| EisCustomerId | |
| Name | String |
| TypeName | String |
| TypeId | |
| SubtypeName | String |
| SubtypeId | |
| EmployeesCount | |
| Address | String |

Table 3.4: Attributes of dimension DimensionEnergySource for star schema

| Attribute name | Type |
|---|---|
| Id | |
| EisEnergySourceId | |
| Name | String |
| Type | String |
| TypeDefinition | String |

Table 3.5: Attributes of dimension DimensionEnergySourcePurpose for star schema

| Attribute name | Type |
|---|---|
| Id | |
| EisEnergySourcePurposeId | |
| Name | String |

Table 3.6: Attributes of dimension DimensionEnergySourceConsumption for star schema

| Attribute name | Type |
|---|---|
| Id | |
| EisEnergySourceConsumptionId | |

Table 3.7: Attributes of dimension DimensionMeasurementPoint for star schema

| Attribute name | Type |
|---|---|
| Id | |
| EisMeasurementPointId | |
| Name | String |
| Number | String |

Table 3.8: Attributes of dimension DimensionServiceItem for star schema

| Attribute name | Type |
|---|---|
| Id | |
| EisServiceItemId | |
| Name | String |
| Unit | String |
| Type | String |
| TypeDefinition | String |

Table 3.9: Attributes of dimension DimensionBill for star schema

| Attribute name | Type |
|---|---|
| Id | |
| BillId | |
| BillNumber | String |

(b) Attributes of dimension tables in the snowflake schema

Table 3.10: Attributes of dimension DimensionBill for snowflake schema

| Attribute name | Type |
|---|---|
| Id | |
| EisBillId | |
| BillOwnerId (Id of DimensionCustomer) | |

Table 3.11: Attributes of dimension DimensionCustomer for snowflake schema

| Attribute name | Type |
|---|---|
| Id | |
| EisCustomerId | |
| DimensionCustomerSubtypeId | |
| Name | String |

Table 3.12: Attributes of dimension DimensionCustomerSubtype for snowflake schema

| Attribute name | Type |
|---|---|
| Id | |
| EisCustomerSubtypeId | |
| DimensionCustomerTypeId | |
| Name | String |

Table 3.13: Attributes of dimension DimensionCustomerType for snowflake schema

| Attribute name | Type |
|---|---|
| Id | |
| EisCustomerTypeId | |
| Name | String |
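Tables 3.11 to 3.13 spread over three snowflake tables the customer attributes that Table 3.3 keeps in a single star dimension. The relationship between the two layouts can be sketched as a small transformation in plain Python. This is an illustration only, not the ETL process used in the study: the function name `flatten_customers`, the sample rows and the choice to carry the EIS identifiers into TypeId and SubtypeId are assumptions, and attributes that Table 3.11 does not list (EmployeesCount, Address) are left out.

```python
def flatten_customers(customers, subtypes, types):
    """Resolve each customer's subtype and type and return star-style rows."""
    subtype_by_id = {row["Id"]: row for row in subtypes}
    type_by_id = {row["Id"]: row for row in types}

    flat = []
    for customer in customers:
        # Follow the snowflake relations: customer -> subtype -> type.
        subtype = subtype_by_id[customer["DimensionCustomerSubtypeId"]]
        ctype = type_by_id[subtype["DimensionCustomerTypeId"]]
        flat.append({
            "EisCustomerId": customer["EisCustomerId"],
            "Name": customer["Name"],
            # Assumption: the star dimension stores the EIS identifiers here.
            "SubtypeId": subtype["EisCustomerSubtypeId"],
            "SubtypeName": subtype["Name"],
            "TypeId": ctype["EisCustomerTypeId"],
            "TypeName": ctype["Name"],
        })
    return flat


# Made-up sample rows shaped like Tables 3.11 to 3.13.
types = [{"Id": 1, "EisCustomerTypeId": 10, "Name": "Education"}]
subtypes = [{
    "Id": 1,
    "EisCustomerSubtypeId": 100,
    "DimensionCustomerTypeId": 1,
    "Name": "Primary school",
}]
customers = [{
    "Id": 1,
    "EisCustomerId": 1000,
    "DimensionCustomerSubtypeId": 1,
    "Name": "School A",
}]

star_rows = flatten_customers(customers, subtypes, types)
print(star_rows[0]["TypeName"], star_rows[0]["SubtypeName"])
```

The star row repeats the type and subtype names for every customer, which is the redundancy the snowflake layout avoids at the price of following two relations.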