аЯрЁБс>ўџ 9;ўџџџ8џџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџьЅС7 №ПТbjbjUU .67|7|N џџџџџџlвввввввц 8D P DцsЖ    (Ш Ш Ш Ш Ш Ш ў) I†-вШ Ш Ш Ш Ш œ ввШ Ш -œ œ œ Ш RвШ вШ ўœ Ш ўœ Кœ V~ввЪШ ” РW!ooИПц&  ‚–Ъ4C0sž,Яœ ЯЪœ ццввввйAI Course Work 4 This report considers some aspects of the ID3 search algorithm as described in the course notes and handouts provided for this course. Attempts have been made to augment this material using the LRC as a reference source. However, the volume of material available on campus in no way resembles the vast list of titles provided at the back of the course handout. In fact only three dated works exist in Hatfield, none of which are of use for this report. Key Choices of ID3 The Key Choices for users of ID3 style algorithms are: Properties, Leaf Nodes and the Root Property from where the tree starts. When selecting properties it is important to filter out or disregard those properties that do not add discrimination to the search tree. If several values of a property always lead to the same leaf node, then that property is not aiding the selection process and should be removed. The Leaf Nodes are the eventual outcomes of the decision tree. These should be selected based on what outcomes are required of the decision process. The Root Property is the start node of the decision tree. Starting the tree from different nodes has the effect of changing the efficiency of the search. To optimise the search, the “correct” root property is selected following the principles of Occam’s Razor [William of Occam1324] utilising information theory [Shannon 1948] embodied in the 3 formulae shown below: n I(M) = " - p(mi) log2(p(mi)) i=1 n Ci E(P) = " - I(Ci) i=1 C gain(node) = I(table)  E(node) The Deficiencies of ID3 And It s Children Two of the major deficiencies of the ID3 algorithm are it s poor handling of missing attributes and how it deals with bad data – i.e. data that contains identical attributes leading to differing leaf nodes in the decision structure. Missing attributes can affect the functioning of ID3 algorithms in three areas: During evaluation of the attributes when constructing the decision tree. When partitioning the examples based on the attribute values. When classifying new examples using the constructed decision tree. Bad or inappropriate data will affect the construction of the decision tree and may even lead to an overly complex and inefficient tree. The Safety Classification Problem When trying to classify something as either safe or unsafe, it is important that the decision tree should choose to err on the side of safety only when the data implies absolute safety. If the data is at all vague or marginal, then the decision tree must err on the side of the unsafe. When deciding about loan applications a credit history of unknown is not going to risk lives. By comparison, if the operational safety of a nuclear power station is unknown, this should be classified as unsafe until proven otherwise. Overcoming ID3 Type Deficiencies There are several methods that can be used to limit the effects of the ID3 based deficiencies, both before and after the construction of the decision tree. Prior to decision tree construction, consideration should be given to removing bad or missing data from the training set. By only using known good data, the tree will be generated in an efficient form with low redundancy at the leaf nodes. The decision tree can then be tested using a set that contains some missing or bad data. This testing will gauge the accuracy of outcomes with the bad or missing data sets and indicate whether further work on the decision tree is required. This could form the basis of pre or post pruning techniques – see below. After tree construction you can apply various post-pruning techniques to remove leaf nodes with the same values and replace the higher node with one leaf node of the desired category. Obviously this can be further extended to include groups of leaf nodes of dissimilar categories if the ratio between the types is of a large enough value and in the correct direction. For instance if 19 out of 20 leaf nodes indicate that something is unsafe and only one indicates that it is safe, then the logical pruning scenario would be to remove all twenty leaf nodes and replace the higher node with one unsafe leaf node. If the ratio were reversed and 19 leaf nodes indicated safe and one leaf node indicated unsafe, then these nodes should not be pruned. Pruning under these circumstances would lead to potentially unsafe categorisations – i.e. categorising something as safe when in fact it is unsafe. Evaluation Of The Above Solutions Extensive research into both ID3 and it’s children has lead to a wealth of information on the subject including many on-line databases of sample data relevant to many areas. These can be used to train inductive learning systems and evaluate their results. One such evaluation [Quinlan 1986] considered a simple chess problem where board positions were recognised as a loss for black within three moves. Of the 1.4 million scenarios, 474,000 were a loss for black. The results for ID3 are tabulated below: Training Set Size% Of UniverseError1Predicted Max Errors% Errors2000.0119972899.51,0000.07331463.35,0000.368290.1625,0001.79670.024125,0008.93210.0016NOTE 1: Error is quoted as Error Count in 10000 Trials. NOTE 2: % Errors is Error Count over Training Set Size as a percentage. Other researchers in different fields have supported these impressive results using different data sets and problem paradigms. From the exhaustive research into inductive learning algorithms generally, it is clear that learning samples containing small proportions of missing attribute data can be tolerated by having the missing attribute data examples truncated from the sample training set. However, when the ratio of missing attribute data samples within the set becomes too high, then these algorithms are not ideal and are really not appropriate in the areas of safety. This is because data that is removed from the training set reduces the data quantity (and a reduced quantity can be seen to reduce the accuracy using the table above) and removal of these particular samples may also remove some content areas from the problem space (which limits the algorithm’s ability to cope within the vague, un-referenced area). Richard H Reepe CSP2 Student No 98000746 _____________________________________________________________________ _____________________________________________________________________  FILENAME \p H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.doc Page  PAGE 3  4 T `  ‘     $ & B J T ` r t | ~ ˜ Є •žЎ%klyz­ЎДЕЛМНОТќќњњњњњњіњњњѓќюхюхнхжгжЫж0JmHnHu0J j0JUmHnHsH ujUmH sH mH sH CJ5\H*6]*жщij„…Š ‹ ‘ , J L ` ‚ Є І ц ш < ˆ њјђјјјјјјјјэужјэЬжЬЬјђј „ „а^„ `„а „ „аdx^„ `„а „а„а^„а`„аdx„а^„а$a$ЕСўўˆ ‰ й " ` Ѓ Є -OmnXy?@Псклэћ §§јјј§§ђ§§§ђ§§§§§ђ§§ььььь$If„а^„а & F !%*.27X\RRRRR$IfЇ$$If–lж  џџџџ  џџџџџџџџџџџџжr”џ”И№t  „ $  8 tПж0€ џџџџџџџџ€ џџџџџџџџџџџџџџџџџџџџџџџџі\6жџџџџџжџџџџџж€€€€€жџџџџџ4ж laі78>CFJNX\RRRRR$IfЇ$$If–lж  џџџџ  џџџџџџџџџџџџжr”џ”И№t  „ $  8 tПж0€ џџџџџџџџ€ џџџџџџџџџџџџџџџџџџџџџџџџі\6ж€€€€€жџџџџџжџџџџџжџџџџџ4ж laіNOUZ\_delqXXRRRRRX\RR$IfЇ$$If–lж  џџџџ  џџџџџџџџџџџџжr”џ”И№t„$ 8 tПж0€ џџџџџџџџ€ џџџџџџџџџџџџџџџџџџџџџџџџі\6жџџџџџжџџџџџжџџџџџжџџџџџ4ж laі qsu{|„‰‹”љљљRdљљљљљЇ$$If–lж  џџџџ  џџџџџџџџџџџџжr”џ”И№t„$ 8 tПж0€ џџџџџџџџ€ џџџџџџџџџџџџџџџџџџџџџџџџі\6жџџџџџжџџџџџжџџџџџжџџџџџ4ж laі$If ”•Э•–Ео$%XVVVVVTTVЇ$$If–lж  џџџџ  џџџџџџџџџџџџжr”џ”И№t„$ 8 tПж0€ џџџџџџџџ€ џџџџџџџџџџџџџџџџџџџџџџџџі\6жџџџџџжџџџџџжџџџџџжџџџџџ4ж laі %kПРСТ§§ћћћ,1hА‚. АЦA!А"А# $ %ААФАФ Ф i8@ёџ8 NormalCJ_HaJmH sH tH R@R Heading 1$Є№Є<@&5CJ KH OJQJ\^JaJ T@T Heading 2$Є№Є<@& 56CJOJQJ\]^JaJ<A@ђџЁ< Default Paragraph Font,@ђ, Header  Ц9r , @, Footer  Ц9r &)@Ђ& Page Number[6 џџџџжщij„…Š‹‘ЏОПЩкыь  7!"rЛљ<=Цш  ё  Ў Џ и й Xzst†”›АЙКОУЫабзмпучшюѓѕј§ў  "$&-.fЎ./\0€€˜0€0€˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж˜0€ж0€˜0€ ˜0€ ˜0€ ˜ 0€ ˜ 0€ ˜ 0€ ˜0€€˜0€€0€€˜0€€˜0€€˜0€€0€€˜0€€˜0€€˜0€€˜0€€˜0€€0€€˜0€€˜0€€Љ0€€Љ0€€Ћ0€€Љ0€€Љ0€€™0€€Љ@0€€Ћ@0€€Ћ@0€€Ћ@0€€›@0€€Љ0€€Љ0€€Љ0€€Љ0€€Љ0€€™0€€Љ0€€Љ0€€Љ0€€Љ0€€Љ0€€™0€€Љ0€€Љ0€€Љ0€€Љ0€€Љ0€€™0€€Љ0€€Љ0€€Љ0€€Љ0€€Љ0€€™0€€˜0€€˜0€€˜0€€˜0€€š€€pp   Тˆ 7Nq”%ТСЖФјџ”џ•€!”џ•€КЛЦШжихцNY\’”КЛУШЩЫхць№ŸЃNY\333333333џџRichard H Reepe3H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.docRichard H Reepe3H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.docRichard H Reepe3H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.docRichard H Reepe3H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.docRichard H Reepe3H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.docRichard H Reepe3H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.docRichard H Reepe3H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.docRichard H Reepe A:\aicw4.docRichard H Reepe A:\aicw4.docRichard H Reepe3H:\MyData\BSC\Csp2\ArtificialIntellegence\aicw4.docл,шt,Кўџџџџџџџџџh„а„˜ўЦа^„а`„˜ў.h„ „˜ўЦ ^„ `„˜ў.’h„p„LџЦp^„p`„Lџ.h„@ „˜ўЦ@ ^„@ `„˜ў.h„„˜ўЦ^„`„˜ў.’h„р„LџЦр^„р`„Lџ.h„А„˜ўЦА^„А`„˜ў.h„€„˜ўЦ€^„€`„˜ў.’h„P„LџЦP^„P`„Lџ.л,шtџџџџџџџџ         t†”›АЙКОУЧЫабзмпучшюѓѕј§ў  "$&-.\žžžžž–џ@(™™[0@0 0@џџUnknownџџџџџџџџџџџџG‡:џTimes New Roman5€Symbol3& ‡:џArial"qˆ№аhК=EК=EК=EђX>‚Ф@…Б№ ДД20dЂЬ2ƒQ№џџAI Course Work 4Richard H ReepeRichard H Reepeўџ р…ŸђљOhЋ‘+'Гй0œ˜ МШрьј  $0 L X d p|„Œ”фAI Course Work 4dI CRichard H Reepeichich Normal.doteRichard H Reepe4chMicrosoft Word 9.0@FУ#@ф=OoИП@ф=OoИП@ф=OoИПђXўџ еЭеœ.“—+,љЎ0ќ hp€ˆ˜  ЈАИ Р нфInjunea‚>Ђ  AI Course Work 4 Title ўџџџ !"#$%&'ўџџџ)*+,-./ўџџџ1234567ўџџџ§џџџ:ўџџџўџџџўџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџRoot Entryџџџџџџџџ РF  2ooИП<€1TableџџџџџџџџџџџџЯWordDocumentџџџџџџџџ.6SummaryInformation(џџџџ(DocumentSummaryInformation8џџџџџџџџџџџџ0CompObjџџџџjObjectPoolџџџџџџџџџџџџ  2ooИП  2ooИПџџџџџџџџџџџџўџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџўџ џџџџ РFMicrosoft Word Document MSWordDocWord.Document.8є9Вq