University: Monash University. Subject: C6007: Artificial Intelligence

Question 2. Consider these concepts: C1. Linear Discriminant Analysis and C2. SVM.

Identify two significant similarities between C1 and C2:

• AA. First Similarity

• AB. Second Similarity

Question 3. Identify two different approaches to time series analysis.
In the space provided, identify and briefly describe the two approaches.

• M1. First Method

• M2. Second Method

Identify two significant differences between M1 and M2:

• AA. First Difference

• AB. Second Difference

Question 5. Consider the business case explained at the top of this paper. While developing a predictive model, your colleague suggested that the label attribute can be effectively estimated with linear regression using a subset of the relevant predictors.

She created such a model in RapidMiner and the results have been provided to you (in your answers refer to Figures 2A, 2B, …).

Considering the table of coefficients included in the figures:

• IA. The observed coefficients indicate that (be specific)

• IB. The observed p-values indicate that (be specific)

• IC. The observed tolerance values indicate that (be specific)

• ID. Explain whether or not the regression model described with the reported table of coefficients is ready for deployment (be specific)

Question 6

Describe the distribution of residuals and the chart of predicted vs. actual label values, included in Figures 2A, 2B, …

• AA. Explain the shape of the residuals distribution and whether it is likely to affect this regression model (be specific)

• AB. Explain whether or not outliers are present in the data and whether they are likely to affect this regression model (be specific)

• AC. Explain whether or not this model meets the regression assumptions and what the implications are of adopting the model as is (be specific)

Question 7

Based on the reported performance indicators and other relevant figures, give two suggestions on how to improve this specific model, and justify each.

• CA. The first suggested change

• CB. The second suggested change

Question 8

Consider the business case explained at the top of this paper.

Your team has decided to construct a classifier using a discretised label. They have tried three different models. Now they have asked you to select the best-performing model based on the supplied evaluation results.

Part A. The following four operators were used in the process. Very briefly explain the aim (what) and the specific reason (why) for using each of the following operators in that process:

• O1. The aim of “Set Role” is to … because …

• O2. The aim of “Discretize” is to … because …

• O3. The aim of “Nominal to Binomial” is to … because … (2 marks)

• O4. The aim of “SMOTE” is to … because …

Part B. Suggest which of the currently used models has the best performance and why. Then, suggest a single most important change to the RapidMiner process for each of the three models to improve its performance.

• MB. Currently, the best and the worst-performing models are … because

• M1. The performance of the best model can be improved by

• M2. The performance of the worst model can be improved by
Answer Part A in the box below

Question 10

Consider the business case explained at the top of this paper. It has been suggested that the text attributes included in the provided examples could offer additional insights.

Develop a RapidMiner process responsible for k-NN Global Anomaly Detection using just text attributes. Utilise additional operators to assist anomaly elimination as well as anomaly diagnostics using a scatter plot. Do not attach any drawings of the process, just answer the following questions.

Part A. Consider the main process, then

• PP. Provide a list of operators responsible for data preprocessing; for each, identify its name … role … and in brackets its main parameters

• TP. Provide a list of operators responsible for text processing; for each, identify its name … role … and in brackets its main parameters

• DC. Provide a list of operators for anomaly detection and elimination; for each, identify its name … role … and in brackets its main parameters

• DR. Provide a list of operators responsible for anomaly diagnostics with a scatter plot; for each, identify its name … role … and in brackets its main parameters

Part B. Provide details of the sub-process responsible for parsing all text fields of the examples.

• TA. Provide a list of operators responsible for text processing within the sub-process of the “Process Documents from Data” operator; for each, identify its name … role … and in brackets its main parameters (if any).

• TB. Explain the parameters of the “Process Documents from Data” operator (do not describe the operators of its internal sub-process).

Question 12

Consider the business case explained at the top of this paper.
Your colleague developed a RapidMiner process aimed at producing the optimisation charts shown in Figures 5A, 5B, … Unfortunately, she left the company and your job is to quickly explain and then replicate her work.

Explain the workflow within the operator “Loop Parameters”, where you will have to use log operators and a holdout validation for the sake of efficiency.

• WF. Explain the logic of the workflow in a short paragraph

• OP. Provide a list of operators used in this workflow; for each, identify its name … role … and in brackets its main parameters.

Question 13

Explain the hyper-parameters that generate the optimum performance measurements for the model; refer to the plots presented in Figures 5A, 5B, …

• Identify the performance measurements tracked and logged by the optimiser.

• TA. Identify the operators and their parameters that were logged, and explain why it was important to experiment with them.

• TB. Identify the specific hyper-parameter values that would result in the model's optimum performance, and justify your answer.

