Original dataset

Download the Dataset

Final basket transaction dataset

All quantitative and unrelated data are removed, so every remaining value can be used in association rule mining (ARM).
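As a rough illustration of this cleaning step, the base R sketch below drops the numeric and identifier columns from a small table and writes one comma-separated basket line per university. The data frame, its column names, and its values are hypothetical and are not taken from the actual dataset.

```r
# Hypothetical toy data; names and values are illustrative only.
universities <- data.frame(
  name    = c("U1", "U2", "U3"),
  state   = c("New York", "Minnesota", "Pennsylvania"),
  type    = c("Private", "Public", "Private"),
  tuition = c(52000, 14000, 48000),
  cost    = c("Expensive", "Not expensive", "Expensive"),
  stringsAsFactors = FALSE
)

# Keep only categorical columns that describe the university:
# drop numeric columns and the identifier column.
is_numeric <- vapply(universities, is.numeric, logical(1))
keep <- !is_numeric & names(universities) != "name"
basket <- universities[, keep, drop = FALSE]

# Build one comma-separated line per university (basket format).
basket_lines <- do.call(paste, c(basket, sep = ","))
print(basket_lines)
```

Lines like these can be saved with `writeLines()`. If the arules package is used for the mining step (an assumption here), a file in this shape is what `read.transactions()` with `format = "basket"` and `sep = ","` reads.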

final basket transaction dataset

R code

Top 15 rules for support

R code
Based on the rules, most private universities are expensive compared with most public universities. New York has more expensive universities than any other state, with 44. Pennsylvania has the second-highest number, with 35 expensive universities. Massachusetts is third, with 22.

Top 15 rules for confidence

R code
Over 87% of expensive universities are private schools. Over 10% of expensive universities are in New York, over 8% are in Pennsylvania, and over 5% are in Massachusetts. The percentages for the other states are shown above.
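To make the reading of these percentages concrete, here is a minimal base R sketch of how support and confidence are defined. The six transactions are made up for illustration and the helper names `support` and `confidence` are hypothetical; this is not the code or data behind the rules above.

```r
# Hypothetical toy transactions; not the university data.
transactions <- list(
  c("Expensive", "Private", "New York"),
  c("Expensive", "Private", "Pennsylvania"),
  c("Expensive", "Public", "New York"),
  c("Not expensive", "Public", "Minnesota"),
  c("Not expensive", "Private", "Minnesota"),
  c("Not expensive", "Private", "Pennsylvania")
)

# Support: fraction of transactions that contain every item in `items`.
support <- function(items, transactions) {
  mean(vapply(transactions, function(basket) all(items %in% basket), logical(1)))
}

# Confidence of lhs => rhs: support(lhs and rhs) / support(lhs).
confidence <- function(lhs, rhs, transactions) {
  support(c(lhs, rhs), transactions) / support(lhs, transactions)
}

# Share of expensive universities that are private.
conf_private <- confidence("Expensive", "Private", transactions)
# Share of private universities that are expensive (the reverse rule).
conf_reverse <- confidence("Private", "Expensive", transactions)
print(c(conf_private, conf_reverse))
```

The two values differ in this toy example, which is why a rule with "expensive" on the left-hand side is read as "the share of expensive universities that ..." and not the other way around.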

Top 15 rules for lift

R code
The state of a university can be predicted from whether the university is expensive. When a university is expensive, the states that become most likely relative to their overall share are Massachusetts, Vermont, Maryland, Kentucky, and Oregon. In other words, Massachusetts, Vermont, Maryland, Kentucky, and Oregon are the states most strongly dependent on whether a university is expensive.
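Lift compares the confidence of a rule with the baseline share of its right-hand side, so a value above 1 means the two occur together more often than they would if they were independent. The base R sketch below works this out on a made-up table; the data and the `lift` helper are hypothetical.

```r
# Hypothetical toy data: one row per university.
toy <- data.frame(
  expensive     = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  massachusetts = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Lift of lhs => rhs: P(rhs | lhs) / P(rhs).
lift <- function(lhs, rhs) {
  p_rhs_given_lhs <- mean(rhs[lhs])  # confidence of lhs => rhs
  p_rhs <- mean(rhs)                 # baseline share of rhs
  p_rhs_given_lhs / p_rhs
}

lift_forward <- lift(toy$expensive, toy$massachusetts)
lift_reverse <- lift(toy$massachusetts, toy$expensive)
print(c(lift_forward, lift_reverse))
```

Both directions give the same lift, which fits reading it as a measure of dependence between a state and being expensive, not as a direction of prediction.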

Convert rules to dataframe

R code

Visualizations

Plot of top 15 rules for confidence

R code
The graph above shows the support, confidence, and lift of each rule. One rule is an outlier, with a support of about 0.7, a confidence of around 0.9, and a lift over 1.2, while the other rules have low support and confidence. That rule is therefore much easier to rely on for prediction than the rules based on location.

Plot 2

R code
The graph shows the association between a university's state and whether the university is expensive, for the top 15 rules by confidence. Georgia, Michigan, and Iowa have the strongest associations with expensive universities. On the other hand, the probability that a university is expensive is relatively low in New Jersey and Minnesota.

NetworkD3

networkD3 plot
R code
All nodes on the graph are very close to each other, so it looks crowded. One reason might be that the data are not well suited to a networkD3 plot. However, one thing can still be seen from this plot: the number of private schools in the states shown on the graph is not low.

Conclusions

The results of the analysis can help predict whether a university is expensive when its state is known, or predict its state when it is known whether the university is expensive. The visualizations show that there are more private schools than public schools in the United States. So when no information is given about a university, the probability that it is private is higher than the probability that it is public. The better guess is therefore that the university is private, and likely expensive as well. The tuition at most universities is too expensive for students. However, the probability differs from state to state. For example, the probability that a university in New York is expensive is high, so a random university in New York would be predicted to be private and expensive. In contrast, a university in Minnesota is better predicted to be not expensive, because the probability of an expensive university in Minnesota is very low.

New York, Pennsylvania, and Massachusetts are the states most associated with expensive universities, so they are the three states with the most expensive, private universities. When it is already known whether a university is expensive and the task is to predict its state, the probability that it is located in New York, Pennsylvania, or Massachusetts is the highest among all states. It is therefore better to predict that a random expensive university is located in one of these three states. Moreover, the probability of an expensive university in Vermont, Maryland, New Jersey, or Minnesota is the lowest among the states, so it is better not to predict that a random expensive university is located in one of them.