- Results -

The model performed fairly well after parameter tuning and adjustments to the architecture. Although LSTMs can retain crucial information and learn complex patterns in sequential text data, they are prone to overfitting, which had to be balanced against performance during tuning. In addition to dropout layers, monitoring mechanisms were added to the training loop to guide optimization and prevent overfitting.

Early Stopping - Determining the right number of epochs can be difficult and typically requires an empirical approach. To assist with this, and to limit time-consuming trial and error, early stopping monitors a given metric (in this case validation accuracy) and halts training if the model trends towards overfitting or shows an extended lack of improvement.
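The stopping rule described above can be sketched in plain Python (framework-agnostic; the patience value of 3 epochs is an illustrative assumption, not the setting used in this project):

```python
def early_stop(val_accuracies, patience=3):
    """Return True if validation accuracy has not improved for
    `patience` consecutive epochs (an illustrative stopping rule).

    `val_accuracies` holds one validation-accuracy value per epoch,
    in training order.
    """
    if len(val_accuracies) <= patience:
        return False
    best_so_far = max(val_accuracies[:-patience])
    recent = val_accuracies[-patience:]
    # Stop when none of the recent epochs beat the earlier best.
    return all(acc <= best_so_far for acc in recent)
```

In a training loop, this check would run after each epoch, so training halts as soon as the metric stalls instead of running a fixed, guessed-at number of epochs.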

Model Checkpoint - In conjunction with early stopping, model checkpoints ensure that the best model is saved. The model checkpoint assesses the model's performance at each epoch and "bookmarks" the highest-performing version. In this way, the final model contains the most effective parameters.
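The "bookmarking" behavior can be sketched as follows; here `weights` is a stand-in for whatever the framework would actually serialize, and the `(val_accuracy, weights)` pairing is an assumption for illustration:

```python
def run_with_checkpoint(epoch_results):
    """Track the best-performing epoch, mimicking a model checkpoint.

    `epoch_results` is a list of (val_accuracy, weights) pairs, one per
    epoch. Returns the best accuracy and the weights that achieved it.
    """
    best_acc, best_weights = float("-inf"), None
    for val_acc, weights in epoch_results:
        if val_acc > best_acc:  # "bookmark" the new best model
            best_acc, best_weights = val_acc, weights
    return best_acc, best_weights
```

Combined with early stopping, this means the saved model is the best one seen, even if later epochs degraded before training halted.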

Training Performance

Several parameters had a significant impact on the model's performance during training.

The model showed a significant increase in accuracy through training; however, the highest performance occurred around the 8th epoch, when the model reached a validation accuracy of 0.805.

[Training snapshots: 1st epoch, best epoch, last epoch]

[Figure: Loss Over Training Epochs]

Model Evaluation

Reviews for California State University San Marcos were gathered to test the model: 109 positive reviews and 109 negative reviews. The model performed fairly well in classifying these texts, but showed a tendency to misclassify negative reviews as positive. A closer look at some of the misclassified text shows that many comments contain both positive and negative remarks about the university. This can make classification difficult, and the presence (or absence) of a certain word can push a review over to the wrong label. This is one of the aspects that makes sentiment analysis of reviews difficult: there are reviews at either extreme, expressing strong love or strong dislike, while the majority of evaluations stand in the middle, with students sharing both likes and dislikes.
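The per-class error pattern described above can be tallied with a simple confusion count; this is a plain-Python sketch, and the label convention (1 = positive, 0 = negative) is an assumption for illustration:

```python
def confusion_counts(y_true, y_pred):
    """Count classification outcomes for binary sentiment labels
    (1 = positive, 0 = negative)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for true, pred in zip(y_true, y_pred):
        if true == 1 and pred == 1:
            counts["tp"] += 1
        elif true == 0 and pred == 0:
            counts["tn"] += 1
        elif true == 0 and pred == 1:
            counts["fp"] += 1  # negative review misread as positive
        else:
            counts["fn"] += 1
    return counts
```

A high `fp` count relative to `fn` would confirm the tendency to misclassify negative reviews as positive.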

Although jagged, the loss and accuracy curves showed general trends towards increasing accuracy and decreasing loss over the course of training:

[Figure: Accuracy Over Training Epochs]

"Most students come from local high schools in pre-formed cliques. Coming from another university where nobody knew each other, it is disappointing how high-school-esque the students can be. It is incredibly hard to make friends here. Great professors, great facilities, and overall great education. Just expect a typical high-school vibe in students." Rating: 2.9

"'CSUStairMaster!'" Raying: 4.0

"Pros: Library, easy to transfer to, easily access from freeways, high female to male ration (3:1), great instructors, day care options and scholarships if you have kids, Cons: Parking fees, student store is overpriced, customer service of student services" Rating: 3.9

"Insane parking pass cost, rising tuitions every year, professors clearly not interested in instructing, professors complaining about wage increases for the dean, so few classes offered that every course has a waitlist, etc etc" Rating: 3.9

On the other hand, the review ratings are not explicitly set by the student, but rather calculated based on 10 rating categories. These categories include the following:

  • Opportunities

  • Reputation

  • Safety

  • SocialRating

  • Clubs

  • Facilities

  • Food

  • Happiness

  • Internet

  • Location

As a result, the overall rating isn't a perfect indicator of the student's overarching feelings towards the school, and therefore not always an indication of the nature of the comments in the written review. This also made it difficult to determine an appropriate threshold for differentiating "positive" from "negative" reviews. Some of the reviews shown here are examples of text that the model labeled incorrectly, but where the "true" label may itself have been a poor indication of the review's sentiment. In an improved model, the training data should be hand-cleaned to ensure that sentiment is properly labeled.
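The labeling pipeline implied above can be sketched as follows. Both the simple mean over the ten categories and the 3.0 cutoff are assumptions for illustration, not the exact procedure used to derive the ratings, which is precisely why the resulting labels can be noisy:

```python
CATEGORIES = ["Opportunities", "Reputation", "Safety", "SocialRating",
              "Clubs", "Facilities", "Food", "Happiness", "Internet",
              "Location"]

def label_review(category_scores, threshold=3.0):
    """Derive an overall rating from the ten category scores and map it
    to a sentiment label. The unweighted mean and the 3.0 threshold are
    illustrative assumptions."""
    overall = sum(category_scores[c] for c in CATEGORIES) / len(CATEGORIES)
    return "positive" if overall >= threshold else "negative"
```

A review praising facilities but criticizing social life could still average above the threshold, yielding a "positive" label for largely negative text, which is the mislabeling problem discussed above.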