- Analysis, Models & Methods -

Transfer Learning | FastText Embeddings

One powerful aspect of language models is their ability to learn under self-supervised conditions, meaning that labeled data isn't needed for training. Instead, these models learn relationships and context directly from the text itself. While this is an impressive feat, learning word embeddings from scratch is computationally expensive and requires a significant volume of text data. It is therefore common to use existing embeddings that have already been trained on large corpora. This transfer learning allows the LSTM to focus on the task of text classification (identifying sentiment in reviews) rather than starting at ground zero with language modeling, and leveraging existing, pre-trained vectors typically lets the model perform better on the downstream task. There are many well-known pre-trained embeddings, such as GloVe, Word2Vec, or, in this case, FastText.

FastText is a word representation tool that is particularly well suited to text written in common, everyday language (such as human-written reviews or posts). Unlike traditional word-level models, FastText considers parts of words, or "subwords": rather than treating each word only as a whole, it also examines the smaller character n-grams within it. This is particularly useful when dealing with "out of vocabulary" (OOV) words, which occur often in student-written text due to slang and a conversational "texting" style. A brief example of this subword behavior follows the list below.

Use-case specific advantages of FastText embeddings:

  1. Rich Semantic Understanding: FastText's subword information allows for a deeper semantic understanding of college reviews, which often contain domain-specific vocabulary, slang, and varied linguistic structures.

  2. Handling OOV Words: College reviews may include unique terms or institution-specific jargon. FastText's ability to handle OOV words by breaking them down into n-grams helps maintain the context and meaning in such cases.
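
To make the subword idea concrete, the sketch below uses the fasttext Python package to look up a vector for a token that may not appear verbatim in the training vocabulary. The model file name (cc.en.300.bin) and the example token are assumptions for illustration, not artifacts of this project.

    # Minimal sketch of FastText subword lookups (assumes the `fasttext` package
    # and a downloaded pre-trained English model such as cc.en.300.bin).
    import fasttext

    ft = fasttext.load_model("cc.en.300.bin")    # hypothetical local path

    # A slang-style token that may never appear verbatim in the training data.
    word = "profs"
    vector = ft.get_word_vector(word)            # 300-dim vector assembled from subwords
    subwords, subword_ids = ft.get_subwords(word)

    print(vector.shape)                          # (300,)
    print(subwords)                              # character n-grams like '<pr', 'pro', 'rof', ...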

________________________________________________________________________

Sentiment Analysis | Long Short-Term Memory

Long Short-Term Memory (LSTM) networks are recurrent neural networks built for sequential data: they read a sequence in, step by step, and can produce an output at each step or after the entire sequence. This format makes them especially well suited to learning from ordered data such as text. Unlike their foundational counterpart, the simple recurrent neural network, LSTMs are capable of retaining important information and "forgetting", or dropping, less important information. This characteristic is critical when dealing with long or complex sentences.

Use-case specific advantages of LSTMs:

  1. Lengthy Text Handling: Text such as reviews varies drastically in length. Whether a review is a short, detailed segment or a long, drawn-out rant, LSTMs ensure that important information from the beginning of the review is still considered when making a decision at the end. This is particularly critical because the review as a whole must be considered when classifying sentiment (see the padding sketch after this list).

  2. Flexibility/Effectiveness: LSTMs can adapt to the varied styles and structures of college reviews, making them a reliable choice for this kind of sentiment analysis.

  3. Capturing Context: The way words are used together in a sentence often determines their sentiment. LSTMs capture this context, which is key to interpreting the informal, conversational language of student reviews.
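
Because reviews vary so much in length, they are typically tokenized and padded or truncated to a fixed length before being fed to the LSTM. The sketch below uses Keras preprocessing utilities with placeholder review text; the 10,000-word vocabulary and 500-token length mirror the architecture described in the next section.

    # Sketch: preparing variable-length reviews for the LSTM (placeholder data;
    # the 10,000 / 500 limits follow the architecture described below).
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    reviews = ["Loved the campus and the professors!",
               "Housing was overpriced and the dining halls were disappointing..."]

    tokenizer = Tokenizer(num_words=10000)         # keep the 10,000 most frequent words
    tokenizer.fit_on_texts(reviews)

    sequences = tokenizer.texts_to_sequences(reviews)
    padded = pad_sequences(sequences, maxlen=500)  # short reviews padded, long ones truncated

    print(padded.shape)                            # (number_of_reviews, 500)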

Model Architecture

The model architecture combines pre-trained embeddings with bidirectional LSTMs and dropout layers to capture the complexities of student-written, conversational college reviews, balancing learning from sequential data against preventing overfitting. A code sketch of the full architecture follows the list below.

  1. Embedding Layer:

    • Utilizes pre-trained FastText embeddings.

    • Vocabulary size (input dim) of 10,000 and embedding dimension (output dim) of 300

    • Embedding matrix created from pre-trained weights

    • Input sequence length: 500

    • In this case, the embeddings were not further trained (they remained fixed during training)

  2. First LSTM Layer:

    • Bidirectional LSTM with 50 units

    • Learns from the input sequence both forwards and backwards to capture greater context

    • L2 Regularization used to reduce overfitting by penalizing large weights

  3. First Dropout Layer:

    • Randomly sets 20% of input units to 0 during training to prevent overfitting

  4. Second LSTM Layer:

    • Second stacked bidirectional LSTM with 50 units

    • This layer processes the sequence output from the previous LSTM layer.

  5. Second Dropout Layer:

    • Similar to the first dropout layer, with a dropout rate of 20%.

  6. Dense Layer:

    • A dense layer with a single unit and a sigmoid activation function

    • Single neuron for binary classification

    • Output = probability indicating one of two classes
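
A minimal sketch of this architecture in Keras is shown below. The layer sizes, dropout rate, frozen embeddings, and sequence length follow the description above; the L2 coefficient, the optimizer, and names such as tokenizer and ft (carried over from the earlier sketches) are illustrative assumptions rather than the project's exact settings.

    # Sketch of the described architecture in Keras. Hyperparameters are taken
    # from the list above; `tokenizer` and `ft` are assumed from the earlier sketches.
    import numpy as np
    from tensorflow.keras import Input
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense
    from tensorflow.keras.initializers import Constant
    from tensorflow.keras.regularizers import l2

    VOCAB_SIZE, EMBED_DIM, MAX_LEN = 10000, 300, 500

    # Embedding matrix filled with pre-trained FastText vectors (row 0 stays zero for padding).
    embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM))
    for word, idx in tokenizer.word_index.items():
        if idx < VOCAB_SIZE:
            embedding_matrix[idx] = ft.get_word_vector(word)

    model = Sequential([
        Input(shape=(MAX_LEN,)),
        # 1. Frozen embedding layer initialized with the FastText weights.
        Embedding(VOCAB_SIZE, EMBED_DIM,
                  embeddings_initializer=Constant(embedding_matrix),
                  trainable=False),
        # 2. First bidirectional LSTM (50 units, L2 penalty on the weights);
        #    return_sequences=True passes the full sequence to the next LSTM.
        Bidirectional(LSTM(50, return_sequences=True,
                           kernel_regularizer=l2(0.01))),   # 0.01 is illustrative
        # 3. First dropout layer: 20% of units zeroed during training.
        Dropout(0.2),
        # 4. Second stacked bidirectional LSTM (50 units).
        Bidirectional(LSTM(50)),
        # 5. Second dropout layer.
        Dropout(0.2),
        # 6. Single sigmoid unit -> probability of one of the two classes.
        Dense(1, activation="sigmoid"),
    ])

    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()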

________________________________________________________________________

Text Summarization | OpenAI API GPT-3.5 Turbo

Like its preceding models, GPT-3.5 Turbo is trained on language modeling (i.e., predicting the next word in a sequence). In addition, this model version can be prompted in several in-context learning settings (illustrated in the sketch after the list):

  • Zero-Shot Learning - Context prompt/instructions given with input

  • Single-Shot Learning - Context prompt & an example input/output pairing

  • Few-Shot Learning - Context prompt & multiple input/output examples (10-100)
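
The three settings differ only in how many worked input/output examples are packed into the prompt. The message lists below are a hypothetical illustration in the OpenAI chat format; the instruction text and examples use a simple placeholder task, not the project's actual prompts.

    # Sketch: zero-, single-, and few-shot prompts in the chat-message format
    # (instruction text and examples are placeholders).
    instruction = {"role": "system",
                   "content": "Classify the sentiment of a college review as Positive or Negative."}

    new_input = {"role": "user", "content": "Review: 'Great professors, but parking is awful.'"}

    zero_shot = [instruction, new_input]

    single_shot = [instruction,
                   {"role": "user", "content": "Review: 'Loved the dorms and the food.'"},
                   {"role": "assistant", "content": "Positive"},
                   new_input]

    # Few-shot repeats the example/answer pattern with more pairs (on the order of 10-100).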

Positive & Negative Review Summarization:

Goal: consolidate many reviews into two digestible summaries outlining the main Pros and Cons identified across all of the reviews.

A zero-shot structure was used to generate the summaries (zero-shot was chosen to reduce token usage, given context-length limitations and cost).
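
A minimal sketch of such a zero-shot summarization call is shown below, using the current openai Python client (the project may have used an earlier client version); the prompt wording, variable names, and review text are illustrative.

    # Sketch: zero-shot Pro/Con summarization via the OpenAI chat API
    # (prompt wording and variable names are illustrative).
    from openai import OpenAI

    client = OpenAI()   # reads OPENAI_API_KEY from the environment

    reviews = ["Great professors and a beautiful campus.",
               "Dorms are old and the administration is slow to respond."]   # placeholder reviews

    prompt = ("Summarize the following college reviews into two short lists: "
              "the main Pros and the main Cons mentioned across all reviews.\n\n"
              + "\n".join(reviews))

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )

    print(response.choices[0].message.content)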
