- Data -

Rate My Professor | University Ratings & Reviews

Rate My Professor is a well-known website that allows students to leave anonymous reviews of courses and professors. The site grew in popularity among college students seeking insight and direction when registering for courses. It hosts an impressive breadth of institutions, ranging from small-town community colleges to large public universities. Although, as its name indicates, Rate My Professor is primarily thought of as a teacher-evaluation forum, it also hosts an institution-wide ratings page for each school. For NLP exploration, these university reviews are full of valuable data.

On an individual basis, ratings can fluctuate depending on the reviewer's enthusiasm, personal preferences, and emotional state at the time of posting. However, many of these pages contain hundreds of reviews that, viewed as a whole, can offer valuable insight into the first-hand experience a university has to offer. Unfortunately, ratemyprofessor.com does not have a built-in API. Drawing inspiration from various data-gathering projects on GitHub, a custom method was therefore created for gathering university reviews from the school pages.

TRAINING:

In addition to the written text, each university review contains a scoring system that judges the school on 10 categories (such as food, social, clubs, and facilities). The method fetches all 10 ratings, which are used to generate an overall rating for each review. For training, reviews for 110 colleges (varying in size, location, and institution type) were scraped from the website. The review ratings (decimal values ranging from 0 to 5) were used to label the reviews as either positive or negative. Several classification thresholds were tested. Ultimately, reviews with a rating of 3 or lower were classified as "negative" and reviews above 3 were classified as "positive."
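The labeling step can be sketched as follows. This is a minimal sketch, not the project's actual code. It assumes the overall rating is the unweighted mean of the category ratings, since the text does not state how the 10 ratings are combined. The category names in the example are also illustrative assumptions. Only the threshold rule (3 or lower is "negative", above 3 is "positive") comes from the text above.

```python
from statistics import mean
from typing import Mapping


def overall_rating(category_scores: Mapping[str, float]) -> float:
    """Combine per-category ratings into one overall rating.

    Assumption: an unweighted arithmetic mean of the category scores.
    """
    if not category_scores:
        raise ValueError("at least one category score is required")
    return float(mean(category_scores.values()))


def label_review(rating: float, threshold: float = 3.0) -> str:
    """Ratings at or below the threshold are 'negative'; above it, 'positive'."""
    return "positive" if rating > threshold else "negative"


# Illustrative review with 10 assumed category names.
example_scores = {
    "reputation": 4, "location": 5, "opportunities": 4, "facilities": 4,
    "internet": 3, "food": 4, "clubs": 3, "social": 5,
    "happiness": 4, "safety": 4,
}
rating = overall_rating(example_scores)  # 40 / 10 = 4.0
print(rating, label_review(rating))      # 4.0 positive
```

Exposing the threshold as a parameter makes it easy to repeat the kind of threshold experiments described above.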

The data was pre-processed by encoding the binary labels numerically as 0s and 1s. The text reviews were converted to integer sequences with a Keras Tokenizer and then padded to equal-length sequences.
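The project itself used Keras for this step. The sketch below shows what that pre-processing does, written in plain Python with no dependencies. It assumes Keras' default behavior:

- text is lowercased and punctuation is stripped;
- index 1 goes to the most frequent word, and ties keep first-seen order;
- words that were not seen during fitting are dropped;
- sequences are zero-padded and truncated at the front ("pre").

The function names are hypothetical, and the regular-expression tokenizer only approximates Keras' punctuation filter.

```python
import re
from collections import Counter

LABEL_CODES = {"negative": 0, "positive": 1}


def encode_labels(labels):
    """Map 'negative'/'positive' labels to 0/1."""
    return [LABEL_CODES[label] for label in labels]


def tokenize(text):
    """Lowercase and split into word tokens (approximates Keras' filters)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_word_index(texts):
    """Rank words by frequency, starting at 1; index 0 is reserved for padding."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # sorted() is stable, so equally frequent words keep first-seen order.
    ranked = sorted(counts, key=lambda word: counts[word], reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace each known word with its index; unknown words are skipped."""
    return [[word_index[w] for w in tokenize(t) if w in word_index] for t in texts]


def pad(sequences, maxlen=None, value=0):
    """Pre-pad with `value` and pre-truncate so every sequence has length maxlen."""
    if maxlen is None:
        maxlen = max((len(s) for s in sequences), default=0)
    padded = []
    for seq in sequences:
        kept = seq[-maxlen:] if maxlen > 0 else []  # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


reviews = ["Great campus, great food", "Food was bad"]
word_index = fit_word_index(reviews)
X = pad(texts_to_sequences(reviews, word_index))
y = encode_labels(["positive", "negative"])
print(X, y)  # [[1, 3, 1, 2], [0, 2, 4, 5]] [1, 0]
```

Front padding puts the actual words at the end of each sequence. Keras uses this as its default, and it often suits recurrent models.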

DISCLAIMER: The following project is purely for demonstrative purposes in an academic setting. Therefore, no web-scraped or borrowed data will be published for use or profit by others.

The College Scorecard database contains data on colleges across the US. The data covers 5 main categories: cost, graduation rate, employment rate, average amount borrowed, and loan default rate. It is updated regularly based on federal reporting from institutions, which makes it a treasure trove for data analysis and modeling. The U.S. Department of Education also uses this data to power a college-filtering web-app tool. With a similar mission in mind, the College Scorecard was created to provide prospective students with comprehensive statistics on college performance and attributes.

Key features of College Scorecard data include:

  1. Educational Outcomes: graduation rates, retention rates, percentage of transfer students, etc.

  2. Financial Information: average cost of attendance, financial aid, student debt, etc.

  3. Post-Graduation Earnings: median earnings of former students, income at incremental post-graduation stages, etc.

  4. Field of Study Information: program-specific graduate success, earnings, debt, etc.

CollegeScorecard | College Meta Data & Statistics Database

For the "College Explorer" dashboard the following attributes were selected:

Mission-Led Attributes:

  • Historically Black College & University

  • Alaska Native and Native Hawaiian-Serving Institution

  • Tribal College & University

  • Asian American and Native American Pacific Islander-Serving Institution

  • Hispanic-Serving Institution

  • Native American Non-Tribal Institution

  • Men Only

  • Women Only

  • Religiously Affiliated

Location Attributes:

  • State

  • Urbanization Level (City, Suburb, Rural, etc.)

Institution Attributes:

  • Ownership (Private vs Public)

  • Degree Levels

  • Acceptance Rate

  • Annual Tuition

  • Average SAT Score
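The attributes above act as dashboard filters. The sketch below shows one way such filters could be turned into a College Scorecard API query string. It is a minimal sketch that only builds the URL and makes no network request.

It assumes the public `api.data.gov` schools endpoint and the Open Data Maker query conventions (`api_key`, `fields`, `per_page`, `page`, and a `__range` suffix with `min..max` values). All field names are assumptions for illustration and should be checked against the official College Scorecard data dictionary. That includes the ownership coding (1 = public, 2 = private nonprofit, 3 = private for-profit). The endpoint the project actually used may differ.

```python
from urllib.parse import urlencode

# Assumed base endpoint of the College Scorecard API (an api.data.gov key is required).
BASE_URL = "https://api.data.gov/ed/collegescorecard/v1/schools"


def build_query(api_key, filters, fields, per_page=20, page=0):
    """Build a query URL from dashboard filters.

    `filters` maps assumed API field names to values, e.g. exact matches
    ("school.state": "CA") or ranges ("...__range": "min..max").
    """
    params = {
        "api_key": api_key,
        "fields": ",".join(fields),  # which columns to return
        "per_page": per_page,
        "page": page,
    }
    params.update(filters)
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical filter selection: public schools in California with <= 50% admission rate.
url = build_query(
    api_key="YOUR_API_KEY",
    filters={
        "school.state": "CA",
        "school.ownership": 1,  # assumed coding: 1 = public
        "latest.admissions.admission_rate.overall__range": "0..0.5",
    },
    fields=["id", "school.name"],
)
print(url)
```

Keeping the field-name mapping in one dictionary means a dashboard could translate each UI selection into a query parameter without hard-coding the URL.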

The U.S. Department of Education makes the College Scorecard data highly accessible through an API. Full documentation on querying university data is available in the Open Data Maker HTTP API repository on GitHub. The "College Explorer" web-app dashboard collects the user's filter selections and uses them to query data from the API. Below is the endpoint URL that is used: