Average dissertation and thesis length, take two

About a year ago I wrote a post describing the average length of dissertations at the University of Minnesota. I've been meaning to expand that post by adding data from masters theses, since the methods for gathering and parsing the records are transferable. This post provides some graphics and links to R code for evaluating dissertation (doctorate) and thesis (masters) data from an online database at the University of Minnesota. In addition to describing the masters thesis data, I've collected the most recent dissertation data to update my previous post. I've left out the R code for brevity, but I invite interested readers to have a look at my GitHub repository, where all source code and data are stored. Also, please, please, please note that I've since tried to explain that dissertation length is a pretty pointless metric of quality (also noted here), so interpret the data only as potentially descriptive of the nature of each major.

Feel free to fork or clone the repository to recreate the plots. The parsed data for theses and dissertations are saved as .RData files, 'thes_parse.RData' and 'diss_parse.RData', respectively. Plots were created in 'thes_plot.r' and 'diss_plot.r', and the plots comparing the two were created in 'all_plo.r'. To briefly summarize, the dissertation data include 3037 records from 2006 to present. Unlike my previous post, this includes the most current data and all majors with at least five records. The masters thesis data contain 930 records from 2009 to present. You can get an idea of the relative page ranges for each by looking at the plots. I've truncated all plots to maximum page counts of 500 for the dissertation data and 250 for the thesis data, since only a handful of records exceeded these values. I'm not sure whether these extremes are real or entered in error, and to be honest, I'm too lazy to verify them myself. Just be cautious: there are some errors in the data, and all plots are for informational purposes only, as they say…
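For anyone who wants the gist of the filtering without reading the R source, here's a rough Python sketch, not the code in the repository. It assumes the parsed records are simple (major, page count) pairs, counts records per major, keeps majors with at least five records, and then drops records above a page limit from the plotted subset. That last step is only one possible reading of "truncated"; the function name filter_majors and the record layout are hypothetical.

```python
from collections import Counter


def filter_majors(records, min_records=5, max_pages=None):
    """Keep majors with at least `min_records` entries.

    `records` is a list of hypothetical (major, pages) tuples. Counts are
    taken over all records before any page limit is applied; if
    `max_pages` is given, longer records are left out of the returned
    (plotted) subset.
    """
    counts = Counter(major for major, _ in records)
    keep = {major for major, n in counts.items() if n >= min_records}
    subset = [(major, pages) for major, pages in records if major in keep]
    if max_pages is not None:
        subset = [(major, pages) for major, pages in subset if pages <= max_pages]
    return subset, counts


# Example: Physics has too few records, and the 900-page History entry
# exceeds the 500-page dissertation limit.
records = [("History", 250)] * 5 + [("Physics", 120)] * 3 + [("History", 900)]
filtered, counts = filter_majors(records, max_pages=500)
```

Applying the record-count threshold before the page limit means a major isn't dropped just because a few of its records are unusually long.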

-Marcus


Fig: Number of doctoral dissertations in the database by major.



Fig: Number of masters theses in the database by major.



Fig: Summary of page lengths of doctoral dissertations by major, sorted and color-coded by median. Boxes show the median and the 25th and 75th percentiles, whiskers extend up to 1.5 times the interquartile range beyond the box, and points beyond the whiskers are outliers. The number of records for each major is in parentheses.
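The box plot summaries in these figures follow the usual Tukey convention. As a quick illustration (a Python sketch, not the R plotting code), the statistics for one major can be computed as below. The quartiles use `statistics.quantiles` with `method="inclusive"`, which is the same linear interpolation as R's default `quantile()`; whether the original plots used exactly this quartile rule is an assumption. The function names box_stats and order_by_median are hypothetical.

```python
import statistics


def box_stats(pages):
    """Tukey-style box plot summary for one major's page counts.

    Whiskers reach the most extreme values within 1.5 * IQR of the box;
    anything further out is reported as an outlier.
    """
    q1, med, q3 = statistics.quantiles(pages, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [p for p in pages if low_fence <= p <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(p for p in pages if p < low_fence or p > high_fence),
    }


def order_by_median(by_major):
    """Return major names sorted by median page count, shortest first."""
    return sorted(by_major, key=lambda major: statistics.median(by_major[major]))
```

For example, with page counts [100, 120, 130, 140, 150, 160, 400], the quartiles are 125, 140, and 155. The fences then fall at 80 and 200, so 400 is flagged as an outlier and the upper whisker stops at 160.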



Fig: Summary of page lengths of masters theses by major, sorted and color-coded by median. Boxes show the median and the 25th and 75th percentiles, whiskers extend up to 1.5 times the interquartile range beyond the box, and points beyond the whiskers are outliers. The number of records for each major is in parentheses.



Fig: Distributions of page lengths for all records separated as dissertations or theses.



Fig: Comparison of dissertation and thesis page lengths for majors with both degree programs in the database. Boxes show the median and the 25th and 75th percentiles, whiskers extend up to 1.5 times the interquartile range beyond the box, and points beyond the whiskers are outliers.
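Picking the majors for this comparison is just a set intersection. Here's a small Python sketch of that step, not the R code from the repository. It assumes each data set has already been grouped into a dict that maps a major to its list of page counts (a hypothetical layout). For each shared major it returns the median dissertation and thesis lengths side by side.

```python
import statistics


def shared_majors(diss, thes):
    """Median page counts for majors found in both data sets.

    `diss` and `thes` are hypothetical dicts mapping major -> list of page
    counts. Returns {major: (dissertation median, thesis median)}.
    """
    common = sorted(set(diss) & set(thes))
    return {
        major: (statistics.median(diss[major]), statistics.median(thes[major]))
        for major in common
    }


diss = {"Chemistry": [150, 200, 250], "History": [300, 320]}
thes = {"Chemistry": [80, 90, 100], "Art": [60]}
comparison = shared_majors(diss, thes)
```

In this made-up example, only Chemistry appears in both dicts, so History and Art are left out of the comparison.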



5 thoughts on “Average dissertation and thesis length, take two”

  1. How long is the average dissertation? – R is my friend

  2. Oh God, just reading through the content on this blog post hurt my head! Really well put together post though. How long did it take you to compile all of that data once you got your code up and running?

    And I totally agree, quality over quantity in regards to the length of your dissertation.

    • Hi Waseem, the code takes about 15 minutes or so for the dissertation data, and much less than that for the thesis data. Pretty quick, but then again there are only a couple thousand records, which is not much by ‘big data’ standards.

      Thanks for reading!

  3. Science | Pearltrees

  4. Well, “Big Data” is more about structure and what you do with it than size. True, you mightn’t have tackled issues such as creating Hadoop clusters, but from a data science perspective the principles are the same. Fabulous read. Yes, thesis length mightn’t be a useful measure, but you have demonstrated some good analysis skills here.
