About a year ago I wrote a post describing the average length of dissertations at the University of Minnesota. I've been meaning to expand that post by adding data from masters theses, since the methods for gathering and parsing the records are transferable. This post provides some graphics and links to R code for evaluating dissertation (doctorate) and thesis (masters) data from an online database at the University of Minnesota. In addition to describing data from masters theses, I've collected the most recent data on dissertations to provide an update on my previous post. I've left out the R code for brevity, but I invite interested readers to have a look at my GitHub repository, where all source code and data are stored. Also, please, please, please note that I've since tried to explain that dissertation length is a pretty pointless metric of quality (also noted here), so interpret the data only in the context that they’re potentially descriptive of the nature of each major.

Feel free to fork/clone the repository to recreate the plots. The parsed data for theses and dissertations are saved as .RData files, 'thes_parse.RData' and 'diss_parse.RData', respectively. Plots were created in 'thes_plot.r' and 'diss_plot.r'. The plots comparing the two were created in 'all_plo.r'. To briefly summarize, the dissertation data include 3037 records from 2006 to present. This differs from my previous post by including all majors with at least five records, in addition to the most current data. The masters thesis data contain 930 records from 2009 to present. You can get an idea of the relative page ranges for each by taking a look at the plots. I've truncated all plots to maximum page ranges of 500 and 250 for the dissertation and thesis data, respectively, as only a handful of records exceeded these values. I'm not sure if these extremes are real data or were entered in error, and to be honest, I'm too lazy to verify them myself. Just be cautious that there are some errors in the data and all plots are for informational purposes only, as they say…
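
For readers who want a feel for the filtering described above before opening the repository, here is a minimal sketch in R. It is not the code from the repository: the data frame `diss`, its columns `major` and `pages`, and the values in it are all made up for illustration, and the real objects in 'diss_parse.RData' may be structured differently. It keeps majors with at least five records and then drops records above a 500-page plotting maximum, which is only one way to read "truncated" (limiting the axis range would be another).

```r
# Hypothetical stand-in for the parsed dissertation records; the column
# names 'major' and 'pages' are assumptions, not the repository's names.
diss <- data.frame(
  major = c(rep("Ecology", 6), rep("History", 5), rep("Music", 2)),
  pages = c(120, 150, 95, 210, 180, 640, 300, 280, 350, 410, 260, 90, 110)
)

# Keep only majors with at least five records
counts <- table(diss$major)
keep <- names(counts)[counts >= 5]
diss_sub <- diss[diss$major %in% keep, ]

# Drop the handful of records above the plotting maximum
# (500 pages for dissertations in the post; 250 for theses)
max_pages <- 500
diss_plot <- diss_sub[diss_sub$pages <= max_pages, ]

# Records per major that would end up in the plot
print(table(diss_plot$major))
```

The same two steps would apply to the thesis data with `max_pages <- 250`.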


Fig: Number of doctoral dissertations in the database by major.

Fig: Number of masters theses in the database by major.

Fig: Summary of page lengths of doctoral dissertations by major, sorted and color-coded by median. Boxes show the median and the 25th and 75th percentiles, whiskers extend to 1.5 times the interquartile range, and outliers are plotted beyond the whiskers. The number of records for each major is in parentheses.

Fig: Summary of page lengths of masters theses by major, sorted and color-coded by median. Boxes show the median and the 25th and 75th percentiles, whiskers extend to 1.5 times the interquartile range, and outliers are plotted beyond the whiskers. The number of records for each major is in parentheses.

Fig: Distributions of page lengths for all records separated as dissertations or theses.

Fig: Comparison of dissertation and thesis page lengths for majors having both degree programs in the database. Boxes show the median and the 25th and 75th percentiles, whiskers extend to 1.5 times the interquartile range, and outliers are plotted beyond the whiskers.
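
The box-and-whisker summaries in the captions above can be worked out by hand. The sketch below uses a made-up vector of page counts (not data from the database) and base R's `quantile()` with its default settings. Whether the plotting code in the repository computes the percentiles in exactly this way is an assumption on my part.

```r
# Hypothetical page counts for a single major
pages <- c(95, 120, 150, 180, 210, 260, 280, 300, 350, 410, 640)

# Median and the 25th/75th percentiles (the box)
q <- quantile(pages, probs = c(0.25, 0.5, 0.75))
iqr <- q[["75%"]] - q[["25%"]]

# Fences at 1.5 times the interquartile range beyond the box
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Whiskers stop at the most extreme records still inside the fences
whisker_low  <- min(pages[pages >= lower_fence])
whisker_high <- max(pages[pages <= upper_fence])

# Anything outside the fences is drawn as an outlier
outliers <- pages[pages < lower_fence | pages > upper_fence]

print(q)
print(c(whisker_low = whisker_low, whisker_high = whisker_high))
print(outliers)
```

In this toy example the single very long record falls outside the upper fence, which is the kind of point that shows up as a dot beyond the whisker in the plots.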


36 thoughts on “Average dissertation and thesis length, take two”

  2. Oh God, just reading through the content on this blog post hurt my head! Really well put together post though. How long did it take you to compile all of that data once you got your code up and running?

    And I totally agree, quality over quantity in regards to the length of your dissertation.

    • Hi Waseem, the code takes about 15 minutes or so for the dissertation data, much less than that for the thesis data. Pretty quick, but then again there’s only a couple thousand records, which is not much by ‘big data’ standards.

      Thanks for reading!

  4. Well “Big Data” is more about structure and what you do with it than size. True, you mightn’t have tackled issues such as creating Hadoop clusters, but from a data science perspective the principles are the same. Fabulous read. Yes, thesis length analysis mightn’t be a useful measure, but you have demonstrated some good analysis skills here.

  5. This was a great read. Easy question for you: How many WORDS in the average dissertation? You’ve said previously that the formatting will cause a typical dissertation page to have perhaps only half the words of a typical published page…Could you put some numbers with those? Thanks?

  6. This is great and very useful, thank you. I’m reading this assuming that page counts include references and appendices? E.g. my masters thesis was 113 pages, but only about 80 of them were content.

    • Yep, all pages were included. I think many people don’t realize that a big chunk of the thesis is in the appendix! Word count would have been a better metric but that info wasn’t in the database I used.

  10. Interesting blog, however your color coding is a) redundant (it duplicates the Y axis value) and b) confusing, as it leads the reader to associate one graph with the other (e.g. that the discipline with the most dissertations also has the most verbose ones).

    I recommend you read Tufte on the visual representation of data.

  13. I totally second your thought about dissertation length having nothing to do with its quality, but still every college's guidelines mention that a dissertation should be 200 or more pages long. Actually, I haven’t written any dissertation. At the moment I’m writing my master's thesis, but my supervisor is always emphasizing that my paper should have at least 100 pages and that it would be even nicer if there were more. I just believe that in the end 50% of my research will just be stating other researchers’ data. And I wanted to do my own research, and though it may not meet the requirement for a certain number of pages, it will be interesting research and not just a restatement of facts in other words. And in the end my motivation level is dropping. Motivation tips and inspiration posts aren’t helping at all. I just believe my supervisor is in the way of my research.

    • My advisor once told me about a thesis that was three pages long. Some guy/gal in analytical chemistry had worked a few years to develop some molecular equation that ended up being the whole thesis. I’ve heard from more than one faculty member that quality >> quantity and I have never heard of an instance where a college suggests a minimum page length. I think that’s a stupid concept that totally undermines the process. The thesis is always under your control no matter what anyone says.

  16. So, extra credit question … and just as controversial as page length in terms of quality >> quantity. What is the average number of REFERENCES for each discipline?

    I am starting to write my dissertation now and I am hovering around 60. One of my friends had over 350!! I am starting to feel like an underachiever.

    • Yea, that’s a good question. I would guess that most research papers in the primary literature are in the 25-50 range, so maybe scale that up for chapters in the dissertation? But there are no set rules; provide the info you think is most necessary. It would be possible to scrape the info from dissertations, but it would require some more involved text mining.

