Master - Lab - Automatic Extraction and Completeness Detection of Statistical Information

Supervisor: Anna-Marie (

Conducting meta-analyses (for an example, see [1]) in Usable Security and Privacy (USECAP) requires statistical information such as participant numbers, effect sizes, and results of statistical hypothesis tests. At present, this information has to be extracted from publications and assessed for completeness manually, which is time-consuming. Tools exist for checking the consistency of statistical test results [2], but they rely on the information being reported correctly in APA style; in USECAP, statistical reporting is not always complete and does not always follow APA conventions. In prior student work, a wide range of regex patterns was developed to identify statistical concepts in scientific text. In this lab, you will focus on automatically identifying the information that belongs to a limited number of hypothesis test types and on assessing the completeness of that information.

Your task

  • Building on the prior student work and a paper on extracting methodological information from scientific papers [3], identify statistical information and group it by instance of a statistical hypothesis test
  • For each identified hypothesis test, determine whether the reported information is complete enough for meta-analysis
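
To give a rough idea of the two tasks above, here is a minimal Python sketch. It is not taken from the prior student work: the pattern `T_TEST`, the field list `REQUIRED`, and the function `find_t_tests` are hypothetical names chosen for illustration, and the assumption that a t-test needs its statistic, degrees of freedom, and p-value to count as complete is only a placeholder for the criteria you would define in the lab.

```python
import re

# Hypothetical pattern for t-tests reported roughly in APA style,
# e.g. "t(28) = 2.31, p = .028, d = 0.85".
# Only the test statistic is mandatory; df, p, and d are optional groups,
# so incomplete reports are still found and can be flagged.
T_TEST = re.compile(
    r"\bt\s*(?:\(\s*(?P<df>\d+(?:\.\d+)?)\s*\))?"      # optional degrees of freedom
    r"\s*=\s*(?P<stat>-?\d*\.?\d+)"                    # test statistic
    r"(?:\s*,\s*p\s*(?P<p_rel>[<=>])\s*(?P<p>\d*\.\d+))?"  # optional p-value
    r"(?:\s*,\s*d\s*=\s*(?P<d>-?\d*\.?\d+))?"          # optional Cohen's d
)

# Assumption for this sketch: these fields are needed for meta-analysis.
REQUIRED = ("stat", "df", "p")


def find_t_tests(text):
    """Group the matched values per t-test and flag missing fields."""
    reports = []
    for match in T_TEST.finditer(text):
        # Keep only the groups that actually matched something.
        fields = {k: v for k, v in match.groupdict().items() if v is not None}
        missing = [name for name in REQUIRED if name not in fields]
        reports.append({
            "span": match.group(0),
            "fields": fields,
            "missing": missing,
            "complete": not missing,
        })
    return reports


if __name__ == "__main__":
    sample = (
        "The groups differed, t(28) = 2.31, p = .028, d = 0.85. "
        "A second test, t = 1.10, was not significant."
    )
    for report in find_t_tests(sample):
        print(report["span"], "| missing:", report["missing"])
```

A single regex per test type keeps all values of one test together, which makes the grouping step trivial. It only works when the values appear next to each other in the text; reports that spread a test over several sentences or a table would need a different grouping strategy.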

It is possible to do this lab as a group.

Literature to start with


  • You should have basic knowledge of inferential statistical testing at the level of the Bachelor course “Usable Security and Privacy”. If necessary, we can provide English slides and German videos for self-study.