Web-based innovation indicators may provide new insights into firm-level innovation activities. However, little is yet known about the accuracy and relevance of web-based information for measuring innovation. In this study, we use data on 4,487 firms from the Mannheim Innovation Panel (MIP) 2019, the German contribution to the European Community Innovation Survey (CIS), to analyze which website characteristics serve as predictors of innovation activity at the firm level. Website characteristics are measured by several data mining methods and are used as features in different Random Forest classification models that are compared against each other. Our results show that the most relevant website characteristics are textual content, the use of the English language, the number of subpages and the number of characters on a website. In our main analysis, models using all website characteristics jointly yield AUC values of up to 0.75 and increase accuracy scores by up to 18 percentage points compared to a baseline prediction based on the sample mean. Moreover, predictions with website characteristics differ significantly from baseline predictions according to a McNemar test. Results also indicate better performance for the prediction of product innovators and firms with innovation expenditures than for the prediction of process innovators.
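As a rough illustration of the evaluation logic summarized above, the following minimal sketch compares a set of model predictions against a majority-class baseline and applies a McNemar test to the two sets of predictions. It does not reproduce the study: the labels and model predictions are made-up toy values, all function names are hypothetical, no Random Forest is trained, and the exact binomial variant of the McNemar test is an assumption, since the abstract does not say which variant was used.

```python
from math import comb


def majority_baseline(y_true):
    """Predict the most frequent class for every firm (a sample-mean style baseline)."""
    majority = 1 if 2 * sum(y_true) >= len(y_true) else 0
    return [majority] * len(y_true)


def accuracy(y_true, y_pred):
    """Share of firms whose innovation status is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test based on the discordant pairs only.

    b counts cases where A is right and B is wrong, c the reverse.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(a == t and o != t for t, a, o in zip(y_true, pred_a, pred_b))
    c = sum(a != t and o == t for t, a, o in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (hypothetical): 1 = innovator, 0 = non-innovator.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
model_pred = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]  # stand-in for classifier output
baseline_pred = majority_baseline(y_true)

gain = accuracy(y_true, model_pred) - accuracy(y_true, baseline_pred)
b, c, p_value = mcnemar_exact(y_true, model_pred, baseline_pred)

print(f"accuracy gain over baseline: {100 * gain:.0f} percentage points")
print(f"discordant pairs: b={b}, c={c}; exact McNemar p-value: {p_value:.3f}")
```

With only ten toy observations the test has very little power, so this example shows the mechanics of the comparison rather than a meaningful significance result.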
Axenbeck, Janna and Patrick Breithaupt (2021), Innovation Indicators Based on Firm Websites: Which Website Characteristics Predict Firm-Level Innovation Activity?, PLOS ONE, Volume 16, Issue 4.