Generating Big Spatial Data on Firm Innovation Activity from Text-Mined Firm Websites
Refereed Journal // 2018
Innovation is one of the major drivers of economic growth, and spatial processes of knowledge spillover play a vital role in it. Current practices for assessing firms' innovation activity, including patent analysis and questionnaires, suffer from severe limitations. In this paper, we propose a novel approach that estimates firms' innovation activity from the texts on their websites. We use an automated web scraper to harvest text from the websites, then extract semantic topics with a self-learning, generative topic-modelling approach, and finally analyse these topics with an artificial neural network (ANN) to assess each firm's level of innovation. This procedure results in a large-scale dataset that will be used for further spatial economic analysis of the distribution of innovative firms and of the processes that drive the development of innovation in firms.
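As an illustration of the first stage of such a pipeline only, the following minimal sketch shows how visible text might be pulled from a harvested page and turned into term counts, the kind of input a topic model typically consumes. It is not the method used in the paper: the names `VisibleTextParser`, `extract_text`, `term_counts` and the sample page are hypothetical, the tokenisation rule is an assumption, and the topic-modelling and ANN stages are not shown.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that lies outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def term_counts(text):
    """Count lowercase alphabetic tokens of at least three letters (assumed rule)."""
    return Counter(re.findall(r"[a-z]{3,}", text.lower()))


# Hypothetical page content standing in for a scraped firm website.
SAMPLE_HTML = """<html><head><title>Acme Sensors</title><style>body{color:red}</style></head>
<body><h1>New sensor prototype</h1><script>var x = 1;</script>
<p>Our research team develops sensor technology.</p></body></html>"""

text = extract_text(SAMPLE_HTML)
counts = term_counts(text)
print(counts.most_common(3))
```

In a full pipeline, one such count vector per firm would form the document-term matrix passed to the topic model, whose per-firm topic weights would then serve as inputs to the classifier.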