The project aims to sustainably improve the research data infrastructure in Germany in two ways. First, existing databases are linked to generate new Scientific Use Files. These databases include, for example, company data from official statistics and other sources, as well as survey data collected in publicly funded research projects.
Second, new data sources on the internet are to be tapped. Digitalization has reached most areas of life and business, and communication and data exchange now take place largely in digital form. This generates large quantities of data with varying degrees of automation and structure and in heterogeneous formats, which are also of interest for research.
The overall goal within the “Science Data Center” project, which is sponsored by the state of Baden-Württemberg, is to establish a competence center for data availability and analysis in economics (Business and Economic Research Data Center, BERD-Center). One focus of BERD is on unstructured, mostly large data sets (“Big Data”) that are extracted from the internet by web scraping and then prepared for data analysis. A further focus is the linkage of company databases, covering both structured data (e.g. company micro-data from official statistics) and unstructured web-based data, and the generation of a common company identifier.
For this purpose, a project consortium was formed by the Mannheim Center for Data Science (MCDS) at the University of Mannheim, the Leibniz Centre for European Economic Research (ZEW), and two infrastructure facilities: the University Library Mannheim (UBM) and the computer center of the University of Mannheim (RUM).
The BERD-Center performs the following tasks:

  1. Provision of the generated research data to other scientists, provided there are no legal objections,
  2. Clarification of data protection and data privacy issues regarding the circulation of web-based data,
  3. Cataloguing, documentation and provision of the project’s metadata,
  4. Consulting and information services on data collection methods from the internet and on the evaluation of large data volumes,
  5. Concepts and measures for education and training in the area of “Data Science” for researchers in economics.

For all services, particular attention will be paid to sustainable availability. The competence center is intended to remain in operation beyond the funding phase.

Selected Publications

Articles in Refereed Journals

Schmidt, Sebastian, Jan Kinne, Sven Lauterbach, Thomas Blaschke, David Lenz and Bernd Resch (2022), Greenwashing in the US metal industry? A novel approach combining SO2 concentrations from satellite data, a plant-level firm database and web text mining, Science of The Total Environment, Volume 835, 155512, ISSN 0048-9697.

Discussion and Working Papers

Eugenidis, Dania, Jan Kinne and David Lenz (2022), Analysing Gender Equality at the Firm Level, MAGKS Papers on Economics 14-2022, Marburg.