Behind the Headlines

This website is the product of a huge amount of work by our team. We are a group of postgraduate students at the University of Edinburgh: Andy Ernst, Siwei Zhu, Liyuan Sun, Grace Forsyth, and Francesca Hearn-Yeates.

The National Library of Scotland provided the Broadsides dataset to us as a collection of XML files and scanned images. We have converted, processed, and analysed this dataset to create a searchable and explorable website allowing this fascinating piece of Scottish history to become accessible to a wide audience.

The steps below outline our development process:

Natural Language Processing (NLP) of text
The original Broadsides were scanned and digitised using OCR, and the resulting text files were our primary dataset. We used NLP to analyse the content of the text, such as the frequency of words and of locations mentioned.
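As an illustration of the word-frequency part of this step, a minimal Python sketch might look like the following. It is not the code behind the site: the stop-word list, the `word_frequencies` helper, and the sample sentence are all invented for the example, and a real analysis would likely use a dedicated NLP library and a much fuller stop-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "for", "on"}


def word_frequencies(text: str) -> Counter:
    """Count lower-cased words in `text`, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


# Invented sample text standing in for one OCR'd broadside.
sample = (
    "An account of the trial of John Smith, for murder, in Edinburgh. "
    "The trial was held in Edinburgh."
)
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Counting locations could follow the same pattern by keeping only the words that appear in a list of known place names.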
Conversion of XML files to CSV
The metadata for our text files, such as publication year and location, existed in XML format. We wrote code to convert the metadata to CSV to make the later analysis easier.
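A conversion of this kind can be sketched with Python's standard library alone. The XML layout below (a `broadside` element with `title`, `year`, and `place` children) is an assumption made for the example, as are the sample records and the helper names; the real National Library of Scotland files may be structured quite differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample XML; the element and attribute names are assumptions.
SAMPLE_XML = """<broadsides>
  <broadside id="b001">
    <title>Execution of John Smith</title>
    <year>1825</year>
    <place>Edinburgh</place>
  </broadside>
  <broadside id="b002">
    <title>A New Song</title>
    <year>1830</year>
  </broadside>
</broadsides>"""

FIELDS = ["id", "title", "year", "place"]


def xml_to_rows(xml_text: str) -> list[dict[str, str]]:
    """Turn each <broadside> element into a flat dict, one per CSV row."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("broadside"):
        row = {"id": item.get("id", "")}
        for field in FIELDS[1:]:
            # findtext returns None when the child element is missing.
            row[field] = (item.findtext(field) or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Missing fields become empty cells rather than errors, which keeps every row the same width for later analysis.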
Cleaning the metadata
The metadata CSV file contained a number of errors, such as stray punctuation and spelling mistakes, which were introduced during OCR processing. We removed or corrected these to improve the analysis.
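The kind of clean-up described here might look like the sketch below. The corrections table, the punctuation rules, and the year range are all hypothetical choices made for illustration; they are not the rules the team applied.

```python
import re
from typing import Optional

# Hypothetical table of known OCR misreadings (an assumption for this sketch).
CORRECTIONS = {"Edinburgli": "Edinburgh", "Glasgovv": "Glasgow"}


def clean_value(value: str) -> str:
    """Collapse whitespace, trim stray punctuation, fix known misspellings."""
    value = re.sub(r"\s+", " ", value)
    value = value.strip(" .,;:[]")
    return CORRECTIONS.get(value, value)


def clean_year(value: str) -> Optional[int]:
    """Pull a four-digit year (1500-1999) out of a noisy field, if present."""
    match = re.search(r"\b(1[5-9]\d{2})\b", value)
    return int(match.group(1)) if match else None


print(clean_value("  Edinburgli. "))
print(clean_year("[1825?]"))
```

Keeping the corrections in a lookup table makes each fix easy to review and reverse, which matters when the errors come from OCR rather than from the original printers.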
Extracting the themes
Through our analysis of the text we found four key themes: murder, trials, courtship, and songs and poems. We investigated these topics further to see how they varied throughout the Broadsides.
Coding the website
We built a single-page web application using React and TypeScript. The map uses Leaflet, with broadside locations geolocated using Google Maps, and the site is deployed via Netlify.
Designing website aesthetics
To reflect the period of the broadsides, we designed the website by combining an old-fashioned style with the look of a newspaper. We created the introductory cartoon using Figma, Photoshop, and Adobe Illustrator.
Writing text to describe & explain findings
Finally, we completed the website by writing descriptions of the themes and the visualisations to summarise our findings and clarify the messages that we want to share.