Collecting disparate data online by web scraping using Selenium and BeautifulSoup in Python
This session will take place on Microsoft Teams
Ideally, the data we wish to work on can be downloaded in an easy-to-use format. Failing that, when we want only a small subset of a very large dataset, or the data is constantly updated, the owner will hopefully provide an application programming interface (API) to automate collection of the relevant data. Quite often, however, the data cannot be downloaded and there is no API, yet the data is publicly available, just dispersed across a website. When it would be too tedious and time-consuming to navigate page by page and collect the data manually, we can use Selenium WebDriver and Beautiful Soup to automate navigating the website and collecting the relevant data. In this code clinic, I will go through best practices (and what not to do!) when web scraping: using Selenium WebDriver to navigate around a website, and then using Beautiful Soup to extract the data from the HTML.
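As a rough illustration of the division of labour described above, the following is a minimal sketch and not the material that will be presented in the session. It assumes a hypothetical site whose results sit in a table with id "results" and whose pages are linked by a "Next" link; the function names, the column names and the delay value are likewise invented for the example. Selenium drives the browser from page to page, and Beautiful Soup parses each page's HTML.

```python
import time

from bs4 import BeautifulSoup


def extract_rows(html):
    """Parse one page of HTML and return one dict per data row.

    Assumes (hypothetically) that the data sits in <table id="results">
    with the name in the first column and the value in the second.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table#results tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:  # header rows use <th>, so they yield no cells
            rows.append({"name": cells[0], "value": cells[1]})
    return rows


def scrape(start_url, max_pages=5, delay_seconds=2.0):
    """Follow "Next" links with Selenium, parsing each page as we go."""
    # Imported here so that extract_rows can be used and tested on saved
    # HTML without Selenium or a browser being installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    collected = []
    try:
        driver.get(start_url)
        for _ in range(max_pages):
            # Hand the rendered HTML to Beautiful Soup for extraction.
            collected.extend(extract_rows(driver.page_source))
            try:
                # Assumed pagination control: a link whose text is "Next".
                next_link = driver.find_element(By.LINK_TEXT, "Next")
            except NoSuchElementException:
                break  # no further pages
            next_link.click()
            time.sleep(delay_seconds)  # pause so we do not hammer the site
    finally:
        driver.quit()  # always close the browser, even after an error
    return collected
```

Keeping the parsing step in its own function means it can be checked against a saved copy of a page before any live browsing is attempted, for example `extract_rows(open("saved_page.html").read())`, where the file name is a placeholder.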
Date:
29 April 2020, 11:00
Venue:
Venue to be announced
Speakers:
Speaker to be announced
Organiser:
Sarah Laseke (Big Data Institute)
Organiser contact email address:
sarah.laseke@ndph.ox.ac.uk
Booking required?:
Required
Booking url:
https://oxford.onlinesurveys.ac.uk/python-code-clinic-29-april
Booking email:
sarah.laseke@ndph.ox.ac.uk
Audience:
Members of the University only
Editor:
Sarah Laseke