Using word2vec to see what CBC 'knows' about Canada

by Roberto Rocha

Machine Learning & Data Science

What does a decade of news stories published on cbc.ca tell us about Canada? What words and ideas are associated with different cities, provinces, and public figures? Can the archive tell us who Montreal's Drake is, or what the Vancouver equivalent of poutine is? Can it reveal unconscious biases? Are certain words more strongly associated with 'man' than with 'woman'? With 'black' versus 'white', 'indigenous', or 'immigrant'? In this talk, I'll show how I trained a neural word embedding model on hundreds of thousands of news stories using the gensim library, and how I explored the resulting word associations in a Jupyter notebook.
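Questions like "who is Montreal's Drake?" are usually answered with vector arithmetic on the learned embeddings: take the vector for the known entity, subtract its city, add the other city, and look for the nearest remaining word. The sketch below illustrates that idea in plain Python. It is not the talk's code: the tokens (`city_a`, `artist_a`, and so on) and their 3-dimensional vectors are invented placeholders, and the helper `analogy` is hypothetical. With a trained gensim model, the comparable query would go through `model.wv.most_similar(positive=[...], negative=[...])`, which also normalizes vectors before combining them, a step this simplified version skips.

```python
import math

# Toy vectors invented purely for illustration (assumption, not real data).
# A model trained on news text would learn these, typically with 100+ dimensions.
toy_vectors = {
    "city_a":   [1.0, 0.0, 0.0],
    "city_b":   [0.0, 1.0, 0.0],
    "artist_a": [1.0, 0.0, 1.0],   # an artist associated with city_a
    "artist_b": [0.0, 1.0, 1.0],   # an artist associated with city_b
    "food_b":   [0.0, 1.0, -1.0],  # a dish associated with city_b
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(vectors, positive, negative, topn=1):
    """Hypothetical helper: rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, vectors[word])]
    for word in negative:
        query = [q - x for q, x in zip(query, vectors[word])]
    # Exclude the query words themselves so they cannot be returned as the answer.
    exclude = set(positive) | set(negative)
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# "artist_a is to city_a as ? is to city_b"
result = analogy(toy_vectors, positive=["artist_a", "city_b"], negative=["city_a"])
print(result)
```

The same arithmetic supports the bias questions: comparing how close a word sits to 'man' versus 'woman' is a pair of cosine similarities against the same target vector.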


About the Author

Roberto is an investigative data journalist at the CBC who traded Excel for Jupyter notebooks years ago and has never looked back. His goals are to master NLP and network graphs in the service of journalism.


Talk Details

Date: Saturday Nov. 16

Location: Round Room (PyData Track)

Begin time: 10:30

Duration: 25 minutes