What is this page about?
This webpage hosts three demo implementations of designs for exploring and analyzing word vector embeddings. They support exploring word neighborhoods, projecting words onto concept axes, and viewing word co-occurrences reconstructed from the embedding model. You can reach each demo via the link in the navigation bar at the top of this page. For more information about the implementations, the ideas behind them, and videos describing the interface and use cases, please visit our project webpage. Additional information is available in our paper presented at EuroVis 2018.
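To make the concept-axis idea concrete, here is a minimal sketch of the underlying computation: an axis is formed from a pair of opposing words, and other words are positioned along it by projection. The tiny 3-dimensional vectors below are hypothetical stand-ins for a real embedding, for illustration only.

```python
import numpy as np

# Toy embedding: word -> vector. A real embedding would have
# thousands of words and 50+ dimensions; these values are made up.
emb = {
    "good":      np.array([ 1.0, 0.2, 0.1]),
    "bad":       np.array([-1.0, 0.1, 0.2]),
    "excellent": np.array([ 0.9, 0.3, 0.0]),
    "terrible":  np.array([-0.8, 0.2, 0.1]),
    "table":     np.array([ 0.0, 0.9, 0.4]),
}

def concept_axis(pos, neg):
    """Axis defined by a pair of opposing words, normalized to unit length."""
    axis = emb[pos] - emb[neg]
    return axis / np.linalg.norm(axis)

def project(word, axis):
    """Scalar position of a word along the concept axis."""
    return float(np.dot(emb[word], axis))

axis = concept_axis("good", "bad")
for w in ["excellent", "terrible", "table"]:
    print(w, round(project(w, axis), 3))
```

Words semantically close to either pole land far out on the axis, while unrelated words ("table") project near zero.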
Download Embeddings
We provide 26 example embeddings in the right format, ready to use and explore with our implementations:
- 1800-1990: The HistWords embeddings created by Hamilton et al. We use their "All English" data set. These embeddings are trained on Google n-grams, one per decade from 1800 to 1990, resulting in 20 embeddings (download).
- EEBO_TCP: Word embedding trained with GloVe on the historic EEBO-TCP corpus. This embedding is trained with a window size of 15 and a minimum word count of 5 (download). The dimensionality of this embedding is 50, which we have chosen to reduce the memory requirements of our online demo. While such an embedding works reasonably well for demonstration purposes, high-quality embeddings used in production environments usually have between 200 and 300 dimensions.
- wiki1-5: Word embeddings trained with GloVe on the entire English Wikipedia. All five embeddings are trained with a window size of 15 and a minimum word count of 5 (download). The dimensionality of these embeddings is 50, which we have chosen to reduce the memory requirements of our online demo. While these embeddings work reasonably well for demonstration purposes, high-quality embeddings used in production environments usually have between 200 and 300 dimensions.
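GloVe-style embeddings like those above are typically distributed as plain text: one word per line followed by its vector components, space-separated. Assuming that format, the sketch below parses such a file and finds nearest neighbors by cosine similarity; the four-word sample data is hypothetical, standing in for a real (much larger) download.

```python
import numpy as np

def load_glove(lines):
    """Parse GloVe's plain-text format: one word per line,
    followed by its space-separated vector components."""
    emb = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        emb[parts[0]] = np.array([float(x) for x in parts[1:]])
    return emb

def nearest(emb, query, k=3):
    """Return the k nearest neighbors of `query` by cosine similarity."""
    q = emb[query] / np.linalg.norm(emb[query])
    sims = {w: float(np.dot(q, v / np.linalg.norm(v)))
            for w, v in emb.items() if w != query}
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Tiny hand-made sample in the same format; real files from the
# downloads above are identical in structure, just far larger.
sample = """king 0.5 0.7 0.1
queen 0.5 0.6 0.2
apple -0.4 0.1 0.9
pear -0.5 0.2 0.8
""".splitlines()

emb = load_glove(sample)
print(nearest(emb, "king", k=1))
```

Neighborhood exploration in the demos is built on exactly this kind of similarity ranking, computed over the full vocabulary.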
Cool! Can I use this for my own data?
If you are interested in analyzing your own embeddings, you can either work directly with our code, which is available on GitHub, or use a docker image that runs the implementations on a web server. More information on how to set up and run the image is available here.
Citation
If you use our implementations or our code for your own project, please cite us as follows:

@Article{HG18,
  author     = {Heimerl, Florian and Gleicher, Michael},
  title      = {Interactive Analysis of Word Vector Embeddings},
  journal    = {Computer Graphics Forum},
  number     = {3},
  volume     = {37},
  month      = {jun},
  year       = {2018},
  projecturl = {http://graphics.cs.wisc.edu/Vis/EmbVis},
  url        = {http://graphics.cs.wisc.edu/Papers/2018/HG18}
}