How Programmers Relate Based on Google Searches

The programmer search relations visualization is an interactive network graph showing the connections between programmers based on searches performed on Google. The graph consists of 561 nodes and 1311 edges. To be included, a person's search result page on Google needed to have knowledge graph information indicating that the person is a programmer.

Methodology

About 10 months ago Google introduced the knowledge graph to enrich its search results for a subset of the search queries it receives. If you look for a person with knowledge graph information, you may also see a "People also search for" box with related searches.

For this programmer network graph, the creators of the top programming languages according to GitHub served as seed values, i.e. Brendan Eich (JavaScript), Yukihiro Matsumoto (Ruby), James Gosling (Java), Guido van Rossum (Python), Stephen R. Bourne (Bourne shell), William Nelson Joy (C shell), Rasmus Lerdorf (PHP), Dennis Ritchie (C), Bjarne Stroustrup (C++), Larry Wall (Perl), and Brad Cox (Objective-C).

If a search result included knowledge graph information indicating the person is a programmer (or hacker or computer scientist) it was included as a node. Information on related searches is the basis for connections in this graph. A link between the current and a related search query was established if both of them were recognized as programmers.
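The inclusion and linking rules can be sketched as a breadth-first crawl. This is only an illustration under assumptions, not the script used for the graph: the `SAMPLE_PAGES` dictionary is a hypothetical stand-in for scraped result pages with placeholder names, the `crawl` function is made up for this sketch, and edges are treated as directed from the searched person to the related search, which the description above does not specify.

```python
from collections import deque

# Hypothetical stand-in for scraped result pages (placeholder names, not real data):
# name -> (knowledge graph occupation, related searches in page order).
SAMPLE_PAGES = {
    "Person A": ("programmer", ["Person B", "Person C", "Band X"]),
    "Person B": ("computer scientist", ["Person A"]),
    "Person C": ("hacker", ["Person B"]),
    "Band X": ("musical group", []),
}

ACCEPTED = {"programmer", "hacker", "computer scientist"}
MAX_RELATED = 5  # only the related searches shown on the first result page


def crawl(seeds, pages):
    """Collect nodes and directed edges, starting from the seed names."""
    nodes, edges = set(), set()
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        name = queue.popleft()
        occupation, related = pages.get(name, (None, []))
        if occupation not in ACCEPTED:
            continue  # not recognized as a programmer: no node, no outgoing links
        nodes.add(name)
        for other in related[:MAX_RELATED]:
            other_occupation, _ = pages.get(other, (None, []))
            if other_occupation in ACCEPTED:
                edges.add((name, other))  # both ends are recognized programmers
            if other not in visited:
                visited.add(other)
                queue.append(other)
    return nodes, edges


nodes, edges = crawl(["Person A"], SAMPLE_PAGES)
```

In a real crawl the dictionary lookup would be replaced by fetching and parsing a result page; that part is deliberately left out here.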

After collecting the data, I generated a GEXF file for further processing with the Gephi visualization platform. I applied the Force Atlas 2 and Label Adjust layouts and sized the nodes based on their calculated PageRank values (what else would you use as a metric when looking at Google search data?). To colorize nodes and edges I had Gephi calculate modularity and assign colors based on modularity classes. The interactive version is rendered with the JavaScript library sigma.js.
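To give a feel for the metric behind the node sizes, here is a minimal pure-Python PageRank sketch using power iteration. It is not Gephi's implementation; the function name, the damping factor of 0.85, the fixed iteration count, and the three-node sample graph are assumptions chosen for illustration.

```python
def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Approximate PageRank for a directed graph by power iteration."""
    nodes = sorted(nodes)
    count = len(nodes)
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                new_rank[target] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank


# Hypothetical sample graph: A and B both point to C, and C points back to A.
sample_edges = [("A", "C"), ("B", "C"), ("C", "A")]
ranks = pagerank({"A", "B", "C"}, sample_edges)
top_node = max(ranks, key=ranks.get)
```

In this sample, C ends up with the highest rank because two nodes link to it, while B, which nobody links to, gets the lowest.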

Interpretation of Results

First of all, this graph certainly does not contain all programmers Google shows knowledge graph information for, so it makes no claim to completeness. Moreover, only the up to 5 related searches on the first search result page for a query were considered, even though there may be more related searches for a term.

When first looking at the data in Gephi, I was surprised by the most highly connected node, Shafi Goldwasser (degree 18), an MIT computer science professor I had honestly never heard of before. Without wanting to play down her importance in the field, I doubt that she is searched for far more frequently than, for example, Guido van Rossum (degree 6). Thus, I conclude that neither degree nor PageRank in this graph is an indicator of search frequency.
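One way to see why degree need not track search frequency is to split it into incoming and outgoing links. The sketch below assumes directed edges (searched person to related search) and uses a made-up edge list with placeholder names. Since at most 5 related searches were read per page, a total degree well above 5 would have to come mostly from other people's pages listing that person.

```python
from collections import Counter


def degree_table(edges):
    """Return name -> (in_degree, out_degree, total_degree) for directed edges."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    names = set(out_degree) | set(in_degree)
    return {
        name: (in_degree[name], out_degree[name], in_degree[name] + out_degree[name])
        for name in names
    }


# Hypothetical edges: four people list "Hub" as a related search, "Hub" lists one back.
sample_edges = [
    ("P1", "Hub"), ("P2", "Hub"), ("P3", "Hub"), ("P4", "Hub"),
    ("Hub", "P1"),
]
table = degree_table(sample_edges)
```

Here `table["Hub"]` is `(4, 1, 5)`: most of the hub's degree comes from appearing on other people's result pages, not from its own.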

The different communities you can make out in the graph are less surprising, though. On the right, colored in green, you see many programmers who are involved with scripting languages and the Web as a platform, e.g. Larry Wall, Yukihiro Matsumoto, Guido van Rossum, Rasmus Lerdorf, and Brendan Eich.

The pink community in the lower center around Edsger W. Dijkstra comprises several pioneers of the computer science field, e.g. Donald Knuth, Grace Hopper, and John Backus.

On the top right the blue community around Robert Tappan Morris consists of programmers who were tempted by the dark side once or several times in their lives, a.k.a. computer security experts. Notable hacker examples you find here are Kevin Mitnick, Kevin Poulsen, and of course Bill Gates.

Summary

Google's knowledge graph, which is built on top of Wikipedia/Freebase data among other sources, is a rich source of information worth exploring in more detail. Regarding the programmer network, I expected the programmers that served as seed values to be more prominent, and I expected fewer high-degree nodes belonging to people I had never heard of before. The communities within the graph, on the other hand, mostly make sense to me.

While gathering data for the next topic, I'm curious to learn about your interpretation of this network as well as feedback and suggestions.