One of the methods we employed to study the differences between Marine Le Pen's and Jean-Marie Le Pen's rhetoric was n-gram analysis. Briefly, n-grams are sequences of N words that occur next to one another in a corpus. If N = 2, we call them bigrams; sequences of three words are trigrams, and so on. More information about n-grams is available on the "What are N-Grams?" page from "Text Mining & Analytics 101."
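To make the definition concrete, here is a minimal Python sketch that pulls n-grams out of a list of words. The function name `ngrams` and the sample sentence are ours, chosen only for illustration; they do not come from the project's own code or corpora.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Illustrative sentence only, not drawn from the corpora.
tokens = "the quick brown fox jumps".split()

bigrams = ngrams(tokens, 2)   # pairs of adjacent words
trigrams = ngrams(tokens, 3)  # triples of adjacent words

for gram in trigrams:
    print(" ".join(gram))
```

A five-word sentence yields four bigrams and three trigrams, because each n-gram starts one word later than the previous one.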
For the attached visualization, we divided our corpora by author and by genre, removed high-frequency words of low semantic value (i.e., stop words), and then looked for the most common trigrams. This analysis allowed us to determine whether the rhetoric of the two politicians differed and whether there were differences among genres (radio interviews, television appearances, etc.). After generating lists of trigrams and their frequencies, we converted them into a network graph using Gephi so that we could see how the different trigrams clustered by genre and by author. The network is a bi-modal graph, which means it contains nodes (the circles on the graph) of two types: genre and trigram. Each genre node is labeled by author and genre; for example, "MLP RADIO" indicates radio interviews given by Marine Le Pen. A line connecting two nodes indicates that the trigram appears in the connected genre. Few trigrams appear in more than one genre, but those that do give the network its structure.
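The steps described above (drop stop words, count trigrams per genre, link each genre to its trigrams) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code we actually ran: the stop-word list is a tiny placeholder, the tokens are dummy data, the label "JMLP DISCOURS" is a guess modeled on "MLP RADIO", and the `Source`, `Target`, `Weight` column names follow the edge-list layout that Gephi's spreadsheet import commonly expects.

```python
import csv
import io
from collections import Counter

# Placeholder stop-word list (assumption); the project's real list is not shown.
STOP_WORDS = {"le", "la", "les", "de", "des", "et"}

def top_trigrams(tokens, k=10):
    """Drop stop words, then return the k most frequent trigrams with counts."""
    kept = [t.lower() for t in tokens if t.lower() not in STOP_WORDS]
    trigrams = zip(kept, kept[1:], kept[2:])
    return Counter(" ".join(t) for t in trigrams).most_common(k)

def edge_rows(corpora, k=10):
    """Build (genre node, trigram node, frequency) edges for a bi-modal graph."""
    rows = []
    for label, tokens in corpora.items():
        for trigram, count in top_trigrams(tokens, k):
            rows.append((label, trigram, count))
    return rows

# Dummy tokens keyed by author and genre; real input would be tokenized texts.
corpora = {
    "MLP RADIO": "a b c d a b c".split(),
    "JMLP DISCOURS": "a b c e f g".split(),  # hypothetical label
}

rows = edge_rows(corpora)

# Write the edge list to an in-memory CSV; swap in open(...) to save a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight"])
writer.writerows(rows)
edge_list_csv = buffer.getvalue()
print(edge_list_csv)
```

In this toy data the trigram "a b c" occurs in both genres, so it would appear as a single trigram node linked to two genre nodes. Shared trigrams of this kind are what tie the separate genre clusters into one network. Note that because stop words are removed before the trigrams are formed, a trigram here can span words that were not adjacent in the original text.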
A few things stand out immediately. First, Marine Le Pen's speeches, interviews, and writing cluster together at the top of the graph, which indicates that, regardless of genre, her rhetoric as captured by common three-word phrases is clearly distinguishable from Jean-Marie Le Pen's. Second, we get a sense of the overall composition of our corpora. While we have a good mix of genres from Marine Le Pen, our Jean-Marie Le Pen corpora are dominated by the "discours" (speech) genre, to the detriment of radio and television.
A high-resolution PDF is attached if you would like to explore this network further.