Imagine that we have 2 research papers: one is on mathematics and the other is on astrophysics. Which is better?
It’s subjective, of course. And while arXiv users from different tribes may have already made their minds up on the matter, let’s see if we can make an objective judgment using citation data.
We might try to measure the impact that each paper has had by simply comparing their citation counts. Unfortunately, that’s not terribly fair. Different fields of research have different citation rates. So a large and fast-moving field like astrophysics might have high citation rates and mathematics, which is smaller and progresses more slowly, might have low citation rates.
It might be helpful if we could standardise citation-counts by field so that we can make an apples-with-apples comparison.
I’m going to look at an approach to this based on topic modelling. If you’ve never heard of topic modelling, I’ve added a brief introduction in an appendix.
I’ve started by creating a 100-topic model of arXiv. Once we have such a model, each of our articles will have one ‘primary topic’, which is the topic most closely associated with it. If we assign each article to its primary topic, it’s a lot like grouping our articles into their respective fields.
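As a minimal sketch of the ‘primary topic’ idea, assume each article’s topic mixture is available as a list of (topic_id, weight) pairs. The function name, the article labels and the weights below are all made up for illustration; this is not the code used for the analysis.

```python
def primary_topic(topic_weights):
    """Return the id of the highest-weighted topic in a list of (topic_id, weight) pairs."""
    topic_id, _ = max(topic_weights, key=lambda pair: pair[1])
    return topic_id


# Hypothetical topic mixtures for two articles (the weights are invented).
doc_topics = {
    "maths_paper": [(33, 0.61), (54, 0.27), (67, 0.12)],
    "astro_paper": [(18, 0.48), (62, 0.39), (31, 0.13)],
}

# Each article is assigned to the single topic with the largest weight.
primary = {doc: primary_topic(weights) for doc, weights in doc_topics.items()}
print(primary)
```

Everything except the largest weight is thrown away at this step, which is the loss of information mentioned under ‘Some things to consider’.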
Now we can compare the citation rates of these fields.
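The comparison can be sketched as a group-by: collect citation counts under each primary topic and take the mean. The records below are invented numbers chosen to echo the rough scale discussed in this post, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_citations_by_topic(records):
    """Average citation count per topic, given (primary_topic, citation_count) pairs."""
    grouped = defaultdict(list)
    for topic, count in records:
        grouped[topic].append(count)
    return {topic: mean(counts) for topic, counts in grouped.items()}


# Made-up records: topic 33 stands in for a maths topic, topic 18 for an astrophysics one.
records = [(33, 1), (33, 3), (33, 2), (18, 25), (18, 35)]

topic_means = mean_citations_by_topic(records)
# Topics ordered from highest to lowest mean citations.
ranked = sorted(topic_means, key=topic_means.get, reverse=True)
print(topic_means, ranked)
```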
Interpreting the model
- Look at the right side of the x-axis. We can see that topics 33, 40, 54 have the lowest mean citations. If we scroll down to our topic model in the appendix, we can see that these topics are associated with mathematics.
- At the left side of the x-axis we see topics 42, 18, 62 have high citations. These topics appear to be associated with astrophysics.
- This is in line with our expectations; we already knew that maths had lower citation rates than astrophysics.
- But it illustrates an important point: we can’t compare astrophysics papers with mathematics papers on raw citation counts. An average astrophysics paper in this dataset might have ~30 citations and would therefore be considered a highly-cited outlier if it were compared with a distribution of maths papers where the average paper has only ~2 citations.
Levelling the field
It’s very straightforward to take a model like this and standardise it.
Here, I have used scikit-learn’s RobustScaler, which by default centres values on the median and scales them by the interquartile range. This gives us new citation-count values for each article which are more easily compared. Now an average astrophysics paper is no better cited than an average mathematics paper, but outliers in either field are easy to spot.
Now we can make a much fairer comparison of maths papers and astrophysics papers!
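To make the scaling concrete, here is a pure-Python stand-in for median and interquartile-range scaling applied within each field. It is a sketch under assumptions, not the RobustScaler call used for this post: the function name and citation counts are invented, and it assumes every topic has at least two papers.

```python
from collections import defaultdict
from statistics import median, quantiles


def robust_scale_by_topic(citations, topics):
    """Scale each citation count by the median and IQR of its own topic."""
    by_topic = defaultdict(list)
    for count, topic in zip(citations, topics):
        by_topic[topic].append(count)

    stats = {}
    for topic, counts in by_topic.items():
        # Quartiles with linear interpolation between data points.
        q1, _, q3 = quantiles(counts, n=4, method="inclusive")
        iqr = q3 - q1
        # Fall back to 1.0 so a topic with zero spread does not divide by zero.
        stats[topic] = (median(counts), iqr if iqr > 0 else 1.0)

    return [(c - stats[t][0]) / stats[t][1] for c, t in zip(citations, topics)]


# Made-up counts: five maths papers followed by five astrophysics papers.
citations = [0, 1, 2, 3, 14, 10, 20, 30, 40, 150]
topics = ["maths"] * 5 + ["astro"] * 5

scaled = robust_scale_by_topic(citations, topics)
print(scaled)
```

In this toy data the maths paper with 14 citations and the astrophysics paper with 150 citations end up with the same scaled value, because each sits equally far above its own field’s median when measured in units of that field’s spread.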
Some things to consider
- We might get different results by tweaking the parameters of our topic model.
- It’s also worth bearing in mind that, if documents are indeed made of topics, then by assigning each article to just 1 topic, we are perhaps losing important data about the other topics which make up each article.
- CrossRef citation counts are incomplete. As far as I know, CrossRef do not count citations to preprints. If a field tends to have long peer-review times (like maths!), a larger proportion of its citations will be accrued before journal publication, so its citation counts will be biased low. It would be worth performing this scaling with more complete citation data.
- Recall that, when we looked at rejected article tracking, we saw that our ability to link preprints to their published articles is imperfect.
- There are numerous other ways to standardise distributions.
- There are also other approaches to this standardisation problem that are not based on topic modelling. An obvious recent example would be the Relative Citation Ratio (RCR). Digital Science calculate the RCR and make it available through their Dimensions website & API.
I hear that getting a job in academia is tough. There is a huge amount of competition for jobs and not enough of them to go around. It’s disappointing to learn that researchers applying for jobs are often judged on citation counts, h-indices and even journal impact factors. These are not generally valid measures of achievement or ability and can be taken out of context.
Context matters!
APPENDIX
Topic modelling
If topic modelling is unfamiliar to you, it goes something like this:
- Take some documents, in this case titles and abstracts of arXiv preprints from 2012.
- Try to find groups of words that tend to appear together inside documents.
- For example: we might find that the words ‘black’, ‘hole’, ‘theory’, ‘horizon’, ‘entropy’, and ‘thermodynamics’ all tend to appear together in papers about black hole theory.
- We can then infer that a paper is about black hole theory even if it just has the words ‘horizon’, ‘entropy’, and ‘thermodynamics’ in it.
These groups of words are called ‘topics’. The underlying assumption of a topic model is that topics are made of words and documents are made of topics.
So, our topic model will be built using arXiv’s 2012 content. The full model is shown below, but here’s a snippet:
(Topic 41, ‘0.033*”mass” + 0.028*”model” + 0.022*”higg” + 0.014*”coupl” + 0.013*”standard_model” + 0.012*”decai” + 0.012*”new” + 0.011*”gev” + 0.011*”higg_boson” + 0.011*”ar”’),
(Topic 42, ‘0.043*”halo” + 0.028*”profil” + 0.021*”satellit” + 0.018*”simul” + 0.016*”stream” + 0.015*”tidal” + 0.012*”escap” + 0.012*”galact” + 0.012*”milki_wai” + 0.011*”veloc”’),
(Topic 43, ‘0.084*”black_hole” + 0.053*”binari” + 0.030*”hole” + 0.027*”mass” + 0.024*”neutron_star” + 0.023*”omega” + 0.017*”extrem” + 0.017*”bubbl” + 0.015*”horizon” + 0.014*”collaps”’),
If you’re familiar with physics, it should be clear that Topic 41 comes from particle physics, Topic 42 is something to do with galaxy formation and Topic 43 looks like numerical relativity.
Full topic model
This model was created using Gensim’s LDA module.
Some words will appear shortened. This is due to ‘stemming’, where we reduce the number of unique words in our set of documents by converting ‘force’, ‘forcing’ and ‘forced’ all into the one ‘stem’ word ‘forc’. This should mean that there are fewer variants of the same word in our dataset, but the downside is that sometimes those different versions of a word do matter (consider the different context of those 3 words in physics).
You will also see some ‘ngrams’, i.e. sequences of words which we treat as one word. The canonical example is ‘New York’, which is 2 words that we treat as one. LDA ignores word order, so without this step the connection between the words in such a phrase would be lost. Below, you can see some examples where ngrams are detected and joined with an underscore, like ‘neutrino_mass’ or ‘liquid_crystal’.
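The joining step can be sketched as follows. This assumes the set of phrases is already known and hard-codes it; detecting which word pairs count as phrases is a separate statistical step that this sketch does not attempt. The function name and the example tokens are hypothetical.

```python
def join_bigrams(tokens, bigrams):
    """Merge adjacent token pairs listed in `bigrams` into one underscore-joined token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # skip both halves of the phrase
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hard-coded phrase list (an assumption for illustration only).
known_bigrams = {("neutrino", "mass"), ("liquid", "crystal")}
tokens = ["the", "neutrino", "mass", "hierarchi", "in", "a", "liquid", "crystal"]

joined = join_bigrams(tokens, known_bigrams)
print(joined)
```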
[(0, ‘0.268*”function” + 0.041*”weight” + 0.026*”seri” + 0.019*”delta” + 0.018*”branch” + 0.018*”valu” + 0.017*”exponenti” + 0.017*”sum” + 0.013*”ar” + 0.013*”zero”’),
(1, ‘0.056*”correct” + 0.040*”factor” + 0.032*”calcul” + 0.030*”contribut” + 0.020*”effect” + 0.016*”ar” + 0.014*”result” + 0.012*”order” + 0.011*”product” + 0.010*”lead”’),
(2, ‘0.092*”mix” + 0.072*”neutrino” + 0.029*”hierarchi” + 0.018*”neutrino_mass” + 0.018*”heavi” + 0.013*”ar” + 0.012*”dirac” + 0.011*”mass” + 0.010*”nu” + 0.009*”flavor”’),
(3, ‘0.028*”manifold” + 0.028*”surfac” + 0.021*”singular” + 0.017*”point” + 0.015*”geometri” + 0.015*”geometr” + 0.015*”complex” + 0.013*”ar” + 0.012*”thi” + 0.011*”smooth”’),
(4, ‘0.070*”site” + 0.052*”structur” + 0.048*”distort” + 0.046*”protein” + 0.021*”bind” + 0.020*”hexagon” + 0.012*”mn” + 0.012*”earthquak” + 0.011*”residu” + 0.010*”avalanch”’),
(5, ‘0.185*”index” + 0.036*”squeez” + 0.030*”nemat” + 0.029*”len” + 0.025*”braid” + 0.022*”red” + 0.019*”groupoid” + 0.018*”blue” + 0.017*”liquid_crystal” + 0.017*”transact”’),
(6, ‘0.303*”metric” + 0.029*”cp” + 0.024*”ch” + 0.018*”squar” + 0.016*”lorentzian” + 0.014*”r_d” + 0.013*”distanc” + 0.012*”factori” + 0.012*”magic” + 0.012*”curvaton”’),
(7, ‘0.092*”forward” + 0.063*”front” + 0.046*”backward” + 0.036*”mm” + 0.032*”ep” + 0.019*”ne” + 0.017*”ilc” + 0.016*”12c” + 0.013*”cdot” + 0.012*”ture”’),
(8, ‘0.133*”imag” + 0.050*”object” + 0.038*”reconstruct” + 0.024*”bodi” + 0.023*”us” + 0.019*”shape” + 0.015*”segment” + 0.012*”techniqu” + 0.009*”filter” + 0.009*”wavelet”’),
(9, ‘0.121*”decomposit” + 0.114*”tensor” + 0.047*”fusion” + 0.046*”discontinu” + 0.032*”mesh” + 0.029*”finit_element” + 0.024*”multiscal” + 0.020*”su” + 0.017*”decompos” + 0.015*”cube”’),
(10, ‘0.093*”popul” + 0.057*”speci” + 0.054*”spread” + 0.033*”model” + 0.026*”persist” + 0.016*”infect” + 0.016*”diseas” + 0.013*”epidem” + 0.013*”competit” + 0.012*”twin”’),
(11, ‘0.019*”chiral” + 0.014*”fermion” + 0.014*”ar” + 0.014*”model” + 0.013*”mass” + 0.013*”symmetri” + 0.013*”quark” + 0.011*”effect” + 0.010*”break” + 0.010*”qcd”’),
(12, ‘0.068*”particl” + 0.045*”wave” + 0.021*”ar” + 0.019*”instabl” + 0.019*”plasma” + 0.018*”propag” + 0.017*”veloc” + 0.015*”densiti” + 0.014*”potenti” + 0.012*”effect”’),
(13, ‘0.036*”network” + 0.015*”us” + 0.014*”data” + 0.013*”ar” + 0.010*”thi” + 0.009*”system” + 0.009*”research” + 0.009*”inform” + 0.008*”applic” + 0.008*”thi_paper”’),
(14, ‘0.074*”sequenc” + 0.057*”select” + 0.026*”gene” + 0.019*”genet” + 0.016*”genom” + 0.013*”plate” + 0.013*”mutat” + 0.012*”divers” + 0.012*”void” + 0.010*”popul”’),
(15, ‘0.046*”pt” + 0.036*”hermitian” + 0.029*”plot” + 0.025*”fault” + 0.024*”antenna” + 0.021*”harvest” + 0.017*”rb” + 0.016*”st” + 0.016*”ptsymmetr” + 0.015*”nonhermitian”’),
(16, ‘0.157*”curv” + 0.073*”compress” + 0.050*”elast” + 0.038*”stress” + 0.037*”strain” + 0.032*”shear” + 0.024*”modulu” + 0.020*”load” + 0.017*”moduli” + 0.017*”kappa”’),
(17, ‘0.057*”fluid” + 0.042*”turbul” + 0.037*”vortex” + 0.031*”vortic” + 0.026*”flow” + 0.025*”dissip” + 0.017*”viscos” + 0.016*”granular” + 0.016*”cascad” + 0.015*”veloc”’),
(18, ‘0.057*”galaxi” + 0.047*”cluster” + 0.017*”mass” + 0.014*”ar” + 0.012*”sampl” + 0.012*”star_format” + 0.011*”redshift” + 0.009*”us” + 0.008*”we_find” + 0.008*”thi”’),
(19, ‘0.178*”stabil” + 0.115*”stabl” + 0.030*”tail” + 0.027*”unstabl” + 0.025*”secondord” + 0.020*”second_order” + 0.014*”perturb” + 0.013*”opinion” + 0.013*”critic” + 0.012*”consensu”’),
(20, ‘0.059*”period” + 0.047*”frequenc” + 0.033*”observ” + 0.031*”variabl” + 0.025*”time” + 0.017*”dure” + 0.017*”radio” + 0.016*”arrai” + 0.016*”flare” + 0.016*”chang”’),
(21, ‘0.066*”field” + 0.026*”spacetim” + 0.024*”graviti” + 0.016*”thi” + 0.015*”gravit” + 0.014*”theori” + 0.014*”relativist” + 0.014*”vacuum” + 0.013*”effect” + 0.013*”ar”’),
(22, ‘0.035*”channel” + 0.017*”node” + 0.016*”rate” + 0.015*”network” + 0.013*”scheme” + 0.012*”ar” + 0.012*”receiv” + 0.012*”perform” + 0.012*”transmiss” + 0.011*”propos”’),
(23, ‘0.069*”scale” + 0.055*”chain” + 0.053*”model” + 0.044*”loop” + 0.031*”expon” + 0.024*”defect” + 0.017*”conform” + 0.015*”polym” + 0.015*”ar” + 0.014*”critic”’),
(24, ‘0.123*”temperatur” + 0.057*”thermal” + 0.034*”heat” + 0.022*”increas” + 0.020*”cool” + 0.016*”decreas” + 0.011*”effect” + 0.010*”ic” + 0.010*”crystal” + 0.009*”at_low”’),
(25, ‘0.078*”method” + 0.069*”problem” + 0.030*”approxim” + 0.017*”thi” + 0.017*”solv” + 0.016*”approach” + 0.015*”us” + 0.014*”numer” + 0.013*”ar” + 0.012*”set”’),
(26, ‘0.034*”rotat” + 0.031*”magnet_field” + 0.018*”solar” + 0.016*”magnet” + 0.016*”ar” + 0.015*”observ” + 0.009*”model” + 0.009*”shock” + 0.009*”field” + 0.008*”structur”’),
(27, ‘0.070*”graph” + 0.029*”set” + 0.024*”number” + 0.020*”edg” + 0.018*”tree” + 0.016*”vertic” + 0.013*”thi” + 0.013*”degre” + 0.013*”ar” + 0.012*”color”’),
(28, ‘0.099*”bar” + 0.018*”qso” + 0.017*”median” + 0.016*”21cm” + 0.014*”loos” + 0.013*”euler_characterist” + 0.012*”smg” + 0.012*”p_q” + 0.011*”isotropi” + 0.010*”ast”’),
(29, ‘0.076*”coher” + 0.062*”puls” + 0.052*”transfer” + 0.043*”molecul” + 0.035*”excit” + 0.029*”synchron” + 0.019*”filter” + 0.011*”molecular” + 0.011*”vibrat” + 0.011*”polariton”’),
(30, ‘0.081*”water” + 0.042*”24n” + 0.041*”depth” + 0.039*”c24" + 0.033*”cs” + 0.027*”27d24" + 0.024*”textit” + 0.017*”245ceta24" + 0.016*”polariz” + 0.015*”pd”’),
(31, ‘0.019*”survei” + 0.013*”us” + 0.013*”ar” + 0.013*”data” + 0.011*”observ” + 0.011*”dust” + 0.011*”telescop” + 0.010*”sourc” + 0.009*”distanc” + 0.008*”present”’),
(32, ‘0.053*”pattern” + 0.031*”neuron” + 0.029*”wall” + 0.028*”big” + 0.025*”spike” + 0.025*”domain_wall” + 0.024*”input” + 0.024*”s24" + 0.021*”neural_network” + 0.020*”train”’),
(33, ‘0.124*”conjectur” + 0.044*”twist” + 0.030*”number” + 0.027*”modular” + 0.022*”prove” + 0.020*”odd” + 0.017*”24e” + 0.014*”arithmet” + 0.013*”toric” + 0.011*”refin”’),
(34, ‘0.412*”state” + 0.046*”fraction” + 0.019*”topolog” + 0.017*”edg” + 0.014*”ar” + 0.013*”bound” + 0.011*”degeneraci” + 0.011*”quantum_hall” + 0.007*”majorana” + 0.007*”box”’),
(35, ‘0.020*”layer” + 0.017*”surfac” + 0.016*”graphen” + 0.015*”superconduct” + 0.012*”structur” + 0.010*”film” + 0.010*”metal” + 0.009*”materi” + 0.009*”ar” + 0.009*”electron”’),
(36, ‘0.027*”game” + 0.021*”strategi” + 0.018*”thi” + 0.014*”ar” + 0.014*”agent” + 0.012*”question” + 0.012*”individu” + 0.009*”mechan” + 0.008*”inform” + 0.008*”player”’),
(37, ‘0.110*”equat” + 0.103*”solut” + 0.026*”nonlinear” + 0.016*”ar” + 0.014*”global” + 0.013*”problem” + 0.013*”exist” + 0.010*”obtain” + 0.010*”gener” + 0.010*”case”’),
(38, ‘0.126*”distribut” + 0.046*”statist” + 0.043*”invari” + 0.033*”relat” + 0.032*”law” + 0.022*”ensembl” + 0.020*”thermodynam” + 0.020*”gener” + 0.014*”deriv” + 0.011*”thi”’),
(39, ‘0.163*”expans” + 0.047*”inclus” + 0.028*”cumul” + 0.026*”calcul” + 0.026*”green_function” + 0.014*”flag” + 0.014*”approxim” + 0.011*”asymptot_expans” + 0.011*”carlo” + 0.011*”hydrodynam”’),
(40, ‘0.133*”group” + 0.035*”represent” + 0.026*”finit” + 0.015*”subgroup” + 0.015*”gener” + 0.013*”24g24" + 0.012*”noncommut” + 0.012*”free” + 0.012*”action” + 0.012*”ar”’),
(41, ‘0.033*”mass” + 0.028*”model” + 0.022*”higg” + 0.014*”coupl” + 0.013*”standard_model” + 0.012*”decai” + 0.012*”new” + 0.011*”gev” + 0.011*”higg_boson” + 0.011*”ar”’),
(42, ‘0.043*”halo” + 0.028*”profil” + 0.021*”satellit” + 0.018*”simul” + 0.016*”stream” + 0.015*”tidal” + 0.012*”escap” + 0.012*”galact” + 0.012*”milki_wai” + 0.011*”veloc”’),
(43, ‘0.084*”black_hole” + 0.053*”binari” + 0.030*”hole” + 0.027*”mass” + 0.024*”neutron_star” + 0.023*”omega” + 0.017*”extrem” + 0.017*”bubbl” + 0.015*”horizon” + 0.014*”collaps”’),
(44, ‘0.145*”alpha” + 0.103*”beta” + 0.096*”inhomogen” + 0.042*”fractal” + 0.032*”selfsimilar” + 0.027*”245cmu24" + 0.011*”digraph” + 0.011*”quantum_electrodynam” + 0.011*”wedg” + 0.011*”4manifold”’),
(45, ‘0.035*”decai” + 0.027*”measur” + 0.023*”product” + 0.023*”ar” + 0.016*”data” + 0.013*”event” + 0.012*”us” + 0.011*”result” + 0.010*”gamma” + 0.010*”search”’),
(46, ‘0.169*”beam” + 0.055*”momentum” + 0.032*”colour” + 0.024*”green” + 0.021*”reflect” + 0.019*”seed” + 0.017*”sea” + 0.012*”inject” + 0.012*”incid” + 0.011*”orbit_angular”’),
(47, ‘0.106*”gap” + 0.096*”frame” + 0.072*”perfect” + 0.055*”diagon” + 0.046*”mirror” + 0.035*”skew” + 0.021*”rectangular” + 0.019*”impuls” + 0.016*”dft” + 0.012*”swarm”’),
(48, ‘0.159*”polar” + 0.087*”photon” + 0.050*”transvers” + 0.040*”asymmetri” + 0.026*”longitudin” + 0.021*”virtual” + 0.020*”diffract” + 0.020*”helic” + 0.012*”clock” + 0.010*”twophoton”’),
(49, ‘0.090*”torsion” + 0.035*”fano” + 0.033*”glass” + 0.029*”firstli” + 0.025*”secondli” + 0.024*”calabiyau” + 0.023*”ln” + 0.021*”slit” + 0.019*”gl” + 0.017*”threefold”’),
(50, ‘0.030*”physic” + 0.024*”theori” + 0.020*”discuss” + 0.020*”thi” + 0.018*”review” + 0.016*”ar” + 0.013*”model” + 0.012*”string” + 0.010*”understand” + 0.009*”new”’),
(51, ‘0.078*”univers” + 0.071*”model” + 0.044*”cosmolog” + 0.024*”paramet” + 0.018*”matter” + 0.016*”dark_energi” + 0.016*”observ” + 0.015*”evolut” + 0.011*”thi” + 0.009*”shell”’),
(52, ‘0.025*”matrix” + 0.022*”number” + 0.021*”24n24" + 0.020*”matric” + 0.015*”let” + 0.014*”ar” + 0.011*”sequenc” + 0.011*”24p24" + 0.011*”show_that” + 0.011*”24k24"’),
(53, ‘0.035*”entropi” + 0.033*”hamiltonian” + 0.019*”quantiz” + 0.016*”ar” + 0.016*”gener” + 0.015*”formul” + 0.015*”thi” + 0.013*”term” + 0.013*”transform” + 0.013*”theori”’),
(54, ‘0.062*”algebra” + 0.017*”gener” + 0.017*”ring” + 0.016*”categori” + 0.015*”modul” + 0.014*”thi” + 0.013*”ar” + 0.011*”construct” + 0.011*”ideal” + 0.011*”varieti”’),
(55, ‘0.070*”process” + 0.033*”time” + 0.029*”stochast” + 0.029*”diffus” + 0.019*”discret” + 0.019*”converg” + 0.017*”moment” + 0.013*”distribut” + 0.013*”ar” + 0.012*”limit”’),
(56, ‘0.080*”spin” + 0.071*”magnet” + 0.018*”magnet_field” + 0.016*”order” + 0.014*”ferromagnet” + 0.013*”field” + 0.011*”effect” + 0.010*”anisotropi” + 0.010*”antiferromagnet” + 0.008*”superconduct”’),
(57, ‘0.052*”activ” + 0.050*”respons” + 0.033*”concentr” + 0.028*”model” + 0.021*”reaction” + 0.018*”mechan” + 0.018*”molecular” + 0.018*”chemic” + 0.017*”biolog” + 0.014*”membran”’),
(58, ‘0.069*”algorithm” + 0.045*”comput” + 0.028*”code” + 0.024*”us” + 0.023*”effici” + 0.017*”implement” + 0.014*”perform” + 0.012*”simul” + 0.012*”present” + 0.012*”error”’),
(59, ‘0.061*”model” + 0.036*”estim” + 0.034*”data” + 0.022*”us” + 0.016*”sampl” + 0.016*”ar” + 0.015*”paramet” + 0.014*”method” + 0.011*”analysi” + 0.010*”propos”’),
(60, ‘0.028*”emiss” + 0.025*”xrai” + 0.021*”observ” + 0.017*”line” + 0.014*”ar” + 0.014*”sourc” + 0.013*”agn” + 0.011*”region” + 0.009*”detect” + 0.008*”thi”’),
(61, ‘0.243*”space” + 0.063*”topolog” + 0.043*”lattic” + 0.024*”compact” + 0.018*”properti” + 0.018*”set” + 0.016*”24x24" + 0.015*”continu” + 0.014*”subspac” + 0.014*”measur”’),
(62, ‘0.052*”star” + 0.015*”ar” + 0.013*”planet” + 0.012*”stellar” + 0.011*”mass” + 0.011*”observ” + 0.009*”thi” + 0.009*”orbit” + 0.008*”abund” + 0.007*”model”’),
(63, ‘0.083*”optim” + 0.042*”problem” + 0.020*”algorithm” + 0.018*”cost” + 0.015*”control” + 0.013*”minim” + 0.012*”decis” + 0.011*”polici” + 0.011*”constraint” + 0.010*”maxim”’),
(64, ‘0.059*”24q24" + 0.048*”exciton” + 0.036*”circular” + 0.026*”insert” + 0.026*”merg” + 0.026*”fibr” + 0.022*”exit” + 0.020*”rod” + 0.019*”binomi” + 0.019*”kolmogorov”’),
(65, ‘0.061*”interpol” + 0.038*”cme” + 0.037*”nova” + 0.030*”eject” + 0.027*”mc” + 0.025*”bias” + 0.020*”larg_deviat” + 0.016*”erupt” + 0.016*”hadamard” + 0.014*”hash”’),
(66, ‘0.153*”cell” + 0.043*”visual” + 0.037*”recurr” + 0.026*”cellular” + 0.023*”self” + 0.018*”pl” + 0.015*”tower” + 0.014*”discrimin” + 0.014*”tissu” + 0.014*”nodal”’),
(67, ‘0.028*”theorem” + 0.020*”proof” + 0.018*”thi” + 0.017*”us” + 0.017*”theori” + 0.016*”gener” + 0.012*”formal” + 0.012*”logic” + 0.011*”notion” + 0.010*”present”’),
(68, ‘0.060*”ball” + 0.040*”prime” + 0.029*”van_der” + 0.025*”tip” + 0.022*”crack” + 0.022*”waal” + 0.022*”retard” + 0.021*”middl” + 0.020*”outlier” + 0.018*”stein”’),
(69, ‘0.064*”energi” + 0.024*”ar” + 0.024*”scatter” + 0.021*”nuclear” + 0.019*”calcul” + 0.016*”interact” + 0.014*”reson” + 0.014*”nuclei” + 0.013*”us” + 0.012*”neutron”’),
(70, ‘0.206*”famili” + 0.072*”causal” + 0.063*”ergod” + 0.041*”tilt” + 0.040*”member” + 0.022*”ab” + 0.017*”bernoulli” + 0.016*”ribbon” + 0.015*”cauchi” + 0.015*”orthogon_polynomi”’),
(71, ‘0.038*”mode” + 0.033*”optic” + 0.028*”atom” + 0.027*”reson” + 0.017*”light” + 0.015*”frequenc” + 0.015*”caviti” + 0.013*”trap” + 0.012*”laser” + 0.012*”oscil”’),
(72, ‘0.164*”system” + 0.104*”dynam” + 0.024*”control” + 0.015*”model” + 0.014*”time” + 0.012*”ar” + 0.011*”coupl” + 0.010*”studi” + 0.008*”behavior” + 0.007*”two”’),
(73, ‘0.027*”forc” + 0.021*”simul” + 0.015*”ar” + 0.014*”model” + 0.013*”liquid” + 0.012*”surfac” + 0.011*”structur” + 0.010*”interfac” + 0.010*”size” + 0.009*”solid”’),
(74, ‘0.088*”phi” + 0.047*”dna” + 0.045*”textur” + 0.035*”top” + 0.032*”evapor” + 0.032*”sort” + 0.025*”ladder” + 0.023*”partner” + 0.014*”precipit” + 0.011*”prolong”’),
(75, ‘0.073*”line” + 0.069*”doubl” + 0.067*”configur” + 0.042*”angl” + 0.040*”theta” + 0.030*”sign” + 0.028*”stack” + 0.025*”bundl” + 0.024*”moduli_space” + 0.021*”face”’),
(76, ‘0.045*”measur” + 0.026*”signal” + 0.019*”detect” + 0.019*”detector” + 0.019*”us” + 0.016*”sensit” + 0.016*”design” + 0.015*”experi” + 0.014*”nois” + 0.012*”perform”’),
(77, ‘0.038*”ion” + 0.030*”structur” + 0.028*”pressur” + 0.021*”24_224" + 0.018*”calcul” + 0.016*”electron” + 0.016*”ar” + 0.014*”compound” + 0.014*”245calpha24" + 0.013*”bond”’),
(78, ‘0.143*”quantum” + 0.031*”entangl” + 0.023*”state” + 0.022*”measur” + 0.018*”classic” + 0.015*”qubit” + 0.012*”thi” + 0.011*”ar” + 0.010*”us” + 0.010*”system”’),
(79, ‘0.138*”variat” + 0.043*”price” + 0.037*”market” + 0.030*”risk” + 0.029*”model” + 0.024*”return” + 0.020*”option” + 0.018*”barrier” + 0.016*”stop” + 0.016*”volatil”’),
(80, ‘0.214*”threshold” + 0.059*”percol” + 0.042*”satur” + 0.040*”ir” + 0.026*”polaris” + 0.021*”minima” + 0.019*”multifract” + 0.019*”maxima” + 0.018*”mf” + 0.016*”lda”’),
(81, ‘0.065*”conserv” + 0.063*”cone” + 0.042*”principl” + 0.031*”rough” + 0.024*”conserv_law” + 0.024*”implicit” + 0.022*”cylindr” + 0.020*”rai” + 0.018*”maximum” + 0.012*”cap”’),
(82, ‘0.122*”test” + 0.026*”text” + 0.019*”24b” + 0.018*”sensor” + 0.017*”us” + 0.016*”wa” + 0.015*”student” + 0.013*”document” + 0.012*”present” + 0.011*”facil”’),
(83, ‘0.077*”dark_matter” + 0.025*”constraint” + 0.020*”annihil” + 0.020*”dark” + 0.017*”direct” + 0.017*”dm” + 0.016*”cmb” + 0.015*”background” + 0.015*”detect” + 0.015*”power_spectrum”’),
(84, ‘0.064*”24d” + 0.039*”fm” + 0.037*”landscap” + 0.028*”disloc” + 0.028*”selforgan” + 0.020*”soc” + 0.017*”artifact” + 0.016*”bilay_graphen” + 0.016*”forbidden” + 0.012*”motif”’),
(85, ‘0.042*”ga” + 0.040*”disk” + 0.017*”disc” + 0.016*”format” + 0.016*”core” + 0.015*”model” + 0.013*”accret” + 0.012*”inner” + 0.012*”mass” + 0.010*”radial”’),
(86, ‘0.107*”3d” + 0.091*”2d” + 0.075*”revers” + 0.071*”potenti” + 0.053*”adiabat” + 0.023*”1d” + 0.019*”casimir” + 0.015*”dirac_equat” + 0.013*”cellular_automata” + 0.012*”flip”’),
(87, ‘0.060*”theori” + 0.033*”action” + 0.023*”limit” + 0.016*”amplitud” + 0.014*”thi” + 0.013*”us” + 0.012*”comput” + 0.011*”gaug_theori” + 0.010*”partit_function” + 0.010*”correl_function”’),
(88, ‘0.044*”electron” + 0.023*”current” + 0.022*”conduct” + 0.020*”transport” + 0.020*”charg” + 0.017*”effect” + 0.015*”devic” + 0.011*”tunnel” + 0.010*”electr” + 0.010*”quantum_dot”’),
(89, ‘0.104*”pulsar” + 0.043*”24r” + 0.039*”video” + 0.038*”recombin” + 0.025*”psr” + 0.021*”shadow” + 0.020*”l_evi” + 0.018*”aa” + 0.016*”spot” + 0.016*”seismic”’),
(90, ‘0.041*”secur” + 0.037*”protocol” + 0.034*”kei” + 0.026*”phy” + 0.025*”et_al” + 0.021*”cloud” + 0.020*”attack” + 0.019*”rev” + 0.014*”comment” + 0.013*”commun”’),
(91, ‘0.080*”lambda” + 0.047*”content” + 0.045*”tau” + 0.039*”mu” + 0.027*”social” + 0.025*”social_network” + 0.025*”user” + 0.022*”influenc” + 0.020*”topic” + 0.018*”onlin”’),
(92, ‘0.157*”random” + 0.070*”formula” + 0.034*”diagram” + 0.026*”probabl” + 0.022*”gaussian” + 0.020*”asymptot” + 0.018*”symbol” + 0.012*”24l24" + 0.012*”measur” + 0.011*”gener”’),
(93, ‘0.116*”cover” + 0.073*”interv” + 0.065*”length” + 0.053*”word” + 0.049*”translat” + 0.048*”languag” + 0.023*”polygon” + 0.018*”automata” + 0.015*”filtrat” + 0.011*”letter”’),
(94, ‘0.047*”oper” + 0.046*”bound” + 0.023*”domain” + 0.020*”inequ” + 0.018*”result” + 0.017*”condit” + 0.016*”spectral” + 0.016*”ar” + 0.015*”estim” + 0.014*”prove”’),
(95, ‘0.143*”correl” + 0.116*”flow” + 0.052*”fluctuat” + 0.052*”jet” + 0.025*”collis” + 0.014*”epsilon” + 0.012*”correl_between” + 0.011*”initi” + 0.010*”depend” + 0.009*”24v”’),
(96, ‘0.054*”phase” + 0.020*”transit” + 0.018*”model” + 0.017*”interact” + 0.016*”phase_transit” + 0.012*”disord” + 0.012*”system” + 0.012*”studi” + 0.010*”order” + 0.010*”critic”’),
(97, ‘0.164*”rule” + 0.056*”truncat” + 0.051*”rho” + 0.024*”xi” + 0.022*”ontolog” + 0.021*”team” + 0.016*”revis” + 0.016*”elect” + 0.015*”mismatch” + 0.014*”sigma”’),
(98, ‘0.035*”sourc” + 0.025*”spectrum” + 0.022*”energi” + 0.019*”observ” + 0.019*”flux” + 0.018*”gammarai” + 0.013*”peak” + 0.013*”emiss” + 0.011*”grb” + 0.011*”cosmic_rai”’),
(99, ‘0.117*”map” + 0.106*”integr” + 0.087*”polynomi” + 0.037*”deform” + 0.023*”transform” + 0.022*”ar” + 0.021*”gener” + 0.012*”ident” + 0.012*”dual” + 0.009*”cyclic”’)]