
The NIMH Director’s Innovation Speaker Series: Developmental Cognitive Neuroscience in the Era of Big Data

Transcript

JOSHUA GORDON: Good afternoon and welcome to the National Institute of Mental Health Director’s Innovation Speaker Series. I am Joshua Gordon, director of the NIMH. It is really my pleasure to welcome you here and to welcome our speaker for the day, Damien Fair, whom I will introduce in just a moment. First, some housekeeping.

(Housekeeping announcements)

JOSHUA GORDON: It is my pleasure to introduce Dr. Fair. Damien Fair is a behavioral neuroscientist and really a pioneer in the emerging field of using big data to answer important questions for psychiatry and for neuroscience. He is a professor at the University of Minnesota and director of the Masonic Institute for the Developing Brain. He actually hails from Minnesota, and then went to South Dakota for college before becoming a physician assistant at Yale University and working in their Neurology Department.

In 2003 he decided he wanted to create the science around the issues that he was studying, and he went to Washington University to get a Ph.D. in neuroscience. There he became interested in ways to apply functional magnetic resonance imaging at the individual level, and that has really been the theme of his career ever since, as I am sure he will tell you more about today.

As a graduate student he developed and pioneered methods to mine fMRI scans for resting state information, and his was really among the earliest efforts to establish the use of that technique for measuring and quantifying connections between brain regions. After completing his Ph.D. requirements, he went to Oregon Health & Science University where, in order to expand his expertise in behavior, he worked with clinical psychologist Joel Nigg and psychologist Bonnie Nagel to understand adolescent brain development and behavior, before joining the faculty there in 2014. Then of course, moving on to Minnesota just this past year.

I am really excited because I have had many discussions with Damien about big data, about imaging, about how to merge those two fields, and I am really pleased to be able to welcome him here today. I will point out that he is one of the important investigators involved in our ABCD and HBCD initiatives, and I am sure he is going to be talking somewhat about that today as well. Damien, thanks.

DAMIEN FAIR: Thank you, and thanks for having me. Thanks for the introduction. I am happy to be here. I will be talking today a little bit about some new work on developmental cognitive neuroscience in the era of big data. I always like to start off with this slide highlighting the many different types of collaborators that go into the work that we do in my lab. Some of them are really close members of my lab, and it just highlights all of the different types of expertise that are required to conduct this work.

There are a few goals today. One of them is to give a brief history of cognitive neuroscience and the state of the field. I will describe the collaboration using this massive national resource to highlight replicability problems in our field and what we are going to do about it, leveraging progress in prior fields, and highlighting an exemplar that tackles heterogeneity problems using brain data. And then at the end, if there is time, we will go through quickly some of what I would call a SWOT analysis. It is usually used in the business of strategic planning, but there are some strengths, weaknesses, opportunities, and threats of our field that I think are important for us to think about – some food for thought at the end.

It might be hard to believe, but the term cognitive neuroscience, born out of a late-night city taxi ride, is now 50 years old. The term was meant to describe the effort to understand how specific characteristics of the physical brain support various aspects of the mind. It resulted in a scientific discipline that merged basic neuroscience with psychology. Over the last five decades the field has changed quite a bit, in ways that its founders may never have imagined.

So the merging of these previously parallel fields of psychology and neuroscience really benefitted from the emergence of both PET imaging and fMRI. These technologies, which capitalize on the coupling of blood flow and metabolism in the brain to identify brain activity, really made the growth of this field possible. One of the limitations of this work may seem quite mundane today: what was required to generate consistent signals, given the inter-individual heterogeneity in structural and functional anatomy and the low signal-to-noise of the actual PET images.

And it made it very difficult to draw general conclusions about brain function. Out of Marc Raichle’s lab, psychologist Eric Reiman proposed something very simple: that we average the PET data across people using standardized atlases (audio issue), with the intent to increase the SNR in this way. The result of this approach, as Mike Gazzaniga says, was unambiguous. The landmark paper that followed by Fox presented the first integrated approach to the design, execution, and interpretation of functional brain imaging, one that really carries on to this day.

This is actually a slide from when I started graduate school. It is one of my first experiments, where I was trying to examine the intricate dissection of psychological processes in patients with stroke using fMRI. Until that time, the field had not caught up with some of the early groundbreaking work of Bharat Biswal, who showed that simple task activity – activations in the brain while people are doing actual tasks – shows functional anatomy similar to the correlated spontaneous activity of folks who are at rest, not doing anything at all.

These results from Bharat lay quite dormant for many years, but that changed drastically after this 2005 paper by my colleague and classmate down the hall, Mike Fox, also from Marc Raichle’s lab, which showed that specific systems in the brain in the default network are negatively correlated with other systems of the brain that are important for higher cognition, like the cingulo-opercular and frontoparietal networks. It was not necessarily the first to report all of the findings in this paper, but certainly I think it was the first to capture the collective attention of the community.

Many fundamental properties of brain organization, topology, and topography followed from this work. But it also shifted some of the questions that we were asking, because of the ease with which we could acquire data when people are just sitting at rest, not doing anything at all.

So instead of collecting repeated measures across time in a task experiment, we are now taking behavioral measures outside the scanner and relating them to trait characteristics across the sample from data collected in the scanner.

Since then the race has been on to try to understand how various trait characteristics, like functional connectivity, map onto these complex behaviors and psychopathologies. A potential issue is that study designs, after this shift in what we were measuring, didn’t actually change. The same sample sizes and the same study designs that we were using for task fMRI were immediately transferred to studies of functional connectivity.

Along the way, as resting state functional connectivity and network neuroscience expanded our understanding of brain organization and development, the data sets were increasing both in sample size and in the amount of data collected per subject. This was likely kicked off by the Human Connectome Project, but there were also the 1000 Functional Connectomes project, the NKI-Rockland Sample, the Adolescent Brain Cognitive Development study, the UK Biobank with tens of thousands of participants, and ENIGMA. As many of these studies popped up, the sample sizes became bigger and bigger.

But there was also another direction in which more data was being collected within subjects, which was really kicked off by the MyConnectome project with Russ Poldrack, followed by the Midnight Scan Club data, but others as well, where instead of collecting a little data across many people, they had lots and lots of data within a given person.

Indeed, many, many datasets have come along over the last decade, but you tend to have either lots of subjects or lots of data within subjects, even though the ideal may be somewhere out here, where we have both.

The other thing that has happened around the same time, probably in the last five, six, or seven years, is that we started to see signs of failures in reproducibility. They have become a little bit more apparent. Several papers have come out in the last decade showing that a lot of our strong findings may not be as reproducible as we had thought they might be.

So the conclusion here is that the field continues to evolve, as data are collected in broader populations at faster rates than ever before. The network neuroscience view of the brain – consisting of multiple interconnected systems that support complex behavior – has also advanced the field in new directions. However, the arrival of these large-N data sets and repeated-sampling data sets may be highlighting that the cognitive neuroscience framework carried forward from the origins of our field to modern investigations is limiting our progress. That brings me to point two.

As the size and scale of the data have grown, so has the necessity to integrate efforts and embrace collaboration and sharing in each other’s successes. I mean, we just can’t do the work all alone like we used to. I think the studies now popping out of the ABCD study – a collection of about 12,000 nine- and ten-year-olds, with data being collected on them yearly for 10 years – have really highlighted this point clearly.

At the end of the 12-year effort, there will be 25 million non-biological measurements collected, and five MRI scans per participant, totaling over 700,000 individual scans. The ABCD data are of course shared, with PHI security, through the NDA, which releases raw data in real time in the form of fast-track data releases. The DAIRC provides yearly updated, condensed, tabulated data releases, as well as newly processed data sets that have had basic processing applied. There is a data exploration and analysis portal, or DEAP, that allows for relatively straightforward statistics on the data.

But for community-derived measures, the only mechanism to share subject-specific data is through the NDA. So Eric Feczko, a newly minted assistant professor at the University of Minnesota, has been working on building this ABCD community collection (ABCC), where there are lots of different types of derived data, processing and analytics tools, data from common pipelines like fMRIPrep, QSIPrep, and ABCD-BIDS, and replication/validation tools like the matched samples that we call ARMS, all available for use – along with a team available to help share some of the derived data that may come from the community.

It is there to supplement current data sharing on a successful platform, assisting investigators with conducting the science and providing integrated data and utilities for verification of results, to help us deal with some of the issues I will be discussing shortly.

From that data set arose this collaboration, which was just a tour de force by two really amazing junior investigators, Scott Marek and Brendon Tervo-Clemmens, who, with one of my close colleagues and myself – part of the declining old guard – put together this really amazing paper on reproducible brain-wide association studies. Again, it is a massive collaboration across lots of different labs and people.

The basic questions of the study are: what are the effect sizes of brain-behavior correlations? What should we expect with regard to our effect sizes? And does the reliance on typical sample sizes provide an explanation for why we might see failures when we are doing brain-wide association studies, and why? Out of the larger ABCD sample, it was narrowed down to the participants with the highest quality, cleanest data – about 4,000 participants. That allows a more precise estimate of the effect sizes of brain-behavior associations and, through resampling techniques, an examination of reproducibility. The work was done in all different types of ways: looking at cortical thickness, at ROIs, at specific networks using connectivity edges against specific ROIs, and at up to 41 different demographic – I am sorry, behavioral – measures of cognition and mental health, to see what the general expectations might be.

So what are these effect sizes? We are showing here a little graph with effect sizes on this axis, and on the bottom you are looking at the neuroanatomy associated with them. Here is the distribution of the effect sizes you would expect for the relationships between the brain measures and psychopathology (audio issue). It is kind of tiny here, but the distribution goes from about -0.1 to 0.1.

If you look at cognitive ability you see generally similar results. Again, this is all with functional connectivity, but the same result persists with cortical thickness as well. The largest effect size in this entire sample was an r of 0.16 in absolute value. And you can see the top one percent and the median as well. The gist is that the effect sizes of these brain-behavior relationships are small.

So the real question is: does the reliance on these sample sizes provide an explanation for replication failures? To try to understand this, Scott put together a really nice example describing the issue of sampling variability, which is simply how much an effect size varies between samples. Scott says it is objectively boring, rarely considered, but super important.

So here is the example that usually hits home. If I wanted to measure the relationship between height and age in the ABCD study, I might pull out a sample of 25 participants, and I might see a relationship of r = 0.85. That wouldn’t be too unexpected. That is probably what we would expect between these ages in months.

But I might pull out another sample, and the relationship might actually be zero, which would not be expected. I might do that over and over again and generate a distribution – this is the actual data from ABCD – and you get a distribution that goes from just slightly negative to almost one for samples of about 25. That means that any given study could potentially suggest that the relationship between height and age is nothing. And if you look across samplings at different sample sizes, you see that you need hundreds of participants, even up to a thousand, just to get an accurate representation of the true effect size, which is just above 0.5 between height and age.

So increasing sample size is the only way to really decrease sampling variability, because it is a random source of noise. And it is very different from other systematic sources of noise, which I have talked about many times in other discussions, related to things like head motion, but also other noise in our literature, like p-hacking, publication bias, and things like that.
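
Here is a minimal sketch of that idea in simulation – a synthetic population with a true correlation of about 0.5, like the height-age example, with repeated samples drawn at different sizes. Everything in it is an illustrative assumption rather than the actual ABCD data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic "population" with a true correlation of ~0.5,
# in the ballpark of the height-age effect described above.
n_pop = 100_000
true_r = 0.5
age = rng.normal(size=n_pop)
height = true_r * age + np.sqrt(1 - true_r**2) * rng.normal(size=n_pop)

def sample_rs(n, n_draws=1000):
    """Correlations observed across repeated random samples of size n."""
    rs = []
    for _ in range(n_draws):
        idx = rng.choice(n_pop, size=n, replace=False)
        rs.append(np.corrcoef(age[idx], height[idx])[0, 1])
    return np.array(rs)

# Small samples swing wildly; larger samples hug the true value.
for n in (25, 100, 1000):
    rs = sample_rs(n)
    print(f"n={n:4d}: r ranges {rs.min():+.2f} to {rs.max():+.2f} (sd {rs.std():.3f})")
```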

What do these small effect sizes and sampling variability mean with regard to small studies? Here are effect sizes from a thousand bootstrapped samples for a measure of cognitive ability, and this is just taking one sample of n = 25. You can imagine this as being one study or one lab. You might get a really strong correlation that passes correction and shows a positive relationship between your behavioral measure and your brain measure – in this case, default mode cortical thickness.

You might take another sample, in another lab, and if the sample is small enough, you might get an equally significant result that actually goes in the opposite direction. In fact, if you have been out reading the literature on findings that you think should be in accordance but actually show completely different results, we think this is likely because of issues related to small sample sizes in the context of small effects.
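
The same kind of simulation shows the sign-flip problem. Assuming a small true effect of r = 0.06 – roughly the scale of the BWAS effect sizes above – and many hypothetical "labs" each running n = 25, the samples that happen to reach significance point in both directions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

true_r, n, n_studies = 0.06, 25, 10_000  # assumed BWAS-scale effect, small labs
pos_sig = neg_sig = 0
for _ in range(n_studies):
    x = rng.normal(size=n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
    r, p = stats.pearsonr(x, y)
    if p < 0.05:
        pos_sig += r > 0
        neg_sig += r < 0

# Of the "significant" studies, nearly half point the wrong way.
print(f"significant positive: {pos_sig / n_studies:.1%}, "
      f"significant negative: {neg_sig / n_studies:.1%}")
```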

The same is true, of course, for all the other characteristics here, across cortical thickness and connectivity. So the observed effect sizes suggest that consortia-level data – up to 2,000 participants or more – are required to reproducibly detect univariate brain-phenotype associations, although of course it always depends on the true effect size. Now the question is: is this phenomenon specific to a multi-site study like ABCD?

I am going to start by just saying, in response to some of the reviewers of this work, that there was a request to see all of this replicated in more data sets, including the Human Connectome Project but also the UK Biobank, which is a massive amount of data to get through in a typical revision time. Working together with our new institute’s informatics group, which contains many data scientists – many of them not neuroimagers – and in collaboration with our neuroinformaticists up at McGill University, we developed systems, built in particular by a very important cog in this whole group, Tim Hendrickson, to process and analyze this amazing piece of work, starting from scratch with 30,000 participants, in just several weeks. I think that highlights just how important it is to have strong collaborations and to work across multiple fields to really do this work right.

The gist of the results is this. What we are looking at here is the distribution of effect sizes on the bottom, and this over here is the count. The Human Connectome Project data, the HCP data, and the UK Biobank data are colored here. These are subsets equated at a data set size of 900 participants. As you can see, looking at the HCP data but also the UK Biobank data, the results are independent of this just being a kid sample, and also independent of the data being collected at more than one site. The results are nearly identical.

What is a little bit scary is if you look at these exact same data – the same samples, but now it is the 900 participants for HCP, 4,000 from ABCD, and up to 19,000 from the UK Biobank – you see that these distributions are beginning to narrow. That likely means that even the 4,000 participants for ABCD are not hitting the asymptote, the sample size needed to maximize reproducibility, suggesting again that the effect sizes are indeed really small.

Does the reliance on these typical neuroimaging sample sizes provide an explanation for replication failures in BWAS, and why? The answer is yes. It is the coupling of these small effect sizes with sampling variability that produces the statistical errors undermining our ability to make sure our data are reproducible and generalizable. For people who are interested in digging a lot deeper into these concepts, particularly as they relate to ABCD, there is a really nice paper by Wes Thompson and Anthony Dick that goes over a lot of these details at great length.

All right. So now the question is: what are we going to do about it? Here I think we can learn some lessons by leveraging progress in other fields that maybe had similar issues. I pulled this critical review of candidate gene studies from 10 years ago – not necessarily because it is the best review of the issues with candidate gene studies, or of the history and progress of GWAS, but because if cognitive neuroscience looked in the mirror today, it would see figures just like these. Most studies are too small – you can see that in the bottom graph on the left – to measure true effects, and the top graph highlights that the samples needed to measure those true effects for a given candidate gene have to be relatively large.

Indeed – and this is a quote directly from this review that I really enjoyed reading – it says that in this new era of big data and small effects, a recalibration of views about a finding is necessary. I think that we are somewhat in the same boat. It is without question that GWAS studies in the last decade have seen successes. But there are certainly issues that we hope to avoid, and one is that when you start generating these large data sets, you sometimes have biased populations inside them, sometimes by necessity, and the results may not be generalizable to the entire population. This is a really nice illustration of how the predictive value of polygenic risk scores from these GWAS studies is much higher for folks of European descent than for other populations. These are some things we will want to avoid in the future.

Nonetheless, I would say that the successes of GWAS over the last decade are well documented. The key factors that have contributed to that success in reproducibility are something we should probably take heed of. Here is a list from the Committee on Reproducibility and Replicability in Science from the National Academy of Sciences that I think is, again, something good for us to consider.

So: consistency in data generation, and extensive quality control steps to ensure reliability of the data; genotype and phenotype harmonization; a push for large sample sizes and the establishment of large international disease consortia; rigorous study designs and standardized statistical analysis protocols, including consensus building on control for key confounders, use of stringent criteria to account for multiple testing, development of norms and standards for conducting replication studies, and meta-analyzing multiple cohorts; a culture of large-scale international collaboration and sharing of data, results, and tools, empowered by strong infrastructure support; and an incentive system created to meet scientific needs, and recognized and promoted by funding agencies, journals, and paper reviewers, for scientists to perform reproducible, replicable, and accurate research.

We will touch base a little bit more on this when I get down to our SWOT analysis.

One of the things we have been playing with over recent years – I should say the recent year; these are things that are relatively new for the lab – is to borrow from the genetics literature on the generation of polygenic risk scores and potentially apply it to brain data. In polygenic risk analysis, there are essentially two independent data sets, a base data set and a target data set, that have basic types of summary values and statistics. There is rigorous QC that goes on. There are specific types of calculations to generate the polygenic risk scores. The scores are tested in the test data, and there is further validation or cross-validation on the outcome.
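
The general recipe can be sketched in a few lines. This is a toy version, with random arrays standing in for the base and target data sets and simple per-feature regressions standing in for the fully QC’d pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical base (discovery) and target (test) data sets:
# rows = participants, columns = features (variants, or brain connections).
n_base, n_target, n_feat = 2000, 500, 300
X_base = rng.normal(size=(n_base, n_feat))
y_base = X_base @ rng.normal(scale=0.02, size=n_feat) + rng.normal(size=n_base)
X_target = rng.normal(size=(n_target, n_feat))

# Step 1: mass-univariate betas in the base data set
# (one simple regression per feature).
betas = np.array([np.polyfit(X_base[:, j], y_base, 1)[0] for j in range(n_feat)])

# Step 2: each target participant's risk score is the beta-weighted
# sum of that participant's features.
scores = X_target @ betas

# Step 3 (not shown): validate the score against the phenotype in the
# target sample, ideally with cross-validation.
print(scores[:5])
```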

A few folks in the lab – another newly minted assistant professor, Oscar Miranda-Dominguez, and two trainees, Nora Byington and Gracie Grimsrud, a freshman at the University of Minnesota who is just a phenomenal talent – have been working with Oscar to start developing our ability to do similar types of improvement on our effects by combining across multiple small effects. This isn’t a new idea. We actually stole it, in some respects, from a really nice set of publications by our collaborators, Anders Dale and his group at UCSD, who are doing work on polyvertex scores, particularly with cortical thickness.

The idea is that we can take the maximum samples we can get from the ABCC data set, plus a third data set – this comes from the ADHD-1000, from our work with Joel Nigg at Oregon Health & Science University – to try to replicate in an independent sample. We can examine this in multiple different parcellations of the brain, with various types of statistics and adjustments for correlated activity, and again do various types of validation out of sample.

I am going to show you some examples using these behavioral principal components that were obtained by Wes Thompson, in some work with the ABCD workgroup and Monica Luciana. The idea is that you take all these measures from the NIH Toolbox in ABCD, as well as a few other tasks, and you run a Bayesian principal component analysis on them. What they found, and what we replicated in the next set of data, are measures of general ability, executive function, and learning and memory.
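
As a rough illustration of that step, here is ordinary PCA standing in for the Bayesian probabilistic PCA used in that work, with simulated scores standing in for the NIH Toolbox battery:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)

# Hypothetical stand-in for the task battery: 1000 kids x 10 tasks,
# generated from 3 latent abilities so the components are recoverable.
latent = rng.normal(size=(1000, 3))
loadings = rng.normal(size=(3, 10))
task_scores = latent @ loadings + 0.5 * rng.normal(size=(1000, 10))

# Plain PCA as a stand-in for the Bayesian probabilistic PCA in the paper.
X = StandardScaler().fit_transform(task_scores)
pca = PCA(n_components=3).fit(X)
print("variance explained:", pca.explained_variance_ratio_.round(2))
components = pca.transform(X)  # per-child component scores, one column each
```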

What we are going to do – and this first example is going to focus primarily on executive function – is split our ABCD data set into two arms. These are the ARMS I was talking about earlier. We can build a model by calculating beta weights, and we can test the model with these polyneural risk scores. We have done this in many ways: with various types of covariates and corrections for covariates, with validation again on the third data set, across many motion thresholds, across different parcellations, et cetera. I am going to show you samples of this today.

Here are the individual brain features, ranked. This is the explained variance, and these are the beta weights over here. What you can see right away is that the explained variance is extremely small per feature. That is not surprising, considering what I just showed you from the work we did with Marek et al. But now what we are going to do is generate polyneural risk scores that combine the weights across all of these separate, unique brain features or connections, and we do it across multiple different thresholds of features. What you can see is that as you go from using only the top features to using all the features, there is an increase in the amount of variance explained for this specific behavior, which peaks at around the top 25 percent of features, at least with the Gordon parcellation. That is what we are seeing down here.
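
In code, the threshold sweep looks something like the sketch below – hypothetical train and test arrays in place of the ABCD arms, and simple correlation-based betas in place of the lab’s full pipeline:

```python
import numpy as np

def pnrs_threshold_sweep(X_train, y_train, X_test, y_test, fractions):
    """Toy polyneural-score sweep: keep the top fraction of features by
    training-set beta magnitude, score test subjects with a beta-weighted
    sum, and report variance explained (r^2) at each threshold."""
    # Mass-univariate betas on standardized data reduce to correlations.
    Xz = (X_train - X_train.mean(0)) / X_train.std(0)
    yz = (y_train - y_train.mean()) / y_train.std()
    betas = Xz.T @ yz / len(yz)
    order = np.argsort(-np.abs(betas))
    out = {}
    for frac in fractions:
        keep = order[: max(1, int(frac * len(betas)))]
        score = X_test[:, keep] @ betas[keep]
        r = np.corrcoef(score, y_test)[0, 1]
        out[frac] = r ** 2
    return out

# Illustrative use on simulated "arms" with many tiny true effects.
rng = np.random.default_rng(3)
Xa, Xb = rng.normal(size=(800, 500)), rng.normal(size=(400, 500))
w = rng.normal(scale=0.05, size=500)
ya, yb = Xa @ w + rng.normal(size=800), Xb @ w + rng.normal(size=400)
print(pnrs_threshold_sweep(Xa, ya, Xb, yb, fractions=(0.05, 0.25, 1.0)))
```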

This is the relationship in the held-out sample between the polyneural risk score, at the top 25 percent of features, and the behavior, and you can reach almost three percent of the variance explained in this way. Again, much, much higher than you could get from any one connection alone.

Here is what the beta weights look like at the 25 percent marker. The outlines here of the different regions are the specific networks – probably less important. The more important part is that whatever is driving the maximized relationship with executive function, the one that explains the most variance, is not isolated to any given connection or even to specific networks, per se. There are really effects across the entire brain. We think that is really important for how we go forward in thinking through how we conceptualize and model some of these higher-order brain features against these complex behaviors.

There is some variability here, of course. We are less confident in the reliability of the specific individual brain features, for the exact reasons I have been talking about, but we are working on that to see what we should expect with regard to reliability from feature to feature. There is definitely some reliability, but how strong the specifics are, we are not perfectly clear on yet. So we are putting less attention on the specifics there, at least for now.

Another question is: can we borrow some of these standard statistical analyses from genetics to increase the effect sizes of brain associations? I think we can. Then: how might we use these tools to overcome issues of sample size and potentially provide some clinical utility? For this we are going to do a lot more with ADHD symptom scores and how they may relate to behaviors like executive function, general ability, learning, and others.

So here is our exemplar: tackling the heterogeneity problem using brain data like polyneural risk scores. One goal when we are examining these complex behaviors and brain physiology in youth is to determine whether the information is associated with developmental trajectories or mental health issues, now or later in life. Can the information from these tools at a given stage assist in predicting outcomes? Can it help us tailor early interventions?

Typically, in the past, we have approached this issue at the group level. We take some group statistic or group characterization of one phenotype and of another phenotype, and we compare them – like ADHD versus control, in a kid or an adult. There are some problems here. One is that the model relies on the assumption that our diagnostic categories are homogeneous. It may be that there are different types of mechanisms that lead to the behaviors of ADHD. It also presumes that the control population represents one big homogeneous group, when it might be that different profiles exist in the control population as well.

The idea of heterogeneity in ADHD is not new. There have been lots and lots of theoretical papers describing how this must be true. But while it is easy to propose conceptually, demonstrating it is not that straightforward. Here's why:

If you go back to your lab and you sample three participants, it is not too hard to subsample and identify different subgroups of those three people, because there are only two different ways that you can do it. But as soon as you have 10 people in your study, there are about 21,000 different ways, and as soon as you have 20 people in your study there are over a trillion different ways. This is just not an easy problem.
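
One standard way to count that explosion is the Bell number, the number of ways to partition n people into any number of non-empty subgroups. The exact counts quoted in the talk depend on how the subgroupings are constrained, but the growth is the same story:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def bell(n: int) -> int:
    """Ways to partition n labeled items into any number of
    non-empty, unlabeled subgroups (the Bell number)."""
    if n == 0:
        return 1
    # Recurrence: B(n) = sum_k C(n-1, k) * B(k)
    return sum(comb(n - 1, k) * bell(k) for k in range(n))

for n in (3, 10, 20):
    print(n, bell(n))
# 3 -> 5; 10 -> 115,975; 20 -> 51,724,158,235,372 (over a trillion)
```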

There are lots of different ways folks have been thinking about how to handle this: hierarchical clustering, K-means clustering, latent class analysis, finite mixture models. But we have been playing with the idea of using graph theory and community detection to help answer this question. So, graph theory (audio issue). What is a network? Networks are simply collections of nodes that are joined by lines, or edges. In our field we have been using the idea of modules, or community detection, to identify clusters of nodes that are more connected to each other than to other clusters. And we have various types of community detection algorithms that maximize the intracommunity edges relative to the intercommunity edges.

Typically, our nodes are brain regions and our edges are functional connectivity, and that led to very early results – it seems like decades ago, and now it nearly is – highlighting how the brain might be parsed into different communities or network structures.

Now, instead of the nodes being brain regions and the connections being edges, the nodes are actually the people or participants, and the edges are person-centered measurements that can be related to other people. The idea is: can you use the same type of technique to identify different subtypes or subpopulations?
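
A generic sketch of that move: participants as nodes, profile similarity as edge weights, and off-the-shelf modularity-based community detection. The lab’s published work used its own algorithms and real behavioral batteries; everything below is simulated:

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(4)

# Hypothetical data: 60 participants x 7 behavioral measures, drawn
# from two planted profiles so there are communities to find.
profiles = rng.normal(size=(2, 7))
labels = rng.integers(0, 2, size=60)
data = profiles[labels] + 0.5 * rng.normal(size=(60, 7))

# Edge weight between two participants = similarity of their profiles.
sim = np.corrcoef(data)          # participant-by-participant correlation
np.fill_diagonal(sim, 0)

# Keep only positive similarities as weighted edges.
G = nx.Graph()
for i in range(60):
    for j in range(i + 1, 60):
        if sim[i, j] > 0:
            G.add_edge(i, j, weight=sim[i, j])

communities = greedy_modularity_communities(G, weight="weight")
print([len(c) for c in communities])  # sizes of the recovered subgroups
```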

In 2012 we worked on this, and we started with the sample of subjects you can see here, about 500 participants. We had about 20 different psychological measurements, and we broke them down using confirmatory factor analysis into these seven different categories. Then we asked the question: do we see different subgroups in the ADHD population? The reality is that we did. Even though clinically they looked the same – symptom-wise they looked the same – behavior-wise, relative to their peers, they looked quite different. We had subgroups with atypical response variability, subgroups with atypical executive function, subgroups with atypical arousal, et cetera. So it looked like we were onto something.

What changed when we looked at the control population? When we looked at the control population, we saw the same types of patterns that we saw in ADHD. At first we thought we must have done something wrong, but then the light bulb turned on and we said, well, wait, let’s think about this a different way.

So if you compare these measures between ADHD and the typically developing control population – and in this graph, down is worse – the ADHD kids are worse across the board, across all these behavioral measures. Instead, if you say, hey, what if I compare ADHD against controls within their cognitive profile or style, what do the results look like then? Then what you see is, indeed, some cases where the participants are atypical across the board. But you see other scenarios where the participants are unique or different from their control peers in even just one category.

So it told us that a portion of the variation observed across typically developing populations may be embedded in these kinds of soft, say, communities. It also suggests that the heterogeneity in individuals with ADHD may be nested within this normal variation. Of course, the question was: can we know this now, and can it predict future outcomes to help us tailor future therapeutics?

I am going to make this really short because of time, but Sarah Karalunas in the lab followed this up with a similar type of experiment using temperament measures, in this case the TMCQ. Something happened here with the numbers, but this is a similar-sized sample. What she found is essentially three different groups. One of these groups was a negative emotion group, a group that had lots of negative emotion. It was very different from folks who had purely ADHD-type symptoms or all types of surgency-type behaviors. And for the negative emotion group, if you identified them at age eight as being in this particular subgroup, by age nine almost fifty percent of them had a new-onset disorder. Follow-up of these data over several years has shown this to be highly consistent across time.

So there is some evidence that there may be ways you can use this heterogeneity to start predicting outcomes. The question I always get asked, of course, is why did we not try this with the brain? The reality is we did. We played with it many times, but the measures were not quite reliable enough to get something consistent replicated across different types of samples. We think that has now potentially changed, with some of the new techniques we are applying and our understanding of the limits of some of the brain imaging data.

So here is what we are going to do: we are going to repeat a similar type of experiment. In this case we are going to take those same three measures – general ability, executive function, and learning and memory – plus emotional dysregulation, and try to do something quite similar. We are going to use the ABCD data to train the models, get a brain measure – the polyneural risk score – for each participant in a separate sample, the OHSU data set, and then apply them against the ADHD symptoms in our ADHD-1000. Then we are going to ask: can we now identify subtypes in the OHSU data set using brain imaging data, and determine whether there are any group differences between communities, trying to parallel some of those early findings?

So here what we are doing is looking at the top features. Here is a null model, where you add features randomly instead of based on the top features. And here is the explained variance. This is our general ability score against our ADHD symptom scores. What you see is that you can begin to explain – this is in the third sample – essentially 2.72 percent of the variance at a specific number of features. Here is the actual data in the ABCD sample: the green is the controls, the black is subthreshold, purple is ADHD, and pink is not cleanly one or the other.

What you can see is this relationship – again, almost three percent of the variance can be explained using these measures, just based on this general ability score from ABCD.

These are the Manhattan plots. You can begin to see which networks, or which interactions of networks, are related. The only reason I put this up on the slide was to show that some of the features are distributed across the brain, but we start to see, at least in this case, that there does seem to be some more specificity of given networks relating to symptoms, as opposed to just general ability.

Here is the executive function map. A similar type of result, except that far fewer features are needed to maximize the variance explained. Here you max out at two percent of the variance explained in ADHD symptoms. Here again are the Manhattan plots and the brain features. Here is our learning and memory. In essence, to maximize the variance explained you need almost all of the features, and it explains much less than one percent of the variance. And the Manhattan plots of the brain data.

Last here is our emotion dysregulation, which looks very much like the learning and memory. You can see that the axis is flipped, because as emotion dysregulation goes up, your ADHD symptoms go up, as opposed to the executive function measures. Here is the brain data and again the Manhattan plots.

Combining the scores by regression can actually predict quite a bit more variance. If we combine these scores and try to predict ADHD symptoms, now we are predicting upwards of 7.5 percent of the variance, meaning that the variance explained by these separate components is not completely overlapping.
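
A quick sketch of why combining helps – hypothetical component scores, each weakly and partly independently related to a symptom measure, entered into a single regression; all numbers are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)

# Hypothetical component scores for 900 participants: four polyneural
# scores (general ability, EF, learning/memory, emotion dysregulation),
# each contributing a small, partly independent share of the symptoms.
n = 900
components = rng.normal(size=(n, 4))
symptoms = components @ np.array([-0.15, -0.12, -0.05, 0.20]) + rng.normal(size=n)

# Variance explained by each component alone...
for j in range(4):
    r = np.corrcoef(components[:, j], symptoms)[0, 1]
    print(f"component {j}: r^2 = {r**2:.3f}")

# ...versus all components combined in one regression.
model = LinearRegression().fit(components, symptoms)
print(f"combined: r^2 = {r2_score(symptoms, model.predict(components)):.3f}")
```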

That then begs the question, to me at least: are there different participants in here who have different strengths on any one of these components, similar to what we saw before? So now, like last time, we are going to repeat our experiment, first looking at differences across everybody and then looking within each brain profile. Here is our matrix after applying community detection, where you get specific subgroups. Here is our comparison of every kid against every other kid, just like we showed before. We see in this case – down is worse – that the ADHD kids, as far as their scores, tend to be worse, though not necessarily across all the measures.

And here are the different subgroups. So we begin, even in the brain, to see the same types of phenomena we saw 10 years ago in behavior – something we had not been able to do in the brain over those 10 years because of the reliability of the data. In some subgroups, like subgroup B here, indeed all of the ADHD kids are worse, with lower scores. In others, they are mostly similar but seem to be atypical in one measure or another – in this case learning and memory, in other cases emotional dysregulation or executive function. So the point here is that the era of big data is not just more of the same; it requires shifts to new standard practices and analyses. When you build off some of the growing pains in genetics, things like multivariate polyneural score approaches allow us to increase the power of brain-behavior associations in smaller samples. But doing this work is extremely resource-intensive, and it is unlikely to be conducted by one group alone, so there is a bit of our culture that needs to be upgraded.

What this does not mean is that small-sample studies are not important. It does not mean that all the foundational work with fMRI or functional connectivity is invalid. It also does not mean that effect sizes cannot be improved with improving methods. In fact, that was part of the point of showing some of the early work we have done with polyneural scores. It is about understanding the specifics – understanding under which conditions you can be successful with smaller samples.

Today I have mostly focused on the big-data, many-subjects issue, but if anybody is going to ACNP, I will be giving a talk there in a similar vein, focusing more on the big-data-within-subject side and the work that we have been doing there.

So, last, I am just going to end with this SWOT analysis. Again, it is just a mental exercise to provide some food for thought: strengths, weaknesses, opportunities, threats.

So, some strengths of our field. There is a growing number of scientific disciplines entering the field, and new access to big data resources. There is cross-species work to bridge findings to neurobiology, lots of public/private partnerships are growing, and there is interest from nonprofit organizations. And we have junior investigators who, as far as I can see, appear much more open to entertaining the weaknesses of and threats to our field than others have been.

The weaknesses: one, I think data quality is an issue on which we still need to make progress, and that is not just functional MRI – it crosses lots of different domains, including behavior. The broadening of expertise and disciplines, while definitely a strength, sometimes moves the field away from its grounding in neuroscience; that risk is contained if you have really tight collaborations across different levels of expertise. And effect sizes are small when examining complex behaviors. That is something we will have to deal with.

We have some threats as well. One is that ignoring the weaknesses is itself a big threat. We have to embrace the heterogeneity problem – there are certainly lots of people doing that now. Selective reporting and p-hacking are giving us a false sense of how big the effect sizes actually are, and our academic infrastructure sometimes rewards selective reporting: only positive findings are typically accepted in the most-cited journals. If everybody published all results, regardless of whether they were positive or not, we would likely understand the true effects. Funding mechanisms are not always set up to optimize study designs for some of the questions we want to answer. The academic advancement and promotion culture is not set up to handle what we are learning about how to conduct the work. Our training environments are not always equipped to train the next generation of scientists. And we have a lack of diversity in our subject populations – I highlighted one issue with regard to that – and, relatedly, a lack of diversity in our investigators.

There are also opportunities. One is that we have the opportunity to expand. There is enthusiasm, and there are new approaches to improve SNR. We have an open opportunity for open science, data sharing, and resource sharing; to grow support for and diversify the workforce; and to allow the growth of big data resources. Post-pandemic, we have an opportunity to widen the reach of the communities we are touching with tele-outreach. And we have an opportunity to leverage the incredible batch of motivated junior investigators who could really “bend the curve” of discovery if we take a lot of the issues we talked about today seriously.

So I am going to end there. I want to thank you all for having me. I have really enjoyed giving the talk. This is the new state of affairs of my lab picture – a mixture of standalone pictures and Zoom. But again, we have tons of people who work on this, and of course all of the funding related to these collaborations, in particular the funding from (audio issue), allowed a lot of this discovery. Thank you.

JOSHUA GORDON: Thank you, Damien. That was really wonderful. We really appreciate your giving us this excellent tour de force of your work, and I personally look forward to hearing what you have to say about deep dives on smaller groups of individuals at ACNP.

There are a number of questions in the Q&A that I am going to give you a chance to answer. I had one that I wanted to take the moderator’s prerogative and ask you first. Really it is just a clarification. In your brain-wide association studies you have these different categories of associations. Is it that you map brain function onto these different categories first and then ask how much they contribute to ADHD symptomatology? Is that how it works, as opposed to trying to map the brain-wide associations onto ADHD symptomatology per se?

DAMIEN FAIR: That is right. We identify the brain physiology related to the component behavior and then use that to generate a polyneural risk score of that particular behavior and see how that relates to ADHD symptoms.

JOSHUA GORDON: I am going to jump forward to a question that pertains to this. I think it gets at, in some ways, what you are probably working on to try to improve those signals and that noise. In your BWAS – I am going to read the question directly, and I might add a little color at the end.

In your BWAS study, average effect sizes are very small. I would imagine that many of those brain-behavior relationships had no grounded hypotheses. Do you think that using massive amounts of tests that have, quote, no theoretical grounding – I am reading directly – is biasing your conclusions about how big real effect sizes are?

I am going to take the liberty to reframe that question as well. Do you think you would get bigger effect sizes if you created better relationships between brain and behavior, either by restricting to theoretically grounded ones, or by actually using functional imaging during tasks, as opposed to connectivity or something like that?

DAMIEN FAIR: The answer is probably not. You can’t say for sure, because maybe someone comes up with some theoretical behavioral construct that is so tight and so strong and so pure that it relates really well to some specific thing in the brain. That could happen. But I don’t think it is purely because there is not a hypothesis for this circuit or that circuit. The effects are going to be largely the same.

Now, the question about functional MRI is a good one. With functional MRI, much of it depends. If it is a repeated-measures contrast, like the psychological subtraction constructs of yesteryear – there are far fewer studies like this now, but there used to be a lot – those have a lot more power, because you are getting lots of repeated measures in the scanner. But if you are trying to relate the functional activation to some outside behavior or construct that is not measured in that way, then it may be a little stronger. There is some evidence that maybe it is a little stronger than what we are discussing here, but the same general issues will remain.

JOSHUA GORDON: The same issue of needing large sample sizes to be able to relate to things that are not happening when the person is in the scanner.

DAMIEN FAIR: Correct. Again, this is mostly an empirical question, and I have seen some data out there suggesting that it is a little stronger, but it requires more digging into the details to understand whether it is truly stronger or just reflecting the same component behavior that is happening in the scanner. So if you do an attention-specific task in the scanner, and then you do that same attention task or something similar outside the scanner, then that measure will relate to the thing outside the scanner. But it doesn’t transfer to something else, if that makes sense.

JOSHUA GORDON: Here is another question, and I think it gets to – it is a similar question that is also asked about GWAS results. If all brain-behavior effects are this small – or let’s say these kinds of brain-behavior effects, if the brain-behavior effects are this small – are they likely to be clinically actionable? Is studying this relationship a good investment for the NIH?

Another way to say this – just like the weak GWAS effects, how do you take these kinds of effects and turn them into something that is going to have an impact on something that I know you care a lot about, Damien, which is clinical care of patients, getting people better?

DAMIEN FAIR: I am going to answer that question in two parts. One, we have some sessions on this exact topic at Flux this month, the 17th to 21st. If you are interested in taking deep dives on this, you definitely should come and hear about it.

JOSHUA GORDON: What is Flux?

DAMIEN FAIR: Flux is the Developmental Cognitive Neuroscience Society. The meeting will be coming up very soon. But the answer is yes. I think that science is a journey. If everybody had decided they wanted to quit genetics because of these same issues – and 15 years ago they were almost identical – then I think we would be at a loss for where we are now and for all the different discoveries that have come out in the last decade.

I do feel that there are certainly ways we can increase the effects. It is more that you have to understand the landscape you are in to try to make it better, and you have to understand the landscape to know your limits and what you can ask. The thing with the polyneural risk scores is, again, that the effect sizes are much higher than what you would expect even in a lot of the genetic studies of these complex behaviors. I feel certain there will be lots of utility in trying to understand and characterize long-term outcomes, effects of medications, and all types of things like that. So I think that is definitely on the horizon. It doesn’t mean, of course, that we should stop funding it. What it means is we should zero in on the things that are going to be most effective.

The other thing here is that the whole other half of this talk, which I didn’t talk about at all, I think is just primed, because now we know the extent of data required to get a precise measurement inside the brain of a person at a given time. We are very good at that now. It is clear. The papers are growing exponentially because we finally understand the parameters a lot better. In that sense, this is the time you want to start funding this type of work, because we are at that precipice where we can really start making inroads that we probably weren’t knowledgeable enough to make even 10 years ago.

JOSHUA GORDON: So I am going to paraphrase one of the other questions. With your permission, Damien, we will go a little bit over; I want to be respectful of your time. The question is about what might be the sources of subject-level variability. I think what you just said is that some of that inter-subject variability is because we are not actually designing the studies right – we are not getting data that is deep enough to be reliable.

Can you also say what other sources of subject-level variability you know of?

DAMIEN FAIR: Number one, that is exactly right. If you don’t have extensive enough data for the data to be highly reliable –

JOSHUA GORDON: By reliability here you mean that within one individual it is not reliable.

DAMIEN FAIR: Exactly, and that increased variability will decrease your effects across the sample. But the other part here – and I think it is related, again, to the stuff I will be talking about at ACNP; Randy Buckner and several other people have really led the field in this – is that the topography of the brain is not nearly as similar from person to person as we once believed. If the topography of the brain – exactly where these systems land inside an individual’s brain – is not in the place you think it is based on neuroanatomy, that by definition will decrease your effect size, because you are going to have lots of noise across different systems. There is now growing evidence that that is in fact true. So that is another source of noise.

The other one, of course, is the outcome – the other measure you are relating to the brain. In this case I am using executive function, or measures like IQ, which are extremely complex if you think about all the different types of processes involved. The brain doesn’t work in a compartmentalized way like we think about behavior, so it is the combination of all of that. Those measures add lots of noise on top of the brain data itself. Even if you have highly reliable brain data, if your other measure is not highly reliable, you have the same problem.

So again, it is all hands on deck for how we think about this. Of course, I am a brain imager, so you are hearing the specifics about brain imaging (audio issue), but it is not just an imaging problem per se. It is really a problem of how we do the science and think about it.

JOSHUA GORDON: Another question. Can brain-wide association data in an ADHD population distinguish between responders and non-responders to drug treatment?

DAMIEN FAIR: I absolutely believe that to be true and I will tell you in about six or seven months.

JOSHUA GORDON: We will wait for it. One more question, if you don’t mind. I think it is a good one that will allow you to talk about a lot of other things as well. Again, I will just read it word for word. Large data allows moving beyond linear models. For instance, you could test models that assume the brain to be a complex system that may move from one discrete state of functioning to another because of the small variations in basic processes. Are you developing or testing such models with your data?

DAMIEN FAIR: Yes, absolutely. This is the dynamics question. The issue with dynamics, at least as I have seen it in the literature so far, is that it is also extremely difficult to do and very sensitive to noise properties that are themselves temporal, like motion and other non-biological phenomena. You have to be extremely careful to avoid getting fooled in those types of studies. But I would say the field is primed to deal with that as well, because we are getting so much better at being able to handle those types of issues.

JOSHUA GORDON: All right. Thanks. I was going to cut it off there, but this one I thought you might have some sage advice on. Would you expand on the threat of funding mechanisms not being well suited to encourage optimized study designs? I am sure that program officers and other staff here would love to hear about that.

DAMIEN FAIR: We just need more money, Josh.

JOSHUA GORDON: You are sounding exactly like a geneticist.

DAMIEN FAIR: What I would say is that with the typical R01 model for an isolated study, for the majority of the types of questions people want to ask, it would be difficult to get samples sufficient to answer them, I think. I can’t say that with 100 percent certainty, because we are in a somewhat limited scope here with what we are looking at, but that is my sense. So collaboratives, linked R01s – things that are less common but are likely more amenable to multiple sites building data sets for these kinds of BWAS-type studies – I think will be more fruitful in the end. But R01-level studies are indeed good for other types of questions, particularly questions with regard to imaging with within-subject designs, which typically have a lot more power. So it is more about which questions you can answer with this model and which questions you can answer with that model. You have to really think it through to be maximally effective and to move the field forward.

JOSHUA GORDON:  Damien, again I thank you very much for a great talk and for bringing to the field of imaging the same kind of respect for variance that we have learned we need to apply in genetics, and I really look forward to seeing your work evolve over time.  Thanks a lot.  Thank you everyone for coming.