>>Good morning, everyone, thank you for joining
us today for the NCI CBIIT speaker series. I’m Tony Kerlavage, the director of CBIIT. Just want to remind everybody that
today’s presentation’s being recorded and it will be made available on the
new datascience.cancer.gov website. You can find information about future speakers
there and also by following us on Twitter at our relatively new Twitter handle which
is NCIdataSci, that’s NCI, d-a-t-a, S-C-I. Today I’m very happy to welcome Dr. Peter James, who is professor at Harvard Medical
School’s department of population medicine. Dr. James has almost a decade of experience
working with large prospective cohort studies, including the Nurses’ Health
Study, the Framingham Heart Study, and the Southern Community Cohort Study. He’s aided in the creation of many geographic-based variables
and linked them to health data. And the title of his presentation
today is spatial factors in health, the impact of natural environments on health and
future directions in mobile health technology and deep learning for epidemiologic research. And with that, I’ll turn it over to Dr. James.
>>Great, thank you so much. It’s a pleasure to be here,
and I’m happy to talk about my research on spatial factors and health. Just to get us oriented, what
I’m going to do in this talk, I hope to kind of lay a little
foundation about why it’s important to study spatial factors in health. Next I’ll delve into an example
of natural environments in health, some of the research we’re doing
right now is pretty exciting. I’ll walk through some quick research
highlights and then really delve deep into the future directions in spatial
exposure and behavior assessments that I think will be really interesting to a
data science crowd, and then I want to finish with just a brief piece on translational work. So, my research focuses on how
the places in which we live, work and play influence our health, so where you
live might influence whether you walk, or drive, or bike to work, your exposure to air pollution
along the way, the people you may interact with along the way, whether you have
access to healthy nutritious foods, or whether there’s a safe and inviting place
for you to play and be physically active in. And so when we think about these
spatial factors, there’s a growing body of research linking spatial factors to
physical activity, sleep, and obesity, which are major contributors to
chronic disease risk, including cancer. And I think it’s important to consider
that these are ubiquitous exposures that I’ll be talking about, this is
not a rare chemical exposure, you know, this is the built environment, the
natural environment, you know, air pollution, things that everyone is exposed to some
level of, so it’s a really ubiquitous exposure. And I think another really important thing
to consider is, you know, I have this picture of East Boston, looking back at Boston, this
statue saying, “A city is not an accident.” Every single thing that you look at
in your environment after you walk out the door is the result
of someone’s decision. Whether that’s a policy decision,
an economic decision, these environments are constantly
changing and evolving. And the decisions that we make are
modifiable and also have long-lasting effects. So, when you change an environment, when you
build a building, when you put in a road, when you build a park, often those factors stay
for generations and people are exposed to them. So, these are really important decisions
to consider as we craft our neighborhood. And so in terms of spatial measures that
I’ve looked at, we’ve examined, you know, neighborhood socioeconomic status
and health, looking at urban sprawl or neighborhood walkability, food
environments, a number of different factors, but I want to dig a little
deeper into natural environments. So, when I talk about natural environments,
I like to start with this slide of Yosemite, which really I think captures
this kind of face validity, of the idea that there might be something
about nature that stirs something inside of us. And the biologist E. O. Wilson coined
this term biophilia, the idea that we’ve evolved with nature to have an affinity for nature. I think it’s pretty evident when
you think about real estate, and you think about how the most expensive
office would be one with a view of Central Park or of another park, so these are
definitely something we value, but do they play a role in health? That’s the question. So, what are the kind of purported mechanisms through which natural environments
may affect health? Well, one would be, you know, I
have in this slide air quality, but truly it’s protecting us
from environmental hazards. If, you know, in a greener environment
maybe levels of air pollution are lower, or maybe extreme temperatures
may be attenuated, or even noise, so it may buffer you from exposure to noise. So this is the idea that perhaps green spaces buffer
us from harmful environmental exposures. Next, green spaces may provide a
location for us to be physically active, they also may provide an environment
where we may be more likely to interact with our neighbors, so increasing
our social engagement, which has been linked to health and well-being. And finally, this kind of direct pathway, this
idea of actually just maybe viewing green space or being in green space may allow us to
restore our attention and to decrease our stress. And so those pathways are kind of
thought of as the dominant pathways that may drive the relationship
between nature and health. So, in my work we had thought that this,
you know, it’s a really interesting area of research, but there were not that
many studies looking at the relationship between greenness and mortality, so we
chose to do an analysis where we looked at the relationship between greenness and
mortality within the Nurses’ Health Study, and I’ll explain a little more about this study. So, first of all, the Nurses’ Health Study is a
long-running prospective cohort started in 1976 with over 121,000 female nurses. Originally
they were from 11 states, but as you can see in the map on the right, which shows the
geocoded addresses of all the participants in the Nurses’ Health Study,
they’ve moved throughout the US. And truly they cover pretty much every type of
geographic area that people live in in the US, and you may say that in the center of the
country there aren’t that many participants, but the truth of the matter is that there aren’t
that many people who live in that part of the country, so we really do
cover the vast majority of geographic areas in the US. These participants fill out questionnaires
every two years on health behaviors, on diseases that they may have acquired, and we follow up with the National
Death Index for mortality data. So, as I said earlier, we have geocoded
the participants’ addresses from 1986 to the present, and we use
those geocoded addresses to link environmental covariates
to these participants. And so for this study, what we did is we
took advantage of the fact that, thankfully, the government has invested
in lots of satellites that orbit the globe and take images over time. One of those satellite instruments is
MODIS, which takes images at about 250-meter resolution every
16 days of every surface of the globe. So, what we could do is we could take
MODIS data dating back to the year 2000 and link that to our participants. And it turns out that vegetation scatters
near-infrared light to keep from overheating, and absorbs visible light for photosynthesis. So, it gives off this unique spectral signature,
which we can measure with this equation, the normalized difference vegetation index,
which gives an objective quantified measure of vegetation in every 250 meter
pixel from every one of these images. So, what we did is we took seasonal measures,
so every three months from 2000 to 2008, we took the MODIS data and we linked
it to our nurses’ residential addresses to create a time-varying measure of exposure
to greenness around their home for each season. And so what that looks like is
we take a hypothetical address, we overlay the spatial data set, and
we can extract a value for greenness or NDVI for that participant in the pixel in
which their home falls, or we can create buffers and create an average exposure
over, for instance, a 1250-meter area, which
we did do in this paper. But findings are fairly
consistent for different exposure measures, and this is basically the main
finding. To orient you to this slide, we divided our exposure
metric into quintiles, so basically quintile five is the
participants who lived in the greenest areas, and we looked at kind of a monthly analysis
where we were looking at their mortality rates, and we found that participants who lived in
the greenest quintile, the top 20 percent, had the lowest mortality
rates over follow-up, so about a 12% lower hazard rate
compared to the least green quintile. And it’s important to note that these
analyses were adjusted for a lot of individual level socioeconomic markers,
as well as area-level socioeconomic markers, so we do think that this is
independent of socioeconomic status. I also, especially for this audience,
should note that, you know, this is all-cause mortality, but we did see that the associations we observed were
driven primarily by cancer mortality and respiratory mortality, so I’ll explain a
little further how we’re following up on that. Next, we also ran some stratified
analyses, so this is stratified by physical activity
level, and we found that participants who were most physically active
seemed to have, you know, the strongest relationship between greenness
and mortality, which kind of makes sense. So, we are seeing that the people who
are physically active may be those people who are most likely to be out and
interacting with their green spaces. So, certainly worth following up on, and I’ll
explain how we’re trying to do that soon. We also looked at mediation. Because in the
Nurses’ Health Study we’ve measured those pathways that I discussed earlier, you
know, physical activity, air pollution, social engagement and mental health, we
were able to kind of test the strength of these pathways in terms of explaining the
relationship between greenness and mortality. Interestingly, physical activity did not
explain that much of that association, about two percent, air pollution
exposure was about four percent, and we found it pretty interesting
that social engagement, which we measured through participating in
groups greater than weekly, explained about 19% of the association between greenness and
mortality, and then there’s the mental health pathway, which we measured as doctor-diagnosed
depression or antidepressant use, which is a pretty strong measure of mental health. We found that it explained about 31% of the
relationship between greenness and mortality, so we are following up on that finding as well. So, briefly, you know, there were
some good strengths for this study: it was a large nationwide prospective cohort
study, at the time it was one of the biggest, and there was time-varying
exposure data, which was unique. We also had the ability to look at mechanisms
through which greenness might affect health with our mediation analysis,
so that was helpful. But there are some big limitations. You know, first of
all these are female nurses, and they’re primarily white, which limits the generalizability
of our findings, but I think probably the biggest limitation is this: is
greenness around the home the right measure? So, first of all, we had no information
on how participants interact with nature, we just looked at their home address and
looked at the environment around their home, and second of all, we have no information
on the specific features of greenness, so we just had a quantified measure of
vegetation, but we don’t have information on what specific features might drive health,
so that’s really important to think about and that’s exactly what we’re trying to
address in some of our future directions. So, this study actually got a lot of press,
and I think the reason is that it’s kind of a positive amenity for health, something
modifiable that can benefit health, as opposed to a lot of the kind of
scary things you read about in the news. So, it’s been exciting that people
are really interested in this topic. As we follow up on this topic, we’ve
done studies where, for instance, we looked at doctor-diagnosed depression
and antidepressant use as an outcome, and we do find that there’s a
lower rate of incident depression in those participants living in greener areas. Next, we actually looked at kids. In the study of children, we found also that
there’s a lower incidence of depression among kids in greener areas in the Growing Up Today Study, 11,000 children,
and we also saw that this relationship seemed to be even stronger in participants
who lived in dense areas, so high population density urban areas. We’ve also looked into this idea of incident
breast cancer, so looking at whether people who live in greener areas have lower rates
of breast cancer, and actually this is data from the Nurses’ Health Study II, which
is a totally different group of people than were in the first study that I
mentioned, the Nurses’ Health Study, and we are seeing that participants who live in the greenest areas have lower
incidence of breast cancer. So, just to take you through a
few other research highlights, when it comes to data science I think these
are some interesting spatial data sets that we’re working with now, so
hopefully you find them interesting. We’ve been looking at noise. Noise is thought to
act through this general stress model, and even at levels that you don’t perceive or
that don’t disturb your sleep, it may affect cardiovascular
disease, potentially even cancer, but we’ve been looking at
cardiovascular disease. And we are finding, using this National Park
Service noise model, that participants living in areas with higher levels of
nighttime noise have higher cardiovascular disease incidence. We’ve also published on light at night from
NASA satellite data, where we’ve thought that exposure to higher levels of
light at night may influence melatonin and may lead to circadian disruption, which
has been linked to breast cancer risk. And in fact, we do find that participants
who live in areas with higher levels of outdoor light at night do have a
higher incidence of breast cancer, and this is after adjustment for air
pollution, [inaudible], socioeconomic markers, population density, so really
after accounting for a lot of other spatial factors we’re still
seeing that persistent relationship there. And then we’re also working with the EPA
to improve some of our spatial metrics, so this is that NDVI metric from
MODIS, the 250 meter resolution data, and EPA has taken aerial photos and
actually created one meter resolution data. So perhaps with improved exposure
assessment we can see stronger relationships than we were observing with our kind of
crude measures with 250 meter resolution. But, you know, I’m here talking to you about
data science, and I really want to talk more about the kind of future directions that we’re
moving in that I think are really exciting in terms of spatial exposure
and behavior assessment. So, first of all, all the studies
that I mentioned prior were focusing on the home environment, but we know from time
activity studies that people spend less than 50% of their time at home, so this could lead
to massive exposure measurement error, which could bias our findings
and lead to incorrect conclusions about spatial factors and health. And so to address this, we’ve been
working within the newest iteration of the Nurses’ Health Study, so this
is the Nurses’ Health Study 3. It’s an open cohort of actually about 47,000
nurses now, so we’re still recruiting, and if you actually are a nurse or know a
nurse who wants to be part of the study, you can go to nhs3.org to sign up. But these are nursing students and nurses who
are ages 19 to 46, based across the entire US and a few in Canada, as you
can see from the map. They fill out web-based questionnaires
online every six months, and we noticed that they were
opening their study emails, our participants were opening their study
emails predominantly using their smartphones, so we thought maybe this is the ideal cohort
for a study involving mobile health technology. So, we now have an NCI-funded study, what we
call the Nurses’ 3 Mobile Health Study. We’re piloting a smartphone application, a
custom smartphone application we developed, and then consumer wearable devices,
or Fitbits, within the cohort. So, what we do is we’ve asked
about 500 participants to undergo seven-day intensive sampling
periods four times across a year, so four seven-day sampling periods across
a year spaced around three months apart. And so what we do is we send our participants a
Fitbit device, which measures steps, heart rate and sleep at the minute level, and then we also
ask them to download a custom smartphone app that measures location and also
administers very brief questionnaires. So, what this does for us is it
allows us to gather efficient, passive, objective and low-cost measurements
of high spatial and temporal resolution data on cancer-related health behaviors, and I’ll explain what those data
kind of look like in a second. But really, it also gives us the
power to deliver questionnaires and to receive sensor data directly in near
real time, so it alleviates the problem with prior studies of this type, which would send an
actigraph or an accelerometer to somebody, hope that they wore it, and
then ask them to send it back. We can get the data basically anytime
they are connected to the internet. And so in terms of our smartphone
app, we’ve done a lot to make sure that our participants’ data is secure, that they
understand when they’re sharing data with us, so in terms of the consent
process, and beyond that even in the app they can turn off any
stream of data they want at any time. So, we let them know when we’re
going to be sampling their data, and we make sure that they know that
they’re sharing their GPS data with us and that they’re sharing
their Fitbit data with us. I should note that there are a number
of commercial apps that are out there that are gathering these data when, you know,
participants or– I shouldn’t say participants– when consumers are really not aware of it. So, we really want to be upfront and
transparent with our participants so they know the types of data we’re gathering. So, we are actually gathering high-resolution
location information at the minute level, so we do ping their GPS data during
their sampling periods every ten minutes, so we’re getting basically a measure
of where they go in near real time, so whenever they sync their phone. It’s actually higher quality than satellite
GPS alone, because it can take advantage of the satellite but also triangulation from
cellphone towers, and Wi-Fi signals also help to assist with the location services. The app actually is coupled with the Fitbit API,
so it centralizes access to the questionnaires from the app, location data from the
app, and actually the Fitbit data, so on the participant-facing side,
they can actually give us permission to access their Fitbit data directly through
the app, and so it’s seamless for them. We’ve also been collecting smartphone
accelerometer data, so the actual movement from the phone as opposed to
the movement from the Fitbit, to see whether a smartphone-only approach
would give us pretty much equivalent data to a Fitbit protocol, because
obviously Fitbits cost more money than using a smartphone app,
so we’re exploring that. And as I said before, we have a huge
focus on data security and transparency for our participants, so they know we’ll
never sell their data or give anyone who’s not on our IRB access to their data, and all their
data are always living behind a secure firewall. So, in terms of Fitbit devices, these consumer
wearable devices, such as the Fitbit Alta HR, the Fitbit Charge 3, can measure
physical activity, heart rate and sleep at the minute level at a fraction of the cost
of research-grade devices, so these devices cost about half as much as an actigraph, and so
we can scale up our studies a lot better because we can afford more devices. These wearable devices communicate with
a smartphone and securely upload data in near real time, so again, it alleviates
the need for sending devices back and forth between participants and researchers. Prior studies had shown that maybe up to
11% of devices are lost in that process, so it’s a huge cost for lost
data in studies of this type. But I think the most appealing thing
about Fitbits is that participants like using consumer wearables, in fact
many people buy consumer wearables with their own money because they’re
interested in their data, and they may be likely to wear consumer wearables
for long periods of time. So, there was a study that showed that 58%
of users only wear the devices for six months, and I think it’s really funny when I read
something like that that sounds negative, I mean, if we had that type
of adherence over six months, it would be one of the most successful
studies we’d ever thought of. So, you know, I think these types of
devices are potentially sticky enough that they can give us long-term behavioral data. And so, a few pros and cons for using
consumer wearables within prospective cohorts. So, first of all, there have been
multiple validation studies that show acceptable performance
for certain metrics. So, physical activity and heart
rate seem to perform pretty well in independent validation studies when compared to either an actigraph accelerometer
or other gold standard measures. For sleep, these devices have the same
limitations of any movement-based device, they don’t perform as well as polysomnography,
but they might give us certain metrics that are acceptable, such as total sleep time. They have fairly good battery life, five to
seven days for most devices, there’s low cost and low burden for participants and
researchers, which is a huge perk, especially when we’re trying to
scale up to some of our cohorts. And I think it’s important to remember they’re
easy to use, they’re liked by participants, they’re sticky so they’re much easier
to just provide to a participant without detailed instructions, which
makes adherence much more likely. But there are some huge cons
that we need to consider, so first of all these are proprietary
algorithms, it’s a black box. So, we don’t know exactly how
Fitbit is processing their data, so it’s going to be harder to share,
you know, pool our data across studies that use different types of devices. In addition, the algorithm can change at any
time, so we may think that we see a change in behavior, but maybe it’s because
the algorithm has changed over time. These devices also were designed
to alter behavior. Now, I don’t know that they have been proven
to be that successful in altering behavior, but this is a potential problem
if you are looking at a study where you wanted the absolute measure
of physical activity, for instance. For us, we kind of want to compare within
people, so if there’s a slight change of behavior, it may not be of
consequence for our studies, and I’ll explain a little bit later why
we don’t think that’s such a problem. Fitbit also is constantly releasing new
devices and new features with unknown validity, so we have to keep up these validation
studies, you know, about every year right around Christmas there’ll be a new device
coming out, so we constantly need to ensure that the validation metrics persist over time. And then how sticky are these measures? So, if we’re trying to look at, for
instance, cancer related health behaviors, how long do we think participants
will wear these devices? Is it possible to measure physical
activity over years? We don’t know that yet. And so, here’s a little figure that shows
what some of our participants’ data look like. I should note, this is actually my data, this
is not a participant’s data, but to orient you, the purple dots are heart rate at the minute
level, the green dots on the bottom are steps at the minute level, and then that horizontal
line is latitude, which we’re using here as just kind of a proxy for
are we getting location data. So, you can see the kind
of richness of this data. When you compare it to what we measured prior,
which was a questionnaire on physical activity in the past year, you know, now we
have all new insights into variability, timing of physical activity, you know, the
kind of stability of some of these measures, and looking at some of these peaks where
you can identify perhaps a specific bout of physical activity. We can also pretty easily based on heart rate
identify sleep periods when that heart rate dips down and, you know, perhaps there’s
some patterns here that we can look at over longer term studies to
really understand how different types of sleep are related to health. And in terms of sleep and circadian markers,
here what we have is a 24-hour clock where each colored line is
heart rate on a different day, the thick blue line is just an average of
heart rate across all the days in the week, and then those gray shaded areas
are the reported sleep periods. And actually, this is my data when I
traveled to California, and so you can see my sleep period shift as
I acclimate to the time zone. So, these are some interesting insights that we can see in
data that we could never get from, for instance, a self-reported measure of sleep that’s
averaged across a year, you know, on average how many hours
do you sleep per night. So, with these data, you know, we can compare
them to those questionnaire-based approaches and see how consistent they are, we can look
at variability in sleep timing and duration, we can derive circadian markers, so what
are the peak times of behavior during a day, and what is your kind of
lowest time of movement. We can also create novel metrics such as
social jetlag, so is there a big discrepancy between your sleep patterns during
the workweek versus on the weekend, and if there is that disconnect, perhaps you
have some social jetlag, as you kind of have to, you know, change your sleep patterns
to accommodate your work schedule. And, you know, it’s important to note that these
are nurses, so we also can identify shift work, and we can see how shift work may affect
behaviors and ultimately health outcomes with a lot more resolution than we
were able to with a questionnaire. But ultimately this is kind of the
data that I’m most interested in. As I told you, I study spatial factors in
health, and what this is is a schematic where we have the GPS points as a
participant moves through space. They’re color coded by activity level, so the
red is sedentary, yellow is light activity, green is moderate, and blue is vigorous, and
then we overlay those points with, for instance, a spatial data set on greenness,
so this is that NDVI metric. And so from these data, just looking at one
participant we can identify, for instance, that there was a bout of moderate to vigorous
physical activity on the right-hand side there, and we can say, oh, was this
participant more active in greener places, or less active in greener places? So, we can identify and test
some of those hypotheses. And so we’ve done that with some of our
previous data where we’ve published noting that participants had higher activity
levels at the minute level in greener areas, so we did see, based on actigraph
accelerometers and satellite-based GPS devices, that our participants were more
active in greener locations. And then we can couple that with other spatial
data sets, so this is a three-dimensional spline where we are still seeing
that kind of non-linear relationship between greenness and activity on
the left-hand side of that cube. But we also can look at walkability,
so neighborhood walkability, whether that’s density, connectivity
of streets and access to stores, and we see that actually people are most active
in dense walkable areas that are also green, so that provides even more insight
into the types of environments that might provide the most opportunities
for individuals to be physically active. Okay, so I’ve talked a little bit about
adding high resolution location data on where participants spend time, and
also increasing our objective data on participants’ health behaviors,
but can we also use data science to improve our exposure assessments? So, this is an area right now that we’ve been
writing grants and developing some expertise in, in terms of deep learning approaches for images. So, satellite-based data tell us very little about the specific spatial
factors that drive health. As I said earlier, the measures like NDVI just
give us an objective measure of vegetation in a given area, they don’t tell us anything
about what type of vegetation that is. And so ubiquitous geocoded street-level
images, such as Google Street View, could provide insight into
specific spatial features from an on-the-ground perspective
of an individual. So, what’s also important is that this is
actually as you would view it as opposed to a satellite looking down on your home. And then machine-learning
approaches are really important to actually make sense of those images. So, you can use machine learning
to classify images or segment specific components
of an image for an analysis. And so convolutional neural networks are
one approach where you can take an image, take an annotated version of that image and
train an algorithm to find parts of that image and classify or segment out
specific factors in that image so that you can know what you’re
looking at from any image. And really, with these types of algorithms
that have been developed and really, I wouldn’t say perfected, but they’re
really, really coming far along, we can really take any image and provide some
data on how we live and what we’re exposed to. And so when we take our old approach, which
is the satellite looking down on your home, we can move from there to maybe more of
what’s in the viewshed from around your home. So, this is a Google Street View image, on the
right-hand side, and I should say this
is Perry Hystad’s group at Oregon State, they have extracted out all of the
information on greenspace in that image. And so what can you do with that data? So, you could potentially estimate the
percent of greenspace within the viewshed, or you could estimate the percentage of trees, if you extract out information
on trees versus grass. And it provides independent information
when compared to satellite data. So, it’s not the same as NDVI,
actually studies have shown that these Google Street View images
provide independent information, so these might be even more
important in terms of driving health. And, you know, down the road with time we
could train algorithms to estimate tree species or even identify park amenities, and really get at what specific features are
most closely related to health. And right now we are in the
process of writing grants where we’re proposing to
use the PSPNet algorithm. To orient you to this slide, on the
left-hand side we have some raw images. In the middle, they were annotated for a competition,
so somebody went through and actually assigned each
pixel to a certain class, a car, street, sidewalk, tree, you know, sign. And then on the right-hand side we have the
PSPNet algorithm, the computer’s algorithm, and how it predicted what each pixel is. So you can see it performs pretty well, and in
fact for greenspace, for trees, and flowers, and grass, it performs at over 90% accuracy. So, we think this would be a really,
really great algorithm that we could apply to Google Street View images around our
participants’ homes to get specific information on components of the natural
environment that might drive health. And so there’s further potential for data
science in prospective cohort studies, so we could merge these machine learning
derived metrics of spatial factors with our smartphone GPS data to
create personalized exposure metrics, and we’re hoping to do that down the road. We can also use smartphone apps to
conduct ecological momentary assessment, which is basically pushing people questions in
real time, and we can get repeated self-reports of symptoms, affect, mood, or even validation
of behavior like diet or physical activity in real time, so we actually get the
information with a time stamp and a geotag so we can really understand how they’re feeling and they’re behaving in their
natural environments. We can also use sensors on the phone, so I
talked about phone-based accelerometry, so we can use the phone’s accelerometer
as opposed to just a Fitbit, we can have participants take photos of food and process those photos using machine-learning
algorithms to understand diet better. We can also measure light and sound from the
phone, for instance if we were interested in the sleeping environment and knowing whether
light at night from the outdoors is penetrating into the sleep environment or whether noise
from the street is coming into the home. And then, you know, all other spatial data
sets we can use food environment data, so the location of fast food establishments
or supermarkets, air pollution metrics, we can link all of those to the
location-based measures from a phone to create personal minute-level
exposure metrics. And then JP Onnela at Harvard School of Public Health has coined this
term of digital phenotyping. So, this is the idea of taking all of these data
from smartphones, you know, your phone usage, your physical activity data,
etcetera, and using algorithms to derive specific behavioral patterns
that might be important drivers of health. And so I think mobile health technology and data science really can
provide unprecedented perspectives on exposure and behavior. It can break the dichotomy between
small precise studies of like ten people where we’ve measured everything,
and then our large cohort studies like the nurses’ health study where we have
hundreds of thousands of people over many years but we use questionnaires, and so there’s that
kind of embodied error in that questionnaire. So, I think the mobile health technologies and
data science allow us to kind of bridge that gap and get at high resolution
data in large populations. And I think this has some great promise
for advancing epidemiologic research on spatial drivers of health behaviors,
obesity and cancer risk across the life course, so it’s really an exciting area to be in. And so I just also want to end with
a little bit on translational work, because we don’t do this work in a vacuum,
and so I’ve been really fortunate to work with Boston’s regional planning agency,
the Metropolitan Area Planning Council. This is, you know, an organization
that works with communities to work through development decisions, planning
decisions, you know, transportation planning, all sorts of different actual changes to our
built and natural environments on the ground, and we’ve been fortunate to actually
develop a public health division within the Metropolitan Area Planning
Council so that we have a seat at the table when decisions are made that people think
may not be a health related decision, but actually do impact health down the road. So, these ideas of changing
parks or changing neighborhoods, these changes do have downstream consequences
for health, and the research shows that. So, we need to kind of sit at the table
with these change agents and stakeholders and policy makers and help them to understand
how their decisions could impact health. And so we’ve done that primarily
through health impact assessment, which is one tool of taking a policy
or planning decision and trying to estimate what the health effects will be
based on the literature of that decision. So, one of the kind of most popular things
that we published was when our public transit system, which we call the T, the MBTA, was
proposing service cuts and fare increases, and so we took some of the data that they
had modeled on how those service cuts and fare increases would affect
health, and we said, you know, well that will change people’s physical
activity because they’ll be driving more instead of walking to public transit, you know,
it’ll change air pollution levels, it’ll change carbon emissions, you name it,
and we walked all those relationships forward to health outcomes and ultimately to a dollar
sign, and we found actually that the amount that the region would save in terms of
service cuts and fare increases was dwarfed by how much it would cost them in terms
of health consequences down the road. So, I think it’s important when we do this
research to translate it and to bring it to our community and to our stakeholders
to really empower them to make decisions that benefit health down the road. So, some questions moving forward, so
we’re still working on lots of this work and it’s all in flux, but first of
all can we identify the mechanisms and isolate the causal relationships for the
relationship between natural environments and health, there’s still a lot of work to
be done there, how can we measure the role of spatial factors in driving health
behaviors, so we’re working on that actively. Can we effectively use technology
and data science to elucidate how specific
spatial factors drive health, and then how can we provide change-agents with
data, information and tools to create healthy, equitable and environmentally
sustainable places? So, still a lot of work to
be done in all these areas, and I hope that today I’ve given
you some insight into the types of research we’re doing on these topics. And I just want to really quickly acknowledge
all the participants in the nurses’ health study and ask, you know, if you know a nurse,
ask them to go to nhs3.org and sign up for the nurses’ health study three. Thank you to all my collaborators, and
thank you to the National Cancer Institute for funding a lot of my research. So, I’m happy to take questions.>>Great, let’s thank Dr.
James for a very intriguing– [ Applause ] So, the floor’s open for questions here in
the room, if you’ve got one raise your hand, and if you’re on the Webex, use the raised
hand feature and we will call on you as well. Joanne?>>Thank you for the talk. I was wondering if you were thinking about
using the voice recording to look at issues like social isolation and evaluate the
qualitative nature of the relationships that the phone might overhear, or even
get something like a neighborliness score of the neighborhood of their conversations
with different folks in their lives.>>Yeah, that’s a great question. So, I mentioned digital phenotyping. So, JP Onnela is a biostatistician at Harvard
School of Public Health who’s done a lot of work actually with mental health researchers
using his app, which is called Beiwe, and it kind of runs in the
background of your phone. I don’t know that it uses– we’ve had
conversations about the kind of signature of your voice, so actually measuring your
voice and deriving metrics from that. I don’t know that we’re pursing that as much
lately, but we are pursuing, for instance, the frequency of text messaging, the
speed at which you type, you know, different metrics from phone usage that
might provide some insight into kind of social engagement, social interactions. That’s a really interesting point, though.>>Yeah, so to what extend is your data able
to account for potential individual differences that may lead individuals to
seek out greener environments? I know that to some extent SES is going
to be related to their mobility in terms of where they’re able to choose
to live, but I was just wondering if you could speak more about that.>>Yeah, so this is a methodological term
we call select [inaudible] mobility bias, which is a wordy way of saying that certain
people may choose to go to certain environments, so it’s not necessarily that greener
environments are causing them to be more physically active, but that something
about that individual makes them want to go to a green environment and that’s where
they obtain their physical activity. And so we are kind of at the infancy stage
of understanding how to account for that through either study design
or analytical design. So, one way is to kind of understand
what we call like anchor points. So, looking at home environment, work
environment, places that you kind of have to go, and see how environments that are on
that pathway might relate to health. So, for instance, if you’re going
to drive to Vermont from Boston, we wouldn’t say that the Green
Mountains have caused you to be physically active, you
made a choice to go there. But if it’s something that’s on the path
between your home and your work that seems to be consistently related to you being
physically active, maybe we have a stronger case for saying that that is driving
behavior based on proximity. Does that make sense?>>Yes, thank you.>>Peter, I have a bit of a
follow-up question to that. It struck me that the one slide you showed, I think it was when we were showing
the satellite image and then went down to like one meter resolution. It jumped out to me that it seemed like
there was a river going through the middle of that landscape, and it was
a very low greenspace, right? So, it got me thinking about is
it really green or is it nature? Are we talking about the mental health social
interaction, increased activity benefits of nature, or specifically greenspace where there might be other factors,
even chemical differences like–>>Exactly.>>Perhaps elevated oxygen levels, or whatever. And so I wonder if you dealt with
that as opposed to like, you know, you could have a desert
landscape which is nature but not green, does that have similar effects?>>Yeah.>>Or a very mountainous, treeless
region which again also nature but not– So, could you address that?>>Yeah, for sure. So, that’s a really important point, and that’s
kind of why, first of all I like to use the term “natural environments” as opposed to greenness. We are measuring greenness when we use NDVI,
but greenness may not be the important factor. And so that is one of the reasons why we’re
trying to develop new spatial data sets that capture different features
of natural environments. So, for instance, water, what we call
blue space, may be an important driver, and we have actually published on blue space. And folks from ISGlobal in Barcelona have
done a lot of work looking at blue space, my colleague Dan Krause in Canada has done
some amazing work looking at blue space. So, we’re not exactly sure what the biggest
drivers are in terms of natural environments and health, and so we are exploring
all these different features. And I think, you know, that’s what kind
of led us to the Google Street View images or any street-level images
where you can actually extract out the specific features that
might be related to health. And down the road, you know, maybe
it is a question of measuring kind of from a more traditional environmental health
perspective, measuring some sort of, you know, is it pollen, is it some sort of chemical
in the air that people are breathing that might be related to improved
health outcomes, we’re not sure, but work is just beginning in this area, and I
think we kind of had that approach of looking under the street lamp because NDVI, this
measure, is so easy to obtain and link to participants, and so that’s where
the majority of research has been, but I think we need to move to better metrics
that tell us the specific factors at play.>>Great, thanks. We’ve got a few questions on the Webex.>>First one is from Daoud Marzaman [phonetic]. Daoud, you’re unmuted.>>Yes, hi, thank you, great talk. A couple of quick questions, actually, it
was mostly related to the wearable devices. So, the first question is, in terms of the
false positive, if you can just say something about how do you actually deal
with the false positive data. And number two, unfortunately the majority of
the population that are suffering from some of these, you know, health issues such as
[inaudible] that you mentioned in your talk, how do you really get the older
generation to get used to these wearables? And I’m just curious if there’s any initiative
about [inaudible] this older generation actually to get involved with this initiative?>>Yeah. So, that’s a great question. So, first of all, before you’re re-muted,
when you say false positive, what do you mean?>>I think when you are actually getting all
the data, you know, you’re getting the data from the wearables, in terms of if a person
is not wearing the wearable at the time–>>Ah, yes, yes, yes, I got you.>>You know, things of that sort.>>Yeah. Yeah, so that’s a good question,
getting into the weeds a little bit. But what we use as our kind of, you know, in
accelerometer and actigraph language we’d call that non-wear time or apparent
non-wear time versus real non-wear time. What we do is actually we use, because the
heart rate monitor is, well, it’s not impossible to fool, but most likely if you’re
getting a heart rate and step data, then we think that that’s actually real
wear time, so that’s how we’ve kind of used to crudely say this is true wear time. And then in terms of getting kind of
older populations to wear these devices, so we don’t necessarily need everyone
to wear, for instance, a Fitbit, there’s many other types of approaches. When it comes to, again, I’m not
necessarily studying behavior change but I’m more studying observation of behavior. There’s plenty of devices for older
populations that are accelerometer-based, I know there are devices that measure falls,
for instance, within the home environment. So it can be as simple as a necklace or a
bracelet that doesn’t need to be charged as often as a Fitbit, there are lots
of options out there for observation of behavior in older populations.>>Okay, we have a question from
Gabriel, your line is unmuted.>>Alright, thank you. Hey, Peter, how you doing? Thank you so much for your talk. You know, you mentioned– perhaps
this is reflective of my own naïveté about do you observe that a large– maybe
it’s reflected on the nurses’ health study, but you see that people are wearing the Fitbits
for many days straight without taking off at a time, and are people just
not concerned about the issue of charging at all of that battery?>>Yeah, so to clarify that, that statistic,
I think it was from a [inaudible] study, I forget exactly the source of
that, I can look it up for you but that wasn’t the nurses’ health study
participants, that was a study trying to understand how long people would wear
their wearables once they bought them. And it showed that, you know, 42% of people
stopped wearing their devices after six months, which was framed as people buy these
devices and then they don’t stick with them, but that means a lot of people are
wearing them for more than six months. I use that as an example to say that,
you know, not that every participant in your study is going to wear the
device for more than six months, but that a consumer could wear the devices
for more than six months is promising because I don’t think there’s many actigraph
studies where we’re seeing participants who are willing to wear these devices for more
than a month at a time, and usually that’s with a lot of handholding and a
lot of phone calls to make sure that they’re persisting with our study protocol.>>Okay, we have two more questions.>>Okay, one more [inaudible], Ian.>>Thank you, it was a great
talk, very interesting data. In the nurses’ health study three, did you get
any metric at the signup for the study, or once people were on the study, of their
willingness to have their location tracked? Seven days at a time, four times a year?>>Yeah. So, that’s a really good
question, and so not when they sign up, but basically when we send out our invite,
this was something we were very concerned about and has been kind of paramount for us
is preserving our participants’ data and making sure that they understood
exactly what they were sharing with us, and we anticipated that very few people would
actually want to share their data with us, but we’ve been really surprised that when
we send out, again this is a sub-study, so when we sent out an invite to
our participants to say we’re going to measure your location, we’re going to
give you a Fitbit and measure your behavior, you know, we still get about a third of
participants who sign up for the study, who go through consenting and are eligible, so
right now the study is only made for iPhone, so based on that denominator of people who might
not have an iPhone, who may not be eligible for our study, we still get over 30% of people
who go through the whole consent process, that’s a pretty high participation
rate I think for a study of this type. So, we’ve found that people really are not
that concerned with sharing their spatial data, which I think we were surprised by, but I
guess when you consider all the apps out there that do gather your location information
for profit, people are maybe used to sharing these types of data, but
that doesn’t mean that we’re going to be any more lax with our use of their data.>>We have a few more questions on the Webex.>>So, one more question from Gabriel.>>Hi, Peter. My second question is in regards to what
you mentioned about the proprietary algorithm from the Fitbit software, it was a
black box issue, and how it’s possible that with changes to the software that
internal algorithm could change. How does your team, or have you thought about
how would you address that potential issue of validity as use of Fitbits doesn’t permit
you to evaluate how that data is generated?>>Yeah, I think one of the things
that is promising is it does seem like Fitbit is starting to come around and
will potentially provide a more open platform for their raw data in the future,
that’s kind of what we’re hoping for. But at the same time, a lot
of these data, you know, maybe we don’t necessarily
care about the absolute value. Let me see if I can go forward to some of our– for us when we’re looking at an analysis
like this where we’re categorizing behavior into four activity levels, perhaps we don’t
necessarily care about the absolute value in these types of analyses, so we don’t
think it’s a huge concern for us right now. Now, that said, I think it would always be
better to have the raw data and to be able to harmonize across different
studies that use different devices, so I am hoping that Fitbit
goes that route eventually, so that’s kind of the optimistic
perspective right now.>>Okay, we have a question
from Derga [phonetic]. Your line is unmuted.>>Oh hi, thank you very much. It was a great talk. My question is more about the participants. Do you take any health factors into consideration while picking the
participants or even analyzing their data? For example, heart rate might depend on different health conditions
the participants might have. So, do you segregate and create
cohorts, or how do you deal with that kind of information in [inaudible]?>>Yeah, so in this pilot we’ve cast a
fairly wide net and allowed participants in the study regardless of health conditions. The only thing that we exclude participants
on is whether or not they have an iPhone, just because our app developers only
developed our app for the iPhone right now, but that will change in the
future, this is again a pilot. And second, we actually exclude
participants who have a sleep disorder, and that’s because our Fitbit
devices are known to be, and actually most accelerometer-based devices
to measure sleep are known to be less valid in participants with sleep disorders. So, for instance if you have insomnia
and you’re perfectly still in bed, it may think that you’re
asleep when you’re really not. So, those are the exclusions we’ve made so far. Down the road, if we scale this up I don’t
think we would exclude more participants, I think we probably would gather more
data, you know, we do have the information on health outcomes and other health
behaviors because they’re part of the nurses’ health study cohort, but
we really do want to cast a wide net and see if we can gather information on all
sorts of different types of individuals.>>So we have a question from
Davin Vaugh Onheisen [phonetic]. Your line’s unmuted.>>Thank you. Peter, first of all, fantastic
and interesting presentation. I had just a couple questions out of curiosity. First and foremost was just, you
know, with such large volumes of data, how do you manage these different
data sets coming in from all these different avenues and facets? And maybe what challenges? And then also just another thing too was I
was reading recently how the National Weather Service is using sensors on phones,
like the barometric pressure sensors. Just curious if you considered
the weather as a spatial factor and potentially using these
iPhone sensors for this measure.>>Yeah, so, two great questions. So, first of all, managing the data
was a huge obstacle for us at first. So, what we’ve done is now, with
the help of a data scientist, we have created a NoSQL
database that uses MongoDB. So, basically we download our minute-level
data and basically upload it to this MongoDB, which allows us to kind of
query it as needed for analyses. So, this is a learning process for me, and
in fact we’re still working with kind of how to analyze the data for bigger populations,
and maybe in a year from now I’ll be able to give another talk on how that process works. We’re also, in terms of the Google Street
View data, working on a process for storage and analysis of that data within kind of
Harvard’s computing ecosystems and figuring out which ones will be best for us
when we eventually go that route. So that’s, you know, two things that I’m still
learning and we’re still figuring things out, so I would love it if folks out there
could email me with ideas on how to process these data, how to store these data,
because I think it’s a fast-evolving field and actually goes to data
science and computer science more than my training in environmental epidemiology. So, I think that’s really important. And regarding weather, you know,
we haven’t necessarily thought about using the devices themselves to
measure weather, but we have a good number of researchers here who are focused
on health effects of climate change, and so temperature is something that we’re very
interested in and we have spatial data sets on temperature that we are attempting to
link to these data, so that’s another thing that we might be able to measure
as a predictor of health behaviors. And yeah, I mean, maybe the phones
themselves would be an interesting source of information down the road.>>We’re getting close to the top of the hour. Peter, I have one final quick question for you. You talked quite a bit about
participants sharing their data, what about participants’ access to data? Presumably, at least in the iPhone app, that
they would have access to their own data, do they also have access to
aggregate data from the study?>>Yeah, that’s a really good point. So, as far as our participants [inaudible]
goes, they really don’t see most of their data, they have a kind of a progress bar that tells
them how many days they’ve logged in terms of the study, but they don’t really
have much access to their data, and I think one of the reasons why
is kind of transferring the data back and forth would expose us to
a little bit more in terms of potentially losing data or breaches of data. But I think in terms of aggregate data we do, I
think probably the best thing we do is we send out newsletters and update them on all the
publications that come out from the study, but I think we could do a better job of
kind of communicating with our participants and giving them more access to their data,
and I welcome suggestions on how to do that with these types of data,
because as you can imagine, you know, this is protected health information,
this is kind of identifiable data when you have information on where a
person’s going, and if you have information about where they spent the night
you know that’s where they live. So, it’s certainly very sensitive data, and
so we are hesitant to kind of send it back and forth without regards to
the potential consequences.>>I know there’s a question, but I
apologize, we’re out of time right now, so perhaps you can follow
up directly with Dr. James. So, I hope you can all join us for our
next currently scheduled presentation, which is not until September 25th, when
doctors Janice Taube and Alex Szalay from Johns Hopkins will present illuminating
the tumor microenvironment using multiplex IF, astronomy accelerates pathology,
intriguing title. There may be additional speakers added in July,
we typically take August off in any event, but keep an eye on the website
and the Twitter feed. Once again, let’s thank Dr. James
for a very intriguing presentation and a very invigorating discussion. [ Applause ]>>I also have a sign-in
sheet here if anyone sees that opportunity, I’ll have it right up here.>>Thanks.