How we analyzed water quality
Our analysis included more than four million sampling points measuring
pollution levels in thousands of waterways using data from the
Department of Environmental Protection. We examined whether
nitrogen, phosphorus and nitrate-nitrite levels were increasing,
decreasing or staying the same over 25 years.
Florida is divided into thousands of water segments that vary in size.
For instance, the main parts of Tampa Bay are made up of 12 segments,
while Lake Tarpon is just one. We considered every individual segment
as a waterway and prioritized looking at waters already considered
polluted by the Department of Environmental Protection. (For ease of
understanding, we also use the word waterway to refer to entire bodies
of water, like the Lagoon, that are composed of several water
segments.)
We used the Regional Kendall test to assess the trajectory of water
pollution across Florida. The test is a variation of a popular
statistical test for identifying environmental trends, and it accounts
for differences in seasonal measurements. The Regional Kendall test
was developed by scientists at the U.S. Geological Survey.
Fifteen experts in water quality vetted the statistical analysis and
provided guidance on our approach — including technical experts from
regional conservation groups, academics and former federal government
scientists.
The Regional Kendall test groups measurements by each unique
combination of water monitoring station and season into what’s
known as a “block.” Only measurements taken at the same place,
during the same season, are compared to each other.
Within each block, every measurement is compared to all later
measurements to see whether it is larger or smaller. In our
analysis, this meant the test considered whether nitrogen,
phosphorus and nitrate-nitrite levels were higher or lower than
previous measurements. If multiple measurements were taken at a
station during one season, a median was used. The test then counts
how many times later measurements are larger or smaller than
earlier ones to determine the direction — upward or downward —
pollutants are trending. This calculation determines the Kendall’s
S test statistic for each block.
Some blocks can have stronger test statistics, depending on the
consistency of sampling and the magnitude of changes in the data.
The overall trend for the water segment is calculated as the sum
of test statistics from each block. The result tells us the
direction of the trend.
For example, consider a waterway with 20 total blocks. If 16 have
strong negative test statistics for pollutants while four show
positive test statistics that do not outweigh the negative, the
overall trend for the waterway would be decreasing.
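The block-and-sum logic described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used in the analysis, which relied on the RKT R package. The function names and the `(station, season, year, value)` tuple layout are assumptions made for this example.

```python
from collections import defaultdict
from statistics import median


def kendall_s(values):
    """Kendall's S for a time-ordered series.

    Counts pairs where the later value is larger, minus pairs where
    the later value is smaller. Ties contribute nothing.
    """
    s = 0
    for i in range(len(values) - 1):
        for later in values[i + 1:]:
            if later > values[i]:
                s += 1
            elif later < values[i]:
                s -= 1
    return s


def regional_s(samples):
    """Sum Kendall's S over every (station, season) block.

    `samples` is an iterable of (station, season, year, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Collect raw values for each block, keyed by year.
    by_block_year = defaultdict(lambda: defaultdict(list))
    for station, season, year, value in samples:
        by_block_year[(station, season)][year].append(value)

    total = 0
    for years in by_block_year.values():
        # One median per year within the block, in time order.
        series = [median(years[y]) for y in sorted(years)]
        total += kendall_s(series)
    return total
```

A positive total points to an increasing trend for the segment and a negative total to a decreasing one. Measurements from different stations or seasons are never compared directly.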
Blocking by both station and season helps take into account the
effect of seasons on water quality measurements. In Florida, where
torrential rains lead to more pollutants running off land and into
water bodies, time of year is important to consider.
Seasons were divided based on quarters: January to March, April to
June, July to September and October to December.
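As a small illustration of the quarterly seasons, the sketch below maps a sample date to a season number and builds a station-and-season block key. The function names and the station ID are hypothetical.

```python
from datetime import date


def season_of(sample_date):
    """Return 1 for Jan-Mar, 2 for Apr-Jun, 3 for Jul-Sep, 4 for Oct-Dec."""
    return (sample_date.month - 1) // 3 + 1


def block_key(station_id, sample_date):
    """Combine a station ID with the sample's season to identify its block."""
    return (station_id, season_of(sample_date))
```

For example, `block_key("ST1", date(2020, 7, 4))` gives `("ST1", 3)`.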
We did not adjust for flow. The Regional Kendall test has an
option to adjust for flow; however, it assumes the relationship
between flow and nutrient concentration is unchanging. To best
make the adjustment, we needed flow measurements to be taken at
the same time as every sample — a level of detail that is hard to
achieve. Statewide, there aren’t enough consistently maintained
flow gauges to provide that data, particularly in estuaries.
Factors like climate change and human engineering have also caused
the relationship between flow and concentration to
change over time.
Using flow data for the adjustment could distort results, since an
underlying assumption — that the flow and concentration
relationship is always consistent — may not be met in many places.
After screening for consistent sampling over time, we had enough data
to identify trends in about half of waterways that are considered
polluted.
We looked at waterways listed as impaired for nitrogen, phosphorus or
nitrate-nitrite and strong indicators of nutrient impairment: algal
mats, chlorophyll-a, macrophytes and dissolved oxygen. Waters impaired
by these indicators, but not explicitly for nitrogen, phosphorus or
nitrate-nitrite, were evaluated for trends in both nitrogen and
phosphorus.
We considered water bodies as polluted if they appeared under
categories 4a, 4b, 4d, 4e and 5 in the state’s
Impaired Waters Rule Database run 66.
We took steps to ensure that water quality measurements were
relatively consistent over a 25-year period. To run a trend test,
each nutrient and water body segment combination needed to meet
the following requirements:
At least 100 available samples taken between 1999 and 2024. This
requirement was derived from multiplying 25 years by four to
account for measurements across seasons. These samples needed to
span at least 17 years — about 70% — of the trend period. For
example, a segment with measurements between 2002 and 2020 would
be included.
At least six years of sampling at each station and season
“block.” Blocks that did not meet this criterion were
removed.
Measurements taken in at least half of the years in the overall
time period. This threshold allowed us to evaluate trends in
segments where samples were taken at irregular intervals. To
ensure that segments did not have significant sampling gaps, we
removed any that had more than five consecutive years of data
missing.
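The requirements above can be expressed as a simple screening function. This is a rough sketch under several assumptions that the text does not spell out: the span is counted inclusively (last year minus first year plus one), "half of the years" is measured against a 25-year period, and only gaps between sampled years are checked. All names are hypothetical.

```python
def longest_internal_gap(years):
    """Largest run of missing years between two consecutive sampled years."""
    ys = sorted(set(years))
    return max((b - a - 1 for a, b in zip(ys, ys[1:])), default=0)


def segment_qualifies(sample_years, min_samples=100, min_span=17,
                      period_years=25, max_gap=5):
    """Check one nutrient-and-segment combination.

    `sample_years` holds one entry per sample: the year it was taken.
    """
    if not sample_years or len(sample_years) < min_samples:
        return False
    distinct = set(sample_years)
    # Assumption: inclusive span, so 2002 through 2020 counts as 19 years.
    span = max(distinct) - min(distinct) + 1
    if span < min_span:
        return False
    # Assumption: "half of the years" is relative to the 25-year period.
    if len(distinct) < period_years / 2:
        return False
    # Reject segments with more than `max_gap` consecutive missing years.
    return longest_internal_gap(distinct) <= max_gap


def keep_block(block_years, min_years=6):
    """A station-and-season block needs samples in at least six years."""
    return len(set(block_years)) >= min_years
```

Under these assumptions, a segment sampled six times a year from 2002 to 2020 passes, while one with no data from 2006 through 2011 fails on the gap rule.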
For some waterways, the Florida Department of Environmental
Protection has set pollution reduction targets, called Total
Maximum Daily Loads. We wanted to examine pollution trends since
the goals were adopted. To analyze these waterways, our sampling
requirement was the number of years since the Total Maximum Daily
Load was adopted multiplied by four for the number of seasons in a
year. We considered each Total Maximum Daily Load adopted at least
a decade ago and ran the trend test from the year it was adopted
until the present.
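A minimal sketch of that sampling requirement follows. It assumes "the present" means 2024, the end of the main trend period. That end year and the function names are assumptions for illustration.

```python
def years_since_adoption(adoption_year, end_year=2024):
    """Years between adoption and the assumed end of the record."""
    return end_year - adoption_year


def tmdl_min_samples(adoption_year, end_year=2024, seasons_per_year=4):
    """Required samples: years since adoption times four seasons."""
    return years_since_adoption(adoption_year, end_year) * seasons_per_year


def tmdl_is_analyzed(adoption_year, end_year=2024, min_age=10):
    """Only targets adopted at least a decade ago are considered."""
    return years_since_adoption(adoption_year, end_year) >= min_age
```

Under these assumptions, a target adopted in 2010 would need at least 56 samples.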
To ensure serial correlation did not affect the results, we used
the Seasonal Kendall adjustment identified by Hirsch and Slack in
1984 and implemented in the RKT R package. Results from the trend
test did not differ significantly after adjustment, so we reported
the non-adjusted figures.
In each water segment, we filtered any sample results that were
more than three standard deviations above or below the mean to
eliminate outliers.
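The outlier filter can be sketched as follows. The text does not say whether a sample or population standard deviation was used. This sketch assumes the sample standard deviation, and the function name is hypothetical.

```python
from statistics import mean, stdev


def drop_outliers(values, n_sd=3.0):
    """Keep values within `n_sd` standard deviations of the mean.

    Assumption: sample standard deviation (statistics.stdev).
    """
    if len(values) < 2:
        # A standard deviation is undefined for fewer than two values.
        return list(values)
    mu = mean(values)
    sd = stdev(values)
    return [v for v in values if abs(v - mu) <= n_sd * sd]
```

The filter is applied once per water segment, so a value that is extreme statewide can still be kept if it is typical for its own segment.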
To identify directional trends and their magnitude, we considered
Kendall’s S test statistic, the regional Theil-Sen slope estimate
and the p-value reported by the test. The S test statistic is the
output of the test that tells us the direction of the trend. The
Theil-Sen slope tells us the magnitude of the trend.
We also calculated the percent change in concentration, based on the
Theil-Sen slope. We multiplied the Theil-Sen slope by the number of
years in the record and divided that by the overall mean measurement
for the water body.
Percent Change = (Theil-Sen Slope × Number of Years in Record /
Overall Mean Concentration) × 100
If the S statistic was positive and the percent change was greater
than 5%, we considered the trend’s direction to be “increasing.”
If the S statistic was negative and the percent change was less
than -5%, the trend’s direction was “decreasing.” If the percent
change was between 5% and -5%, the trend’s direction was
“maintaining.”
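The percent change formula and the 5% rule can be written out as a short sketch. The function names are hypothetical. The text does not say what happens if the sign of S disagrees with a percent change beyond 5%, so the sketch returns `None` in that case rather than guess.

```python
def percent_change(slope_per_year, n_years, mean_concentration):
    """Theil-Sen slope times years in record, over the mean, times 100."""
    return slope_per_year * n_years / mean_concentration * 100


def trend_direction(s_statistic, pct_change, threshold=5.0):
    """Label a trend using the sign of S and the percent change."""
    if s_statistic > 0 and pct_change > threshold:
        return "increasing"
    if s_statistic < 0 and pct_change < -threshold:
        return "decreasing"
    if -threshold <= pct_change <= threshold:
        return "maintaining"
    # Not covered by the rules described in the text.
    return None
```

As a worked example with made-up numbers, a slope of 0.02 units per year over 25 years in a water body with a mean concentration of 1.0 is a 50% change.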
When we found an “increasing” trend, we referred to the water body
as worsening or getting dirtier. When we found a “maintaining”
trend, we referred to pollution levels as not improving.
Other notes on data preparation:
Data with result qualifiers of “A,” “F,” “G,” “H,” “K,” “L,”
“N,” “O,” “T,” “V,” “Y,” or “?” — as described in DEP’s data
qualifier rules (Table 1, Data Qualifier Codes, in Rule 62-160.700, F.A.C.,
Quality Assurance) — were not used.
Data with a result qualifier of “I” and “U” were only used if a
corresponding method detection limit (MDL) was listed. If the
result was below the method detection limit, the value of the
method detection limit was used. Fewer than 1% of measurements
were under the method detection limit across impaired waters
with sufficient data.
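The qualifier rules above amount to a small decision function. The sketch below assumes each record carries a single qualifier code and a numeric result, which may not match the layout of the state's data. The names are hypothetical.

```python
# Qualifier codes whose records are dropped entirely.
EXCLUDED_QUALIFIERS = {"A", "F", "G", "H", "K", "L",
                       "N", "O", "T", "V", "Y", "?"}
# Qualifier codes that are usable only with a method detection limit.
MDL_QUALIFIERS = {"I", "U"}


def usable_value(result, qualifier=None, mdl=None):
    """Return the value to analyze, or None if the record is dropped."""
    if qualifier in EXCLUDED_QUALIFIERS:
        return None
    if qualifier in MDL_QUALIFIERS:
        if mdl is None:
            return None
        # Results below the detection limit are replaced by the limit.
        return mdl if result < mdl else result
    return result
```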
Overall, we ran a trend test on every water body segment that met
our data standards. That totaled 1,477 waterways. Then, we
evaluated trend results in three groups:
766 waterways identified as impaired
17 additional impaired water bodies had enough data to run a
trend analysis overall but not to identify a trend for any
nutrient of impairment. We did not include these water
bodies in our final results.
694 waterways not identified as impaired
149 waterways with Total Maximum Daily Loads, also called
pollution budgets, which set target reductions for contaminants
We did not consider orthophosphate in this analysis because
no waterways are currently listed as impaired for
orthophosphate as of Impaired Waters Rule Database run 66.
There are multiple ways to analyze and interpret trends. The state has
performed similar analyses on a smaller subset of monitoring sites. In
many places, the Department of Environmental Protection hasn’t stated
whether water quality is getting better or worse because it uses a
strict interpretation of statistical significance.
Water quality experts and academic texts say such methods can lead
regulators to miss important signs about the health of a waterway.
“I believe there’s always a trend,” said Robert Hirsch, former
associate director of water at the U.S. Geological Survey who reviewed
our methods. “You want to know that there’s a non-trivial chance that
there is this bad outcome and that you might want to do something
about it.”
We instead used a framework that categorized trends based on
likelihood, a method recommended by Hirsch that is based on
statistical research. In this context, we use the term “likelihood” to
refer to an approximate probability that a trend is in the direction
reported by the test.
The categories used were “highly likely,” “very likely,” “likely” and
“about as likely as not.” Only waters that fell into the “likely”
bucket or higher were included when counting trends in our analysis.
In 2016, the American Statistical Association released guidance on
p-values that avoids strict interpretation of p < 0.05. They
recommended that scientific conclusions and policy decisions not
be based solely on whether a p-value passes a specific threshold.
Top hydrologists and statisticians at the U.S. Geological Survey
also indicate in a textbook that considering practical
significance by using a “strength of evidence” approach could help
water managers make more timely decisions about pollution control.
The Florida Department of Environmental Protection performs
similar trend analyses at certain continuously monitored sites.
However, many of these sites report “no trend” because the p-value
returned is greater than 0.05.
In an email, the Department of Environmental Protection said that
determining trends using p-values outside traditional
interpretations was “doubling down on conclusions that are not
supported by robust scientific standards.”
To derive meaningful and specific results from this trend
analysis, we started from the premise that there is
always a trend, a concept from research cited in a U.S. Geological Survey
textbook.
We determined the direction of the trend using Kendall’s S
statistic and calculated its magnitude using the percent change
derived from the Theil-Sen slope.
Then, we assessed p-values associated with the trends. The
Regional Kendall test returned a two-sided p-value because the
trend could either be increasing or decreasing. In traditional
hypothesis testing, this p-value indicates the probability of
observing a trend as extreme as the one found under a null
hypothesis that no trend exists.
But in practice, statistical studies have shown that researchers
can approximate the likelihood that the identified trend direction
is correct. This likelihood is related to the two-sided p-value
that is typically reported for hypothesis tests such as the
Regional Kendall test.
Using a likelihood approach gives an estimate of the probability
that the observed trend is not only real but also is in the
direction reported by the slope and the S statistic. Specifically,
we used the below formula, recommended by Hirsch and used by
scientists in a U.S. Geological Survey report.
Probability that the trend identified is in the correct
direction ≈ 1 − (two-sided p-value / 2)
Note: This is an estimation, not a direct calculation, of formal
probability.
Finally, we categorized the approximate likelihood of the trend
being in the correct direction as follows:
| Term | Likelihood of outcome | One-sided p-value range |
| --- | --- | --- |
| Highly likely | 95–100% | 0.00 ≤ p < 0.05 |
| Very likely | 90–95% | 0.05 ≤ p < 0.10 |
| Likely | 66–90% | 0.10 ≤ p < 0.34 |
| About as likely as not | 50–66% | 0.34 ≤ p ≤ 0.50 |
In calculating findings, we counted any trends with a “likely”
outcome or stronger.
We considered trends that fell into the “about as likely as not”
outcome as not having enough statistical strength to conclude a
trend, even if direction and magnitude were reported by the test.
We do not include these results in our story’s findings.
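The likelihood approximation and the category cutoffs can be sketched as below. The function names are hypothetical. The input is the two-sided p-value returned by the test, which is halved to get the one-sided value used for the category ranges.

```python
def direction_likelihood(two_sided_p):
    """Approximate probability that the reported trend direction is correct."""
    return 1.0 - two_sided_p / 2.0


def likelihood_category(two_sided_p):
    """Map a two-sided p-value to a likelihood term via its one-sided value."""
    one_sided = two_sided_p / 2.0
    if one_sided < 0.05:
        return "highly likely"
    if one_sided < 0.10:
        return "very likely"
    if one_sided < 0.34:
        return "likely"
    return "about as likely as not"


def counts_as_trend(two_sided_p):
    """Only 'likely' or stronger results are counted in the findings."""
    return likelihood_category(two_sided_p) != "about as likely as not"
```

For instance, a two-sided p-value of 0.5, which a strict 0.05 cutoff would report as no trend, corresponds to roughly a 75% likelihood that the direction is correct and falls in the "likely" category.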
Once the trend results were calculated, we sorted them into buckets.
| Trend category | Definition | Impaired waters | Waters with pollution budgets |
| --- | --- | --- | --- |
| Worsening | At least “likely” increasing on one or more nutrients of impairment | 369 | 59 |
| Maintaining | At least “likely” maintaining on one nutrient of impairment but not “likely” increasing on another | 40 | 8 |
| Improving | At least “likely” decreasing on all nutrients of impairment | 259 | 56 |
| Ambiguous | A water body is “about as likely as not” to have a trend in either direction on at least one nutrient, and it is not at least “likely” to be worsening or maintaining on another | 98 | 26 |
Many water bodies are impaired for multiple nutrients. In these cases,
we evaluated each chemical. Because the waterways are already
considered polluted, for a segment to be considered improving, all
nutrients of impairment had to be clearly decreasing. If levels of one
chemical were clearly increasing or maintaining, we considered the
segment to be worsening or not getting better overall.
These categories contain a range of results. For example, a
“worsening” water may have increasing levels of nitrogen but
decreasing levels of phosphorus. Similarly, a water body could have
“maintaining” levels of phosphorus and decreasing levels of nitrogen
and be labeled as “maintaining” overall. If a water body was impaired
for multiple nutrients and decreasing on one but “about as likely as
not” to have a trend in either direction on another, we considered
that segment to fall in the “ambiguous” category.
Experts differed on whether waters with trends that were “about as
likely as not” could fall in the “maintaining” category. We chose to
take the more conservative approach. We broke out these cases and
assigned them to the “ambiguous” category. This way, we distinguish
between water bodies with consistent, high-confidence flat trends and
water bodies with scattered measurements that resulted in low
confidence of a trend direction.
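The bucket rules can be summarized in a short sketch. It assumes each nutrient of impairment has already been given a direction ("increasing," "maintaining" or "decreasing") and a flag for whether that result was at least "likely." The function name and input layout are hypothetical.

```python
def segment_bucket(nutrient_results):
    """Assign a segment to a trend category.

    `nutrient_results` is a list of (direction, at_least_likely) pairs,
    one per nutrient of impairment.
    """
    if not nutrient_results:
        raise ValueError("at least one nutrient result is required")
    # Directions that met the "likely" or stronger threshold.
    confident = [d for d, likely in nutrient_results if likely]
    if "increasing" in confident:
        return "worsening"
    if "maintaining" in confident:
        return "maintaining"
    if len(confident) == len(nutrient_results):
        # Every nutrient is confidently trending, and none is increasing
        # or maintaining, so all are decreasing.
        return "improving"
    # At least one nutrient is "about as likely as not."
    return "ambiguous"
```

The order of the checks reflects the precedence described above: one clearly increasing nutrient makes the whole segment "worsening," even if another nutrient is clearly decreasing.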
How we assessed state progress toward pollution reduction goals
We identified 294 water bodies currently listed by the Department of
Environmental Protection as having Total Maximum Daily Load targets,
or pollution budgets. These waters are identified in the state’s
Impaired Waters Rule Database.
Separately, the department publishes a map identifying each water body
with a Total Maximum Daily Load. State
regulators change water body boundaries and names over time, meaning
some of the segments cover different areas and have different IDs
today than they did when the targets were first developed. To overcome
these discrepancies, we matched the department’s map of water bodies
that have Total Maximum Daily Loads to a present-day map of segments
and boundaries.
Establishing when each Total Maximum Daily Load was adopted was
essential for the analysis. We searched the Florida Administrative
Code for these dates and compiled them in a spreadsheet. The year of
adoption then became the starting point for running the Regional
Kendall test on waterways with targets.
The Total Maximum Daily Loads are established in lengthy, scientific
reports. We combed through each state report for nutrient-related
problems, reading thousands of pages.
Total Maximum Daily Loads are the “heart” of Florida’s approach to
reducing pollution in impaired waterways, according to the
Department of Environmental Protection. But local governments and
business leaders can choose to pursue other avenues, including
“alternate restoration plans” and “reasonable assurance plans,” for
reducing contamination before state regulators adopt such targets.
We did not analyze the success of these alternate approaches and
focused only on Total Maximum Daily Loads as state pollution reduction
goals.
How we estimated chemical loads
The Florida Department of Environmental Protection publishes
pollution reduction plans for areas around impaired waters. They
include modeling results that
estimate the amount of nitrogen and phosphorus from land that could
enter rivers, lakes and bays.
We examined all of these plans and found that 24 contained
comprehensive data to tally an approximate total load across
waterways. When the information was available, we also tracked
sources the state listed as responsible for the pollution. The
pollution sources are identified in broad categories, such as farm
or urban fertilizer, and do not identify specific companies or
landowners. We entered the figures into a database by hand and
summed results for nitrogen and phosphorus.
This allowed us to derive a best-available estimate for how much
nitrogen and phosphorus Florida says could threaten impaired waters
each year.
There are some caveats. The modeling results are based on land use
data from different time periods. Some of the plans haven’t been
updated in years. Because Florida has continued to develop rapidly,
the estimates may not capture the complete pollution load today.
Some of the estimates also may not account for efforts that have
been made by the state, local governments and businesses to reduce
pollution. The estimates may also include sources beyond runoff,
such as discharges from wastewater treatment facilities, atmospheric
deposition and loading that the Department of Environmental
Protection says is due to natural causes.
Additionally, the Department of Environmental Protection has not
included the same information in all of its plans over the last two
decades. Some contain detailed breakdowns of pollution totals and
sources while others do not.
The sum, ultimately, is likely an undercount of total pollution
across Florida. It includes estimates only for land around waterways
where the state has implemented restoration plans that involve
nitrogen, phosphorus or nitrate-nitrite pollution. That accounts for
roughly 38% of the peninsula, according to our analysis.
How we looked at land changes
We relied on a dataset from the U.S. Geological Survey that detects
land use changes over time using satellite imagery. The most recent
land use classifications were estimated in 2023.
The dataset uses pixels to represent land area. Each pixel is
classified based on a type of land use, such as tree-covered land,
wetlands or high-density developed land, and covers a
900-square-meter area.
Because the classifications are based on imagery, they may not always
reflect exact conditions on the ground. For this analysis, any
classifications that were not explicitly listed under “cultivated” or
“developed” were considered natural, including designations that could
overlap with industrial properties, such as “evergreen forest” used
for timberland and “barren land” used as phosphate mines. Some
low-intensity housing areas and rural places may be classified as
development but contain natural land.
To identify changes across Florida, we calculated the number of pixels
that shifted from natural land classifications to any land use classes
related to cropland or development between 1985 and 2023. Then, we
converted the total pixel area to acres.
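The pixel count and the acreage conversion can be sketched as below. The sketch assumes each pixel has been reduced to a broad group label, with anything other than "cultivated" or "developed" treated as natural. The labels and function names are hypothetical, and a real raster would be processed with array tools rather than Python lists.

```python
SQUARE_METERS_PER_PIXEL = 900
SQUARE_METERS_PER_ACRE = 4046.8564224  # international acre

# Broad groups that are not counted as natural land.
NON_NATURAL_GROUPS = {"cultivated", "developed"}


def is_natural(group):
    """Anything not explicitly cultivated or developed counts as natural."""
    return group not in NON_NATURAL_GROUPS


def count_converted(groups_1985, groups_2023):
    """Count pixels that were natural in 1985 and are not natural in 2023."""
    return sum(
        1
        for before, after in zip(groups_1985, groups_2023)
        if is_natural(before) and not is_natural(after)
    )


def pixels_to_acres(pixel_count):
    """Convert a pixel count to acres using the 900-square-meter pixel size."""
    return pixel_count * SQUARE_METERS_PER_PIXEL / SQUARE_METERS_PER_ACRE
```

Each pixel works out to a little under a quarter of an acre.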