Where Our Data Comes From

Every researcher profile in this database is built from publicly available academic metadata. This page documents each source, how often we refresh it, the validation we apply, and where to flag corrections.

Data Sources

Profiles draw on multiple independent sources. Cross-referencing them lets us flag inconsistencies and assign each profile a verification tier, so no single source is relied on in isolation.

OpenAlex

Primary source. Publications, citation counts, h-index, research topic labels, and institutional affiliations. openalex.org

ORCID

Researcher identity verification and employment history. orcid.org

NIH RePORTER

NIH grants: project titles, award amounts, principal investigators, funding institute. reporter.nih.gov

NSF Awards

NSF grants: award titles, amounts, directorates, programs, investigator roles. nsf.gov/awardsearch

USAspending.gov

Non-NIH/NSF federal grants (USDA, DOD, DOE, etc.). Some awards are attributed via topic matching when investigator information is incomplete; these are labeled "Topic-matched" on profile pages. usaspending.gov

UAMS TRI Profiles

Supplements OpenAlex for UAMS researchers with biographical details, clinical specialties, and profile images from the UAMS Translational Research Institute directory.

ARA Academy directory

Designations and biographies for ARA Fellows, sourced from the Arkansas Research Alliance Academy member directory.

UADA personnel directory

For the University of Arkansas Division of Agriculture, the public personnel directory is used to confirm appointments and disambiguate dual-affiliation researchers.

Refresh Frequency

Data is refreshed on a rolling schedule. Different stages of the pipeline run at different cadences:

  • Daily (lightweight refresh). Validates ORCID and publications, recomputes scores and metric snapshots, refreshes grants, and re-indexes for search. Runs Sunday through Friday at 07:00 UTC.
  • Weekly (full harvest). Pulls new and updated researcher records from OpenAlex for every covered Arkansas institution, then runs the full validation pipeline. Saturdays at 07:00 UTC.
  • Monthly (UADA personnel scrape). Refreshes the UA Division of Agriculture personnel directory used to disambiguate UADA appointments. First of each month at 05:00 UTC.
  • Quarterly (deep re-validation and topic re-profiling). Regenerates AI narratives and topic narratives, recomputes topic profiles, the collaboration network, and the similarity index, and performs a full re-import into the search index.
  • Profile claims and edits. Take effect immediately upon verification.
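To show how these cadences fit together, the Python sketch below maps a UTC timestamp to the scheduled stages that would start at that moment. The stage names and the `stages_for` function are hypothetical, not the pipeline's actual job names, and the quarterly stage is left out because this page does not state when it runs.

```python
from datetime import datetime, timezone

def stages_for(moment: datetime) -> list[str]:
    """Return the (hypothetical) stage names scheduled to start at `moment`.

    `moment` must be timezone-aware; it is converted to UTC before checking.
    The quarterly stage is omitted because its start time is not documented.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    stages: list[str] = []
    # Monthly UADA personnel scrape: first of each month at 05:00 UTC.
    if utc.day == 1 and (utc.hour, utc.minute) == (5, 0):
        stages.append("monthly_uada_scrape")
    if (utc.hour, utc.minute) == (7, 0):
        # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        if utc.weekday() == 5:
            stages.append("weekly_full_harvest")  # Saturdays
        else:
            stages.append("daily_refresh")  # Sunday through Friday
    return stages
```

For example, `stages_for(datetime(2026, 5, 16, 7, 0, tzinfo=timezone.utc))` returns `["weekly_full_harvest"]`, since 16 May 2026 is a Saturday. Because the daily and weekly runs share the 07:00 slot on complementary days, exactly one of them starts at 07:00 UTC on any given day.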

Validation Tiers

Each researcher record carries a verification tier reflecting how strongly we have corroborated their identity and Arkansas affiliation:

  • Tier 0 — Unverified. Initial harvest from OpenAlex only. Not shown by default in search results.
  • Tier 1 — Data-verified. Identity and Arkansas affiliation corroborated across multiple sources (e.g., OpenAlex + ORCID + publication affiliations). The bulk of profiles fall here.
  • Tier 2 — Self-claimed. The researcher has verified ownership of the profile via institutional email and may have edited fields directly.
  • Tier 3 — Institution-verified. Reserved for profiles confirmed by an institutional administrator.
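The tiers form an ordered ladder, which can be expressed as in the Python sketch below. It assumes the highest applicable tier wins and interprets "multiple sources" as at least two distinct corroborating sources. The `Evidence` fields and source names are hypothetical; the actual pipeline's inputs and rules are not documented on this page.

```python
from dataclasses import dataclass
from enum import IntEnum

class Tier(IntEnum):
    UNVERIFIED = 0
    DATA_VERIFIED = 1
    SELF_CLAIMED = 2
    INSTITUTION_VERIFIED = 3

@dataclass
class Evidence:
    # Hypothetical evidence flags, e.g. {"openalex", "orcid", "pub_affiliations"}.
    corroborating_sources: set[str]
    claimed_via_institutional_email: bool = False
    confirmed_by_institution_admin: bool = False

def assign_tier(ev: Evidence) -> Tier:
    """Return the highest tier the evidence supports (assumed ordering)."""
    if ev.confirmed_by_institution_admin:
        return Tier.INSTITUTION_VERIFIED
    if ev.claimed_via_institutional_email:
        return Tier.SELF_CLAIMED
    # Assumption: "multiple sources" means two or more distinct sources.
    if len(ev.corroborating_sources) >= 2:
        return Tier.DATA_VERIFIED
    return Tier.UNVERIFIED

def shown_by_default(tier: Tier) -> bool:
    # Tier 0 profiles are hidden from default search results.
    return tier > Tier.UNVERIFIED
```

Using `IntEnum` keeps the tiers comparable, so a rule such as "hide Tier 0 by default" becomes a single comparison.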

Search results also classify profiles as Confirmed (strong evidence, recent publishing activity) or Likely Affiliated (moderate evidence, manual review may be warranted).

AI-Generated Narrative Summaries

Each profile includes a plain-language narrative summary describing the researcher's focus areas and impact. These narratives are generated by Google's Gemini 2.5 Flash-Lite model, grounded in the researcher's verified publications, grants, and metrics. They are clearly labeled as AI-generated and are regenerated automatically when the underlying data changes; to avoid redundant generation, we hash the input fields and skip regeneration when the hash is unchanged.
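The hash-based skip can be sketched in Python as follows. SHA-256 over a canonical JSON serialization is an assumption for illustration; this page does not say which hash function or which input fields the pipeline uses, and `input_hash` and `needs_regeneration` are hypothetical names.

```python
import hashlib
import json
from typing import Optional

def input_hash(fields: dict) -> str:
    """Hash a narrative's input fields deterministically."""
    # sort_keys (applied recursively) and fixed separators make the output
    # independent of dict insertion order and whitespace.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def needs_regeneration(fields: dict, stored_hash: Optional[str]) -> bool:
    """Regenerate only if no hash is stored or the inputs have changed."""
    return stored_hash is None or input_hash(fields) != stored_hash
```

Lists keep their order in the serialization, so inputs whose order is not meaningful (for example, a set of grant IDs) should be sorted before hashing; otherwise a reordering alone would trigger an unnecessary regeneration.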

Topic-level narratives describing each research area at the field level are generated the same way and refreshed on the quarterly cadence.

Topic Attribution

Research topic labels on each profile are derived from OpenAlex's topic taxonomy. We apply the following quality filters:

  • Topics at depth level 2 or deeper (more specific than top-level domains).
  • OpenAlex relevance score of at least 0.4 for the researcher.
  • A maximum of 10 topics per researcher, ranked by relevance.
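The three filters can be applied in sequence: drop shallow or low-relevance topics, rank the rest by relevance, and keep the top ten. In the Python sketch below, the `TopicScore` record and its `depth` field are hypothetical stand-ins for however the pipeline stores OpenAlex topic data; the thresholds come from the list above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TopicScore:
    name: str
    depth: int    # taxonomy depth as numbered by the pipeline (hypothetical field)
    score: float  # OpenAlex relevance score for this researcher

MIN_DEPTH = 2
MIN_SCORE = 0.4
MAX_TOPICS = 10

def select_topics(candidates: list[TopicScore]) -> list[TopicScore]:
    """Apply the depth and relevance filters, then keep the top-ranked topics."""
    kept = sorted(
        (t for t in candidates if t.depth >= MIN_DEPTH and t.score >= MIN_SCORE),
        key=lambda t: t.score,
        reverse=True,  # highest relevance first; sorted() stays stable for ties
    )
    return kept[:MAX_TOPICS]
```

Filtering before ranking means a highly relevant but top-level domain never displaces a more specific topic from the ten available slots.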

Known Limitations

  • Arkansas-affiliated researchers only. Coverage is limited to researchers whose OpenAlex institutional affiliation is mapped to an Arkansas institution. Researchers who recently moved into or out of Arkansas may take a refresh cycle to appear or update.
  • Limited activity. Profiles with no publications in the last seven years are marked "Limited activity" and de-emphasized in default views.
  • Heuristic grant attribution. A subset of USAspending grants is attributed to researchers via topic and institution match rather than direct investigator naming. These are labeled "Topic-matched" so you can weigh them appropriately.
  • OpenAlex disambiguation. OpenAlex uses automated author disambiguation; we mitigate misattribution through ORCID cross-referencing and an in-house deduplication pipeline, but occasional errors can occur.
  • AI narrative caveats. Generated summaries are constrained to verified data, but they are not a substitute for reading a researcher's publications directly. All AI content is labeled.
  • Grant coverage scope. State, foundation, and industry-sponsored awards are not currently captured.
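The "Limited activity" rule reduces to a check on publication years. The Python sketch below assumes the seven-year window counts calendar years including the current one (2020 through 2026 when evaluated in 2026); the page does not specify the exact boundary, and `is_limited_activity` is a hypothetical name.

```python
from typing import Iterable

ACTIVITY_WINDOW_YEARS = 7

def is_limited_activity(publication_years: Iterable[int], reference_year: int) -> bool:
    """True if no publication falls inside the activity window."""
    # Assumption: the window spans seven calendar years ending at reference_year,
    # e.g. 2020-2026 for reference_year=2026.
    earliest = reference_year - (ACTIVITY_WINDOW_YEARS - 1)
    return not any(year >= earliest for year in publication_years)
```

A profile with no publications at all is also flagged, which matches the rule as stated above.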

Spot Something Wrong?

Every profile page has a "Report an Error" button you can use to flag inaccurate data for review. For broader corrections, removals, or questions about a specific data point, email us directly.

Questions About Our Data

For questions about sources, methodology, or how a specific profile was generated, contact the Arkansas Research Alliance.

info@aralliance.org

(501) 450-7818

Last updated: 2026-05-13