Data Quality Guide
This guide explains the data quality, volatility, and popularity metrics used in our Data Portal. These metrics help you understand the reliability and characteristics of each dataset.
Need to review or verify reports? The dedicated review page provides filters, audit history, and quick actions.
How the Review Page Works
- Use the search box to find a view by name, slug, or steward notes.
- Filter by score or review state to focus on unverified or manually adjusted reports.
- Verify rows one at a time, or select several rows and verify them in bulk.
- Open a detail page when you need the full quality breakdown and usage context.
- Every verify or update action is written to the audit log, so you can review who changed what and when.
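The review workflow above can be sketched as a small in-memory model. This is a hypothetical illustration, not the portal's implementation: `ReportRow`, `AuditEntry`, `search`, `filter_rows`, and `bulk_verify` are made-up names, and the review states (`"unverified"`, `"verified"`, `"adjusted"`) are assumptions. It shows search across name, slug, and steward notes, filtering by score and review state, and one audit entry per verified row.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReportRow:
    # Hypothetical row shape; the real review page may store more fields.
    slug: str
    name: str
    steward_notes: str
    score: str         # "BRONZE", "SILVER" or "GOLD"
    review_state: str  # assumed states: "unverified", "verified", "adjusted"


@dataclass
class AuditEntry:
    actor: str
    action: str
    slug: str
    at: datetime


def search(rows, query):
    """Case-insensitive match on name, slug, or steward notes."""
    q = query.lower()
    return [r for r in rows
            if q in r.name.lower() or q in r.slug.lower() or q in r.steward_notes.lower()]


def filter_rows(rows, score=None, review_state=None):
    """Keep rows matching every filter that is set; None means 'any'."""
    return [r for r in rows
            if (score is None or r.score == score)
            and (review_state is None or r.review_state == review_state)]


def bulk_verify(rows, actor, audit_log):
    """Mark each selected row as verified and append one audit entry per row."""
    for r in rows:
        r.review_state = "verified"
        audit_log.append(AuditEntry(actor, "verify", r.slug, datetime.now(timezone.utc)))
```

Verifying a single row is simply `bulk_verify([row], actor, audit_log)`, so both paths leave the same kind of audit trail recording who changed what and when.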
Quality Score
The Quality Score indicates the overall reliability of a dataset. It is assessed based on several automated metrics and can be manually verified by a data steward.
| Score | Description |
|---|---|
| BRONZE | Bronze - Automatically Assessed |
| SILVER | Silver - Verified by Administrator |
| GOLD | Gold - Guaranteed by Provider |
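If you consume these scores in your own tooling, an ordered enum is one way to represent them. The sketch below assumes the ranking BRONZE < SILVER < GOLD implied by the table's order; `QualityScore`, `meets`, and `trusted` are hypothetical names, not part of a published portal API.

```python
from enum import Enum


class QualityScore(Enum):
    # (rank, description); the ordering BRONZE < SILVER < GOLD is an assumption.
    BRONZE = (1, "Automatically Assessed")
    SILVER = (2, "Verified by Administrator")
    GOLD = (3, "Guaranteed by Provider")

    def __init__(self, rank, description):
        # Enum passes the tuple value's items to __init__ as arguments.
        self.rank = rank
        self.description = description

    def meets(self, minimum):
        """True if this score is at least as strong as `minimum`."""
        return self.rank >= minimum.rank


def trusted(datasets, minimum=QualityScore.SILVER):
    """Hypothetical helper: names from (name, score) pairs at or above `minimum`."""
    return [name for name, score in datasets if score.meets(minimum)]
```

For example, `trusted(catalog)` keeps only datasets that a person has verified or that the provider guarantees.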
Data Volatility
Data Volatility describes how frequently the data in a dataset is expected to change.
| Level | Description |
|---|---|
| STATIC | Static - The data does not change. |
| INFREQUENT | Infrequent - The data changes occasionally (e.g., daily). |
| STREAMING | Streaming - The data changes frequently (e.g., every few minutes). |
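This guide does not say how a volatility level is assigned. If you want to check a level against a dataset's observed behavior, one approach is to look at the typical gap between content changes. The sketch below assumes you have the timestamps at which the data changed; `classify_volatility` and the one-hour `STREAMING_CUTOFF` are hypothetical choices, not portal settings.

```python
from datetime import datetime, timedelta
from enum import Enum
from statistics import median


class Volatility(Enum):
    STATIC = "static"
    INFREQUENT = "infrequent"
    STREAMING = "streaming"


# Hypothetical cutoff: a typical gap of one hour or less counts as streaming.
STREAMING_CUTOFF = timedelta(hours=1)


def classify_volatility(change_times: list[datetime]) -> Volatility:
    """Infer a volatility level from the times a dataset's content changed."""
    if not change_times:
        return Volatility.STATIC        # no change observed in the window
    if len(change_times) == 1:
        return Volatility.INFREQUENT    # a single change gives no interval to measure
    ordered = sorted(change_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    typical_gap = median(gaps)          # the median resists a few outlier gaps
    if typical_gap <= STREAMING_CUTOFF:
        return Volatility.STREAMING
    return Volatility.INFREQUENT
```

Using the median gap rather than the mean keeps a single long outage from pushing a streaming feed into the infrequent category.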
Popularity Metrics
Popularity metrics indicate how frequently a dataset is used by other applications and users. High usage can be an indicator of a dataset's utility and relevance.
- Total Requests (30 days): The total number of API requests made to this dataset in the last 30 days.
- Unique Users (30 days): The number of unique users (based on IP address) who have accessed this dataset in the last 30 days.
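Both popularity metrics can be derived from a request log over a trailing 30-day window. The sketch below assumes a hypothetical `RequestLogEntry` record with a dataset name, client IP, and timestamp; the portal's real log format is not described here. Because users are counted by IP address, several people behind one shared address count as one user, and one person whose address changes may count more than once.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RequestLogEntry:
    # Hypothetical log record; field names are illustrative.
    dataset: str
    client_ip: str
    at: datetime


def popularity(log, dataset, now, window=timedelta(days=30)):
    """Total requests and unique client IPs for `dataset` in (now - window, now]."""
    since = now - window
    recent = [e for e in log if e.dataset == dataset and since < e.at <= now]
    return {
        "total_requests": len(recent),
        "unique_users": len({e.client_ip for e in recent}),  # one "user" per IP
    }
```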
Automated Quality Metrics
These metrics are calculated automatically by our system on a regular basis.
- Availability: The percentage of expected update intervals in which the data was available and successfully delivered.
- Latency: The average delay between when an event occurs and when the data is available in the portal.
- Validity: The percentage of data points that fall within their expected range and format.
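The three automated metrics can be approximated as follows. This is a sketch under stated assumptions, not the system's actual calculation: availability is read here as the share of expected update intervals that received at least one successful delivery, latency as the mean of (available time minus event time), and validity as the share of points passing a caller-supplied check. All function names, including the example rule `is_plausible_temperature`, are hypothetical.

```python
from collections.abc import Callable
from datetime import datetime, timedelta


def availability_pct(start: datetime, end: datetime, interval: timedelta,
                     delivery_times: list[datetime]) -> float:
    """Percentage of whole update intervals in [start, end) with a delivery."""
    slots = int((end - start) / interval)      # whole intervals in the window
    if slots == 0:
        raise ValueError("window is shorter than one update interval")
    window_end = start + slots * interval      # ignore a trailing partial interval
    covered = {int((t - start) / interval) for t in delivery_times if start <= t < window_end}
    return 100.0 * len(covered) / slots


def average_latency(observations: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean delay between (event_time, available_time) pairs."""
    if not observations:
        raise ValueError("no observations")
    delays = [available - event for event, available in observations]
    return sum(delays, timedelta()) / len(delays)


def validity_pct(values: list, is_valid: Callable[[object], bool]) -> float:
    """Percentage of data points that pass the range/format check `is_valid`."""
    if not values:
        raise ValueError("no data points")
    return 100.0 * sum(1 for v in values if is_valid(v)) / len(values)


def is_plausible_temperature(v: object) -> bool:
    # Hypothetical rule: a numeric reading between -50 and 60 degrees Celsius.
    return isinstance(v, (int, float)) and not isinstance(v, bool) and -50 <= v <= 60
```

Counting each interval at most once means extra deliveries within the same interval do not push availability above 100%.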