Question

Clarification on pagination limits and dataset completeness when acquiring bulk datasets/snapshots through the Censys Data Downloads API

  • April 9, 2026
  • 1 reply
  • 12 views

Hello,

I am currently working with the get-files endpoint to fetch the list of files for a snapshot from the universal-internet-dataset-v2-ipv4 dataset, and I would like to clarify how to verify dataset completeness.

For context, although I’m posting from a personal account, the API credentials I’m using belong to an organization headed by my research group supervisor, whose account has been granted research access to the data.

While retrieving the files for the snapshot universal-internet-dataset-v2-ipv4_20260317, I observed the following:

  • Pagination proceeded normally up to page 100

  • Each page returned exactly 100 file entries

  • In total, I obtained 10,000 files (indexed 0–9999) with no gaps

  • On page 100, the response did not include a nextPage token

  • Repeating the requests yields consistent results (same files), although the page tokens themselves change between runs
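For reference, the enumeration loop described above can be sketched roughly as follows. This is only an illustration, not the official client: `fetch_page` is a hypothetical callable standing in for the actual get-files request, and the `files` key is an assumed name for the list of entries in each response. Only the `nextPage` token comes from the responses I observed.

```python
from typing import Callable, Iterator, Optional


def list_all_files(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every file entry, following nextPage tokens until none is returned.

    fetch_page is a hypothetical wrapper around the get-files request: it takes
    a page token (None for the first page) and returns the decoded response.
    """
    token: Optional[str] = None
    seen_tokens = set()
    while True:
        page = fetch_page(token)
        # "files" is an assumed key name for the entries on this page.
        yield from page.get("files", [])
        token = page.get("nextPage")
        if not token:
            # No nextPage token: treat this as the end of the listing.
            return
        if token in seen_tokens:
            # Defensive check so a repeated token cannot cause an endless loop.
            raise RuntimeError("page token repeated; aborting enumeration")
        seen_tokens.add(token)


if __name__ == "__main__":
    # Fake two-page listing used only to exercise the loop offline.
    fake_pages = {
        None: {"files": [{"index": 0}, {"index": 1}], "nextPage": "t1"},
        "t1": {"files": [{"index": 2}]},
    }
    print(len(list(list_all_files(lambda t: fake_pages[t]))))
```

Because the page tokens change between runs, the sketch compares tokens only within a single run.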

I performed the following validation steps:

  • Verified file size consistency against the expected size (sizeBytes) provided in the API response

  • Confirmed file index continuity (no missing files)

  • Re-requested the last pages multiple times to ensure consistency
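The first two checks can be sketched as below, in case it helps to see exactly what I verified. The helper names are my own, and the mapping from file name to expected size is assumed to have been built from the `sizeBytes` values in the listing. Note that the continuity check only detects gaps below the highest index seen, so it cannot by itself rule out truncation at the end of the listing.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Mapping, Optional


def missing_indices(indices: Iterable[int]) -> List[int]:
    """Return the indices absent from 0..max(indices), i.e. gaps in the sequence."""
    present = set(indices)
    if not present:
        return []
    return [i for i in range(max(present) + 1) if i not in present]


def size_mismatches(
    expected_sizes: Mapping[str, int], local_dir: Path
) -> Dict[str, Optional[int]]:
    """Map each file whose on-disk size differs from the expected size to its
    actual size, or to None if the file is not present in local_dir.

    expected_sizes is assumed to map file name -> sizeBytes from the listing.
    """
    bad: Dict[str, Optional[int]] = {}
    for name, expected in expected_sizes.items():
        path = local_dir / name
        actual = path.stat().st_size if path.is_file() else None
        if actual != expected:
            bad[name] = actual
    return bad


if __name__ == "__main__":
    # Index 2 is missing from this small example sequence.
    print(missing_indices([0, 1, 3]))
```

A matching size does not prove the content is intact; it only rules out partial downloads of the wrong length.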

However, I could not find in the documentation:

  • Whether there is a maximum number of pages or files per snapshot

  • Whether 10,000 files is an expected fixed partitioning scheme

  • How to definitively confirm that the dataset is complete and has not been truncated by pagination limits

My questions are:

  1. Does the absence of a nextPage token reliably indicate that all files for the snapshot have been listed?

  2. Is there a known or fixed number of files per dataset snapshot (e.g., 10,000 files)?

  3. Is there any recommended way to verify completeness of a downloaded dataset snapshot?

  4. Are there any pagination limits (e.g., max pages or result window) that could cause incomplete enumeration without explicit indication?

I am also currently downloading another dataset snapshot (hosts-ipv4 from the same date) to compare behavior.

Any clarification would be greatly appreciated.

Thank you in advance!

Regards, 

Bernardo.

1 reply

MattK_Censys
  • Censys Community Manager
  • April 10, 2026

Hi @BernardoPR. I checked with the team, and here are the answers to your questions:

  1. Does the absence of a nextPage token reliably indicate that all files for the snapshot have been listed?
    • Yes.
  2. Is there a known or fixed number of files per dataset snapshot (e.g., 10,000 files)?
    • No, the number is not fixed, but in practice it is consistently 10,000 files.
  3. Is there any recommended way to verify completeness of a downloaded dataset snapshot?
    • No, we do not have a hash of the entire dataset to verify completeness with.
  4. Are there any pagination limits (e.g., max pages or result window) that could cause incomplete enumeration without explicit indication?
    • There is a maximum page size, but exceeding it should not result in an incomplete enumeration of files.

Let me know if this answers all of your questions.