Scrubbing

Bushel recognizes that some systems may not be able to detect and trigger a push when an item is updated or deleted. This is particularly common in systems which outright remove entries for items rather than marking them as deleted. While this is not an ideal situation, the Bushel system is able to identify changed or deleted items.

This is accomplished by (re)pushing all data with updated_at timestamps between the current time and some point in the past. (For example, all data that has been updated within the last 7 days.) After all data for the range has been pushed, send a request to the same update-{item} API with delete_older_than set to the timestamp of when the re-push began and delete_created_since set to the beginning of the time range being re-pushed. (e.g. 7 days ago). The Router will then reconcile its dataset with the items that were re-pushed by deleting all items with updated_at timestamps greater than the delete_created_since timestamp and were not sent in the re-push (i.e. items that were received prior to the delete_older_than timestamp).

# For this example, assume a 'current time' of approximately 2023-05-11 00:05:00.

# After completing a re-push (that started at 2023-05-11 00:00:00) of all splits
# from 2023-05-06 to the current time, this request will instruct Bushel to delete
# all splits with updated_at timestamps more recent than delete_created_since but
# were _not_ included in the re-push
#
# Send to: https://router.translator.bushelops.com/api/v1/push
{
  "data": [
    {
      "update-splits": {
        "splits": [
            # Additional records may be sent in the same payload as the scrubbing request.
            # These will be processed prior to the scrubbing process.
        ],
        "version": "1.0.0",
        "delete_older_than": "2023-05-11T00:00:00Z",
        "delete_created_since": "2023-05-06T00:00:00Z"
      }
    }
  ]
}

If scrubbing is necessary for the integration, Bushel recommends setting up multiple scrubbing configurations with more-intensive running less frequently. For example, configure a scrubber that reconciles data from the last 24 hours to run every 30 minutes, and a scrubber for the past 90 days to run weekly. The specific configurations for each integration will be dependent upon usage patterns for the type of data, database capability, and dataset size.

Scrubbing Step by Step