Postgres synchronization solution P&L: -8 (≃ -680 CNY)
Data synchronization is a challenging computing problem, but I have an idea.
I plan to use an inefficient rolling hash and sorting to solve the synchronization problem. This is similar to rsync.
I implemented a hash that takes the previous data, the current column, and the previous hash, and chains them to produce a hash of the entire database.
This should let us synchronize with a minimum of data transfer. When I write the synchronizer part, it will rehash all of its own data and then retrieve hashes from the other side while binary-searching the sorted data.
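A minimal sketch of how the chained hash and the binary search might fit together. Everything named here is an assumption for illustration: the function names, the choice of SHA-256, and `remote_hash_at`, which stands in for one network request to the other side. Rows are sorted first so both sides hash in the same order.

```python
import hashlib


def chained_hashes(rows):
    """Return one hash per sorted row; entry i covers sorted rows 0..i.

    Each hash is built from the previous hash plus every column of the
    current row, so the last entry is a hash of the whole data set.
    """
    previous = b""
    result = []
    for row in sorted(rows):
        h = hashlib.sha256()
        h.update(previous)
        for column in row:
            data = str(column).encode("utf-8")
            # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
        previous = h.digest()
        result.append(previous.hex())
    return result


def first_divergence(local_hashes, remote_hash_at, remote_length):
    """Binary search for the first index where the two chains differ.

    Because the hashes are chained, a match at index i implies every
    earlier row matched, and a mismatch implies every later hash differs.
    Returns the shorter chain's length if no difference is found.
    """
    low, high = 0, min(len(local_hashes), remote_length)
    while low < high:
        mid = (low + high) // 2
        if local_hashes[mid] == remote_hash_at(mid):  # one remote request
            low = mid + 1
        else:
            high = mid
    return low
```

With this layout only about log2(n) hashes cross the network to locate the first differing row. A single chain only finds the first difference, so later changes would need another pass over the remaining rows.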
I have an idea for how to solve the "winning" copy problem.
Have a separate table that hashes every column field and every row and assigns each a version.
That version is what gets compared.
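One way the separate version table could work, sketched with a plain dict standing in for the real table. The key layout `(table, primary_key, column)`, the use of `None` as the column for the whole-row entry, and the function names are all assumptions made for this example.

```python
import hashlib


def digest_of(value):
    """Hash a single value via its repr (hypothetical helper)."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def _bump(version_table, key, digest):
    """Store the digest under key; raise the version only when it changed."""
    entry = version_table.get(key)
    if entry is None:
        version_table[key] = {"hash": digest, "version": 1}
        return False
    if entry["hash"] == digest:
        return False
    entry["hash"] = digest
    entry["version"] += 1
    return True


def record_row(version_table, table, primary_key, row):
    """Hash every field of row and the row as a whole.

    row is a dict of column name to value. Returns the list of columns
    whose hash changed since the last call for this row.
    """
    changed = []
    parts = []
    for column in sorted(row):
        digest = digest_of(row[column])
        parts.append(column + "=" + digest)
        if _bump(version_table, (table, primary_key, column), digest):
            changed.append(column)
    # The row-level entry uses None as its column name.
    _bump(version_table, (table, primary_key, None), digest_of("|".join(parts)))
    return changed
```

Because the versions live in their own table, the synchronized tables keep their original schema.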
My Vagrant setup uses persistent disks and uses Ansible to deploy the cron job and the sync script. It is configured by a YAML file. I've also installed psycopg2 and found documentation on how to retrieve the tables in a Postgres database. It's just a matter of writing the sync algorithm now.
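For the table lookup, one common approach is to query `information_schema.tables`. The sketch below assumes a DB-API cursor such as the one psycopg2 provides; the function name `list_tables` and the default schema `public` are assumptions for illustration.

```python
def list_tables(cursor, schema="public"):
    """Return the names of ordinary tables in one schema, sorted by name.

    cursor is any DB-API cursor that accepts %s placeholders, as
    psycopg2 cursors do.
    """
    cursor.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s AND table_type = 'BASE TABLE' "
        "ORDER BY table_name",
        (schema,),
    )
    return [name for (name,) in cursor.fetchall()]
```

With psycopg2 this would be called as `list_tables(conn.cursor())`, where `conn` comes from `psycopg2.connect(...)` with whatever connection settings the YAML file holds.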
My problem is detecting which side is the winning copy.
When one side changes the data, the hash will differ and the changed rows are detected. This part I understand.
The problem is detecting which side has the latest change and therefore should win. I might need to introduce a version column.
If I had a last-updated timestamp field, I could use that, or a version column, but I am expressly trying to avoid introducing new columns to the schema. That makes it a lot harder.
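One possible way around the missing timestamp, offered as an assumption rather than a settled design: remember each row's hash from the last successful sync and do a three-way comparison against it. This does not say which change is newest, only which side changed since the last sync, and it needs no new column in the synchronized tables. The names `decide` and `plan_sync` are hypothetical.

```python
def decide(base_hash, local_hash, remote_hash):
    """Three-way comparison of one row's hashes.

    base_hash is the hash stored at the last successful sync; None
    means the row did not exist on that side at that time.
    """
    if local_hash == remote_hash:
        return "in_sync"
    if local_hash == base_hash:
        return "remote_wins"  # only the remote side changed
    if remote_hash == base_hash:
        return "local_wins"   # only the local side changed
    return "conflict"         # both sides changed since the last sync


def plan_sync(base, local, remote):
    """Map each primary key to a decision.

    base, local and remote are dicts of primary key to row hash.
    """
    keys = set(base) | set(local) | set(remote)
    return {
        key: decide(base.get(key), local.get(key), remote.get(key))
        for key in sorted(keys)
    }
```

Rows that come back as `conflict` are the cases where a timestamp or version column would still be needed, or some fixed rule such as always preferring one side.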