Posted by jamesob
Sep 5, 2023/17:42 UTC
The post proposes creating a public archive of the site's posts, both as a precaution against losing content and to make the content easier to search for indexers like Bitcoinsearch. Three main methods are discussed: scraping, using an API, and dumping the database.
Scraping involves using tools such as Selenium or Wget to back up the site by crawling through its pages. This method is not favored because the site loads content dynamically with JavaScript, which makes scraping both complicated and fragile. Its advantage is that it requires no special permissions, so anyone can undertake the task.
The Discourse API is presented as a viable alternative. It could be used either for periodic full crawls or for continuous incremental updates. The feasibility of this approach depends on whether the API exposes all of the necessary data and on someone being available to manage API credentials. It is suggested that the list posts endpoint might offer sufficient functionality for this purpose.
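The incremental approach described above can be sketched as follows. This is a minimal illustration, not the method from the post: it assumes the list posts endpoint is `GET /posts.json`, that it accepts a `before` query parameter, that it returns posts under a `latest_posts` key, and that credentials are sent in `Api-Key` and `Api-Username` headers. These details should be checked against the Discourse API documentation. The names `make_fetcher` and `crawl_new_posts` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def make_fetcher(base_url, api_key=None, api_username=None):
    """Build a function that fetches one page of posts with id lower than `before`.

    Assumption: the endpoint path, the `before` parameter, and the
    `latest_posts` response key match the target Discourse instance.
    """
    def fetch_page(before=None):
        query = ("?" + urllib.parse.urlencode({"before": before})) if before else ""
        request = urllib.request.Request(base_url.rstrip("/") + "/posts.json" + query)
        if api_key:
            # Assumed header names for Discourse API credentials.
            request.add_header("Api-Key", api_key)
            request.add_header("Api-Username", api_username or "system")
        with urllib.request.urlopen(request) as response:
            return json.load(response).get("latest_posts", [])
    return fetch_page


def crawl_new_posts(fetch_page, last_seen_id=0):
    """Page backwards from the newest post until reaching `last_seen_id`.

    `fetch_page(before)` must return a list of dicts that each carry an
    integer "id". Passing last_seen_id=0 amounts to a full crawl.
    """
    collected = []
    before = None
    while True:
        page = fetch_page(before)
        if not page:
            break  # no older posts remain
        fresh = [post for post in page if post["id"] > last_seen_id]
        collected.extend(fresh)
        if len(fresh) < len(page):
            break  # this page overlaps the previously archived range
        before = min(post["id"] for post in page)
    return sorted(collected, key=lambda post: post["id"])
```

Keeping the network call behind `fetch_page` means the paging logic can be exercised with a stub, and the same loop serves both periodic full crawls and incremental runs that start from the last archived post id.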
Database dumping is the third option: obtaining SQL dumps of the database and periodically running sanitization and export scripts against them. This method is seen as potentially the most thorough, given its direct access to the database and the presumed stability of the database schema. Its major limitation is that only administrators can perform these steps, which restricts who can maintain the archive.
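A sanitization step of the kind mentioned above might look like the sketch below. It uses an allow-list, so any column not explicitly named is dropped. The field names (`id`, `topic_id`, `username`, `created_at`, `raw`, `deleted_at`, `hidden`) are hypothetical placeholders and are not taken from the actual Discourse schema.

```python
# Hypothetical allow-list of columns that are safe to publish.
PUBLIC_FIELDS = ("id", "topic_id", "username", "created_at", "raw")


def sanitize_rows(rows):
    """Return publishable copies of post rows loaded from a database dump.

    Rows marked as deleted or hidden are skipped entirely, and every
    column outside PUBLIC_FIELDS (emails, IP addresses, and so on) is
    discarded rather than filtered by name.
    """
    public = []
    for row in rows:
        if row.get("deleted_at") is not None or row.get("hidden"):
            continue
        public.append({field: row.get(field) for field in PUBLIC_FIELDS})
    return public
```

An allow-list is the more conservative choice here: if the schema gains a new private column, it stays out of the archive unless someone deliberately adds it.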
The final recommendation leans towards using the API or database dumps to create the public archive. It is suggested that the results be published in a git repository, ideally mirrored in multiple locations, to ensure public accessibility and redundancy. This strategy aims to safeguard the content against loss and make it readily accessible for indexing and searching.
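One way to lay out the results for a git repository is sketched below, assuming each post is a dict with `id` and `topic_id` keys. The directory layout and the name `export_posts` are illustrative choices, not something specified in the post.

```python
import json
from pathlib import Path


def export_posts(posts, out_dir):
    """Write each post to <out_dir>/topic-<topic_id>/post-<id>.json.

    Keys are sorted and indentation is fixed so that re-exporting an
    unchanged post produces a byte-identical file and no git diff.
    """
    written = []
    for post in posts:
        path = Path(out_dir) / f"topic-{post['topic_id']}" / f"post-{post['id']}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        text = json.dumps(post, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

After an export, the directory can be committed and pushed to each mirror with ordinary git commands. One file per post keeps diffs small when only a few posts change between runs.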