Public archive for Delving Bitcoin

Posted by jamesob

Sep 5, 2023, 17:42 UTC

Creating a public archive of the site's posts is proposed as a precaution against losing content and as a way to improve searchability by indexers such as Bitcoinsearch. Three main methods are discussed: scraping, using the API, and dumping the database.

Scraping means crawling the site's pages with a tool such as Selenium or Wget and saving what is returned. This method is not favored: the site's JavaScript loads content dynamically, which makes a scraper both complicated to write and fragile. Its advantage is that it requires no special permissions, so anyone can undertake the task.

The Discourse API is presented as a viable alternative. An archive built this way could be refreshed either through periodic full crawls or continuously and incrementally. The feasibility of this approach hinges on the API exposing all the necessary data and on someone being available to manage API credentials. It is suggested that the list posts endpoint might offer sufficient functionality for this purpose.
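A minimal sketch of the incremental variant might look like the following. It assumes the list posts endpoint is `GET /posts.json`, that it returns a `latest_posts` array ordered newest first, and that it accepts a `before` parameter for paging back through older post ids. The base URL and the names `fetch_page`, `crawl_posts`, and `stop_at_id` are illustrative, not taken from the post. The sketch sends no credentials; if the endpoint requires them, the request headers would need to carry them.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

BASE_URL = "https://delvingbitcoin.org"  # assumption: the forum's base URL


def fetch_page(before: Optional[int] = None) -> dict:
    """Fetch one page of posts; `before` asks for posts with a lower id."""
    url = f"{BASE_URL}/posts.json"
    if before is not None:
        url += f"?before={before}"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def crawl_posts(
    fetch: Callable[[Optional[int]], dict] = fetch_page,
    stop_at_id: int = 0,
) -> Iterator[dict]:
    """Yield posts newest first until reaching an id that is already archived.

    stop_at_id=0 performs a full crawl; passing the highest archived id
    turns the same loop into an incremental update.
    """
    before: Optional[int] = None
    while True:
        posts = fetch(before).get("latest_posts", [])
        if not posts:
            return
        for post in posts:
            if post["id"] <= stop_at_id:
                return
            yield post
        lowest = min(p["id"] for p in posts)
        if before is not None and lowest >= before:
            return  # no progress: stop instead of looping forever
        before = lowest
```

Passing the fetch function as a parameter keeps the paging logic testable without network access, and lets a caller substitute an authenticated or rate-limited fetcher.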

Database dumping is the third option: obtaining SQL dumps of the database and periodically running sanitization and export scripts over them. This method is seen as potentially the most thorough, given its direct access to the database and the presumed stability of the database schema. Its major limitation is that only administrators can perform these actions, which restricts who can maintain the archive.
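The sanitization step could be sketched as an allow-list filter over rows exported from the dump. The column names below (`deleted_at`, `hidden`, `raw`, and so on) are hypothetical placeholders and not the actual Discourse schema; a real script would need to be written against the real tables.

```python
from typing import Iterable, Iterator

# Hypothetical allow-list: only these columns ever reach the public archive.
PUBLIC_FIELDS = ("id", "topic_id", "username", "created_at", "raw")


def sanitize_rows(rows: Iterable[dict]) -> Iterator[dict]:
    """Drop deleted or hidden rows and strip every non-public column."""
    for row in rows:
        if row.get("deleted_at") is not None or row.get("hidden"):
            continue
        yield {key: row[key] for key in PUBLIC_FIELDS if key in row}
```

An allow-list is used rather than a deny-list so that a column added to the schema later stays private until someone explicitly decides to publish it.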

The final recommendation leans towards using the API or database dumping to create the public archive. It is suggested that the results be published in a Git repository, ideally mirrored in multiple locations, to ensure public accessibility and redundancy. This strategy aims to safeguard the content against loss and make it readily available for indexing and searching.
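One way to make the results suitable for a Git repository is to write each post to its own file with stable formatting, so that repeated exports produce small diffs. The directory layout and the function name `write_archive` below are illustrative assumptions; the post does not prescribe a layout.

```python
import json
from pathlib import Path
from typing import Iterable, List, Union


def write_archive(posts: Iterable[dict], root: Union[str, Path]) -> List[Path]:
    """Write each post to <root>/topic-<topic_id>/post-<id>.json."""
    root = Path(root)
    written = []
    for post in posts:
        path = root / f"topic-{post['topic_id']}" / f"post-{post['id']}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Sorted keys and fixed indentation keep the output byte-stable.
        text = json.dumps(post, indent=2, sort_keys=True, ensure_ascii=False)
        path.write_text(text + "\n", encoding="utf-8")
        written.append(path)
    return written
```

The resulting directory can then be committed and pushed to each mirror with ordinary Git commands.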
