Database
The Winamp Skin Museum, and related projects (Twitter bot, Discord bot), are powered by an SQLite database which is not included in this repository, but a copy can be obtained by reaching out directly.
For reference, the tables/columns are as follows:
algolia_field_updates
The search feature on skins.webamp.org is powered by Algolia, a third-party search service. Data about each skin is pushed up to Algolia as it changes. To make re-indexing (pushing) efficient when we want to invalidate some values, this table tracks each time we update a field in Algolia.
id - A unique ID for this update
update_timestamp - The time at which this field was pushed to Algolia
field - The field in the search index that was updated
skin_md5 - The md5 hash of the skin (see the skins table)
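As a rough illustration of how this table gets used (a sketch assuming the better-sqlite3 client and a local copy of the database named skins.sqlite3, neither of which is mandated by this repo), you could ask when each field was last pushed for a given skin:

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path to your copy of the database
const someSkinMd5 = "00000000000000000000000000000000"; // placeholder md5

// When was each field last pushed to Algolia for this skin? Anything older than
// your latest invalidation is a candidate for re-indexing.
const lastPushes = db
  .prepare(
    `SELECT field, MAX(update_timestamp) AS last_push
     FROM algolia_field_updates
     WHERE skin_md5 = ?
     GROUP BY field`
  )
  .all(someSkinMd5);
console.log(lastPushes);
```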
instagram_posts
For a while we were posting skins to Instagram trying to replicate the success of the Twitter bot. It didn't catch on, so we stopped. However, we have an index of all the posts that were made.
id - A unique ID for this post (local to this database)
skin_md5 - The md5 hash of the skin that was posted (see the skins table)
post_id - The Instagram ID of this post
url - The URL of this Instagram post
screenshot_updates
The Winamp Skin Museum's main feature is uniform screenshots of each classic Winamp skin. These screenshots are generated using a script that uses Puppeteer and webamp.org. If we make changes to that script, or fix bugs in Webamp, we may want to retake these screenshots and reupload them to S3. Similarly, we may want to track skins which fail to render or otherwise misbehave in the screenshot script. This table records each time we retake the screenshots so that we can intelligently decide if we want to retake the screenshot at any given time.
id - A unique ID for this update
update_timestamp - When the screenshot was taken
skin_md5 - The md5 hash of the skin that was updated (see the skins table)
success - (bool) Was this screenshot/update successful?
error_message - (string) Error message encountered when taking the screenshot
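A sketch of how you might find skins whose most recent screenshot attempt failed (assuming the bool is stored as 0/1; this is not the Museum's actual retake logic):

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path

type LatestAttempt = {
  skin_md5: string;
  last_attempt: string | number;
  success: number;
  error_message: string | null;
};

// SQLite's "bare columns in an aggregate query" rule means success and error_message
// come from the row with MAX(update_timestamp), i.e. the latest attempt per skin.
const latest = db
  .prepare(
    `SELECT skin_md5, MAX(update_timestamp) AS last_attempt, success, error_message
     FROM screenshot_updates
     GROUP BY skin_md5`
  )
  .all() as LatestAttempt[];

// Skins whose latest attempt failed probably deserve another try.
const failed = latest.filter((row) => row.success === 0);
```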
archive_files
Each skin is actually a zip file. As part of the database, we examine the contents of each zip archive and record metadata about the inner files. Each row in this table represents a file found within a Winamp skin's zip archive.
id - A unique ID for this file
skin_md5 - The md5 hash of the skin that this file was found within (see the skins table)
file_name - The file path of this file within the archive
file_md5 - The md5 hash of the file after being extracted from the zip
file_date - The date that the file was created (according to the zip metadata). Deprecated, see the file_info table.
uncompressed_size - The size of the file, after being decompressed. Deprecated, see the file_info table.
text_content - If the file is a text file, this column contains that text
is_directory - (bool) Is this file a directory?
key_value
For some cron jobs and async tasks we want to track arbitrary state, and efficiency is not a concern. For these quick jobs we have a key/value store where the value is often a JSON blob, and we just overwrite the whole thing each time.
key - I'll give you one guess...
value - I think you can see where I'm going here
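For example, a hypothetical cron job could stash its state like this (a sketch; it assumes key is unique so that INSERT OR REPLACE overwrites the existing row):

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path

// Overwrite (or create) the whole state blob for a made-up job name.
db.prepare(`INSERT OR REPLACE INTO key_value ("key", value) VALUES (?, ?)`).run(
  "example-cron-job",
  JSON.stringify({ lastRun: Date.now() })
);

// Read it back and parse the JSON blob.
const row = db
  .prepare(`SELECT value FROM key_value WHERE "key" = ?`)
  .get("example-cron-job") as { value: string } | undefined;
const state = row ? JSON.parse(row.value) : null;
```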
skin_reviews
Before a skin can be tweeted by the Twitter bot, it must first be approved by someone in the Discord server. This helps keep the quality bar somewhat high (fewer low-effort skins getting shared) and also helps us avoid sharing NSFW skins on Twitter.
Additionally, skins marked as NSFW are blurred and down-ranked on the Skin Museum.
These are all human reviews. Note that the bot will create both an NSFW and a REJECTED review when a skin is marked as NSFW.
id - A unique ID for this review
skin_md5 - The md5 hash of the skin being reviewed (see the skins table)
review - One of REJECTED, APPROVED, NSFW
reviewer - (Often missing) the user who did the review
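A plausible query, not necessarily the bot's actual one, for skins that have an APPROVED review, no REJECTED/NSFW review, and have not been tweeted yet:

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path

const tweetable = db
  .prepare(
    `SELECT DISTINCT r.skin_md5
     FROM skin_reviews r
     WHERE r.review = 'APPROVED'
       AND r.skin_md5 NOT IN (
         SELECT skin_md5 FROM skin_reviews WHERE review IN ('REJECTED', 'NSFW')
       )
       AND r.skin_md5 NOT IN (
         SELECT skin_md5 FROM tweets WHERE skin_md5 IS NOT NULL
       )`
  )
  .all();
```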
file_info
Metadata about files extracted from zip files. Because the same file is often duplicated across many skins, we normalize them here so that we have only one row for each file, no matter how many skins it appears within.
id - A unique ID for this file
file_md5 - The md5 hash of the file after being extracted from the zip (see the archive_files table)
file_date - The date that the file was created (according to the zip metadata)
size_in_bytes - The size of the file, after being decompressed
text_content - If the file is a text file, this column contains that text
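For example, to read the text files inside a given skin you would join through file_md5 and take text_content from here rather than from the deprecated archive_files columns; a sketch:

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path
const someSkinMd5 = "00000000000000000000000000000000"; // placeholder md5

// One row per text file inside the skin, with its path in the archive and its contents.
const textFiles = db
  .prepare(
    `SELECT af.file_name, fi.text_content
     FROM archive_files af
     JOIN file_info fi ON fi.file_md5 = af.file_md5
     WHERE af.skin_md5 = ?
       AND fi.text_content IS NOT NULL`
  )
  .all(someSkinMd5);
```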
skin_uploads
When a user attempts to upload a file to the Museum, they first request an upload URL so that they can upload the file directly to S3. Then, once they have uploaded it, they notify our server, and we kick off a job to download the skin from S3 and screenshot/scrape it as we have capacity. This approach lets us absorb an unlimited number of uploads during a spike in traffic and process them at our leisure.
The status of an in-progress upload is tracked in this table, and it is used by the server as a task queue to track skins that need to be processed.
First a user requests a URL for a skin based on its md5 (URL_REQUESTED). Once they've uploaded it to S3, they notify us (UPLOAD_REPORTED). Finally the file is processed and ends up as either ERRORED or ARCHIVED (success!).
id - A unique ID for this upload attempt. This is used in the S3 filename that the user is given permission to create.
skin_md5 - The md5 of this uploaded file. Note that not all skins here will end up in the skins table, either because the upload didn't complete, processing errored, or the file is not actually a skin.
status - Where in the pipeline this upload is: ERRORED, UPLOAD_REPORTED, URL_REQUESTED, ARCHIVED.
filename - The filename that the user had for the file when they uploaded it (the file name on S3 is based on id, so we need the original filename here)
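For illustration, the status values as a TypeScript union plus a query for the reported-but-unprocessed uploads that make up the task queue (a sketch, not the server's actual code):

```ts
import Database from "better-sqlite3";

// The pipeline states described above.
type UploadStatus = "URL_REQUESTED" | "UPLOAD_REPORTED" | "ERRORED" | "ARCHIVED";

const db = new Database("skins.sqlite3"); // hypothetical path

// Uploads that have been reported by the user but not yet processed by us.
const pending = db
  .prepare(
    `SELECT id, skin_md5, filename
     FROM skin_uploads
     WHERE status = 'UPLOAD_REPORTED'`
  )
  .all();
```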
files
Information about skin files that we have ingested. Since skins are indexed by their md5 content, we may have encountered the same skin file under multiple filenames. This table shows all the filenames we've encountered for each skin, and (in some cases) where we got the file.
id - A unique ID for this file
file_path - The file path (directory and name) of the skin file
skin_md5 - The md5 hash of the file (see the skins table)
source_attribution - Where we found this file (if known)
skins
Information about a given Winamp skin. Each item in the Winamp Skin Museum corresponds to a row in this table with a skin_type of 1 (classic).
id - A unique ID for the skin (not really used, though it should be. Instead every other table references skins by md5)
md5 - The md5 hash of the skin file. Most other tables reference skins by this value, usually with a column named skin_md5. We should probably fix that and use id.
skin_type - One of 1 (classic), 2 (modern), 3 (pack), 4 (invalid)
emails - A space-separated list of emails extracted from the skin's text files
readme_text - Using a heuristic, we identify files in the skin archive that are likely to be readme or readme-like files and index them here. This should probably be done dynamically at query time and we should find this value in the file_info table.
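For example, pulling the classic skins and splitting the space-separated emails column back into a list (a sketch, not code from the project):

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path

// Classic skins (skin_type = 1) that have at least one extracted email address.
const rows = db
  .prepare(
    `SELECT md5, emails FROM skins WHERE skin_type = 1 AND emails IS NOT NULL`
  )
  .all() as { md5: string; emails: string }[];

for (const row of rows) {
  // `emails` is space-separated, so split it back into an array.
  const emails = row.emails.split(" ");
  console.log(row.md5, emails);
}
```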
ia_items
Each skin in the Museum should be persisted to the Internet Archive for preservation. This is done daily and we keep a local cache of what information the Internet Archive has about each skin.
id - A unique ID for this Internet Archive item (local to this database)
skin_md5 - The md5 hash of the skin (see the skins table)
identifier - The unique identifier used by the archive for this item (this is used in the URL of the item and for API queries)
metadata - A JSON blob containing the Internet Archive's metadata about the item
metadata_timestamp - The last time we scraped the metadata from their API (I think)
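A plausible query for classic skins that don't yet have an Internet Archive item, i.e. candidates for the daily preservation job (again, a sketch rather than the actual job's code):

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path

// Classic skins with no corresponding ia_items row.
const missing = db
  .prepare(
    `SELECT s.md5
     FROM skins s
     LEFT JOIN ia_items ia ON ia.skin_md5 = s.md5
     WHERE s.skin_type = 1
       AND ia.id IS NULL`
  )
  .all();
```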
refreshes
Much of the data in the database is derived from the skin archives themselves. However, we don't have all the skin files locally; they are in S3. So, we only periodically download them and re-extract all the data/screenshots. This table records each time we do this, so that we can know which skins need to be refreshed.
id - A unique ID for this refresh
skin_md5 - The md5 hash of the skin (see the skins table)
error - Any error we encountered during the refresh
timestamp - When we performed the refresh
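A sketch of how this table can answer "which skins are most overdue for a refresh"; skins that have never been refreshed sort first, because their MAX(timestamp) is NULL and NULLs sort first under ascending order in SQLite:

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path

// Each skin with the timestamp of its most recent refresh (NULL if never refreshed).
const staleness = db
  .prepare(
    `SELECT s.md5, MAX(r.timestamp) AS last_refresh
     FROM skins s
     LEFT JOIN refreshes r ON r.skin_md5 = s.md5
     GROUP BY s.md5
     ORDER BY last_refresh ASC`
  )
  .all();
```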
tweets
Tweets made by the Twitter bot @winampskins. Likes and retweets are scraped nightly, but the scraper can only read the most recent tweets. Likes/retweets on older tweets are not seen by us, so the numbers represent a lower bound. Additionally, the Twitter API only lets you go back so far, so we may be missing some tweets, since we didn't index these from the very beginning.
Finally, not all tweets that the bot tweets are strictly skins. Some are manual tweets or retweets. So, not every tweet will have a skin_md5.
id - A unique ID for this tweet (local to this database)
likes - The number of likes the tweet got
retweets - The number of retweets the tweet got
skin_md5 - The md5 hash of the skin that was tweeted (see the skins table). Note: not all tweets reference a skin
tweet_id - The ID for this tweet, as assigned by Twitter. This can be used to construct the tweet URL
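For example, the bot's ten most-liked skin tweets (keeping in mind that the counts are lower bounds):

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path

const topTweets = db
  .prepare(
    `SELECT skin_md5, tweet_id, likes, retweets
     FROM tweets
     WHERE skin_md5 IS NOT NULL
     ORDER BY likes DESC
     LIMIT 10`
  )
  .all();
```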
knex_migrations
Metadata about migrations that have been run on the database. Used for making database changes
knex_migrations_lock
Used to ensure migrations are applied correctly.
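Schema changes are applied as knex migrations, which populate these two tables. A minimal, purely illustrative migration (the table and columns here are invented for the example, not taken from this repo) looks something like:

```ts
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  await knex.schema.createTable("example_table", (table) => {
    table.increments("id");
    table.string("skin_md5").notNullable();
    table.text("comment");
  });
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.dropTable("example_table");
}
```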
museum_sort_overrides
Used for making editorial decisions about how individual skins show up in the main scroll of the Winamp Skin Museum. Used for boosting the default skins and hiding apparent duplicates. In reality there are many, many near or actual dupes, but we manually cull duplicates that appear in the first few pages.
id - A unique ID for this override (local to this database)
skin_md5 - The md5 hash of the skin that is being overridden
score - A score for how highly rated this skin should be. Negative numbers mean the skin should be hidden.
comment - Explains why the skin was ranked this way
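A simplified sketch of how the override score could feed into the museum ordering, hiding negative scores and boosting the rest; the real ranking logic is more involved:

```ts
import Database from "better-sqlite3";

const db = new Database("skins.sqlite3"); // hypothetical path

const ordered = db
  .prepare(
    `SELECT s.md5, COALESCE(o.score, 0) AS override_score
     FROM skins s
     LEFT JOIN museum_sort_overrides o ON o.skin_md5 = s.md5
     WHERE s.skin_type = 1
       AND COALESCE(o.score, 0) >= 0
     ORDER BY override_score DESC`
  )
  .all();
```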