Indifferent Shiba

August 19, 2016

Storing Assets in the Database

It seems common for web site builders to want to store binary objects associated with a site in the database that drives it. Based on my research, it also seems to be nearly universally cautioned against by folks with experience. Those folks make good arguments.

Regardless, I'd prefer to minimize my need to manage mutable shared disk storage. I don't see a way around storing audio files (to be streamed) on disk; pulling those files from a database as binary objects thwarts the web servers' ability to stream the data. But I can store small-ish images that accompany articles in the database without too much adverse performance impact.

To do that, I created a table to support this data structure:

data StoredFile = StoredFile {
  fileName     :: T.Text,
  fileData     :: ByteString,
  contentType  :: T.Text
} deriving Show

mkPersist defaultCodegenConfig [groundhog|
  - entity: StoredFile
    dbName: files
      - name: StoredFile
          - name: fileData
            dbName: file_data
          - name: fileName
            dbName: file_name
          - name: contentType
            dbName: content_type
          - name: filename_uniq
            fields: [fileName]

I then set this route to pass requests for /file to the database:

("/file/:name", getFile)

The getFile function parses the :name parameter and uses it to request the data from the database to present back to the client:

getFile :: AppHandler ()
getFile = do
  param <- getParam "name"
  f <- getFileFromDatabase $ B.unpack $ fromMaybe "" param
  modifyResponse $ setContentType $ B.pack $ T.unpack $ contentType $ f
  writeBS $ fileData $ f

getFileFromDatabase :: String -> AppHandler StoredFile
getFileFromDatabase n = do
  results <- runGH $ select $ (FileNameField ==. (T.pack n)) `limitTo` 1
  return $ head results

Right now, I'm just using a Python script to upload files to the database directly. I'll add a REST API to the publisher app at some point to handle this:

#!/usr/bin/env python

import psycopg2
import argparse

parser = argparse.ArgumentParser(description='Upload file to database.')
parser.add_argument('connstring', help='The database connection string.')
parser.add_argument('filepath', help='The file to be uploaded.')
parser.add_argument('content', help='The content-type of the file.')
parser.add_argument('filename', help='The name of the file as it will exist to be retrieved from the database.')

args = parser.parse_args()

conn = psycopg2.connect(args.connstring)

curs = conn.cursor()

f = open(args.filepath,'rb')

binary = psycopg2.Binary(filedata)

curs.execute("insert into files(id, file_name, file_data, content_type) VALUES (DEFAULT,%s,%s,%s) RETURNING id", (args.filename, binary, args.content))


On my site, the result of all this is a marked decrease in performance. But that's primarily because my database lives at my house and is accessed by the app servers in Azure via an IPSec VPN connection limited by my wireless ISP to 2Mbps upstream (one of the joys of living in the country - at least I get 15Mbps down even if I pay enough for it to get multiple Gbps in the city).

But the upside is that I can post new articles with images without redeploying my app server (a tedious process on the Basic_A0 systems I'm paying for at Azure).

Of course, I have no need for an image in this post. So here's a gratuitous picture of one of my dogs, Bella.