Parse pg_dump Functions and Views Into Their Own Files (GreenPlum / Postgres)

Here’s what we wanted: Each function from our GreenPlum (aka Postgres) database scripted out into it’s own file. I think after much hacking I’ve got something working that only requires a small amount of manual intervention.

For whatever reason PostgreSQL/GreenPlum doesn’t seem to have a native way to split objects into separate files. This functionality is super useful when moving the database under source control.  The open source python script pg_extractor seemed like the perfect option but for some reason when we executed it all the functions with a “RETURNS TABLE” were scripted out as “RETURNS SETOF”. We’ve seen this issue a number of times with a few different tools. (Let me know if you know the solution to this!).

We went with pg_dump next, which scripted the functions correctly (here’s the syntax used to get the objects from a schema named “api”: pg_dump -n api lytxbi > api.dmp ). Unfortunately, this placed all the functions in one file.

I started googling for solutions parsing the pg_dump file and didn’t find much. I did spot a post on stack overflow though that held some promise: awk – splitting a multiple FASTA file into separate files keeping their original names  Being a wannabe Linux user I started hacking away (using Cygwin on windows of course. I said wannabe Linux user right?).

After many hours I was able to tweak and tweak my awk script to get it to do a pretty decent job.  Here’s the script:

awk ‘/^CREATE FUNCTION / {OUT=substr($3,1,match($3,/\(/ )-1) “.sql”}; OUT {print >OUT}’ api.dmp

This script looks in the file api.dmp for a line which starts with “CREATE FUNCTION”. Since pg_dump has a standard format we know that the third word after CREATE FUNCTION will be the function name ($3 signifies the third word). Then substring the third word, stopping when an open parentheses is found and then concatenate “.sql”. Use that string as the file name, and split everything after from the row starting with CREATE FUNCTION into it’s own file.

Got a better way to do this? Let me know!

p.s. Here’s what I used to split the views into their own files:

awk ‘/CREATE VIEW / {OUT=substr($3,1) “.sql”}; OUT {print >OUT}’ api.dmp


Leave a Reply

Your email address will not be published.