Very large postgres database
Posted:
Tue Oct 30, 2012 3:25 pm
by ingvar
Hello,
I recently noticed that my OMERO Postgres database is growing quite fast, approaching 10 GB. Three tables account for 3.3 GB: event, eventlog, and session. I cannot see where the other 6.5 GB are; is there some internal backup copy? The rest of the tables are all 40 kB or less, so they only add up to 1 MB or so.
Our systems group likes to test that my service is alive every 3 seconds or so, which more or less accounts for the 6 million rows that have accumulated in the event table since mid-June.
Is it safe to prune these tables, e.g., remove all rows that are more than a week old? Even better, is there a way to configure this so that these tables are pruned automatically?
I am still on my hybrid 4.3.3 version; the 2nd attempt to migrate did not go too well.
Kind Regards,
Ingvar
Re: Very large postgres database
Posted:
Thu Nov 01, 2012 1:01 pm
by jmoore
ingvar wrote: Hello,
Hi Ingvar,
I recently noticed that my OMERO Postgres database is growing quite fast, approaching 10 GB. Three tables account for 3.3 GB: event, eventlog, and session. I cannot see where the other 6.5 GB are; is there some internal backup copy?
No, there shouldn't be anything OMERO-specific that's taking up space. Can you send me the output of:
Code:
SELECT
    table_name,
    pg_size_pretty(table_size) AS table_size,
    pg_size_pretty(indexes_size) AS indexes_size,
    pg_size_pretty(total_size) AS total_size
FROM (
    SELECT
        table_name,
        pg_table_size(table_name) AS table_size,
        pg_indexes_size(table_name) AS indexes_size,
        pg_total_relation_size(table_name) AS total_size
    FROM (
        SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
        FROM information_schema.tables
    ) AS all_tables
    WHERE table_name LIKE '%public%'
    ORDER BY total_size DESC
) AS pretty_sizes;
See https://wiki.postgresql.org/wiki/Disk_Usage for more information. You may need vacuuming or similar.
The rest of the tables are all 40 kB or less, so they only add up to 1 MB or so.
Our systems group likes to test that my service is alive every 3 seconds or so,
This sounds extreme....
which more or less accounts for the 6 million rows that have accumulated in the event table since mid-June.
Is it safe to prune these tables, e.g., remove all rows that are more than a week old?
Removing (all) EventLogs is always safe, assuming you don't want them for auditing purposes. You may be hard-pressed to remove the sessions and events unless you use a more elaborate query.
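In the simplest case that is a one-liner; the age-based variant below is only a sketch, since it assumes eventlog has an "event" column referencing event.id, which you would want to check against your schema first:
Code:
-- Clearing all EventLogs, per the above:
DELETE FROM eventlog;
-- Hypothetical age-based pruning; the "event" FK column name is an assumption:
-- DELETE FROM eventlog
-- WHERE event IN (SELECT id FROM event WHERE time < '2012-07-01');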
Even better, is there a way to configure this so that these tables are pruned automatically?
Not at the moment, no.
I am still on my hybrid 4.3.3 version; the 2nd attempt to migrate did not go too well.
Cheers,
~Josh
Re: Very large postgres database
Posted:
Thu Nov 01, 2012 2:13 pm
by ingvar
Hi Josh,
I failed to run your query, partly because I am on Postgres 8.4 and you used some 9.x functions, but even after changing to functions available in 8.4 I got some error.
Is there an allowed extension for raw text file attachments? I get a "not allowed" message for .txt, .dat, and no extension when I try to upload a file with the output from \dtiv+.
Most of the space that was unaccounted for is in the indices for event, eventlog, and session.
I tried vacuumdb earlier; that had no noticeable effect, but it is likely to be needed after I delete old rows from the event, eventlog, and session tables.
I completely agree about the excessiveness of the checks, but all I got was a "company policy" reply when I queried the wisdom of this.
Anyway, the highlights from the output are:
Schema | Name | Type | Owner | Table | Size |
public | event | table | omero | | 1675 MB |
public | event_external_id_key | index | omero | event | 512 MB |
public | event_pkey | index | omero | event | 408 MB |
public | eventlog | table | omero | | 558 MB |
public | eventlog_action | index | omero | eventlog | 171 MB |
public | eventlog_entityid | index | omero | eventlog | 135 MB |
public | eventlog_entitytype | index | omero | eventlog | 291 MB |
public | eventlog_external_id_key | index | omero | eventlog | 171 MB |
public | eventlog_pkey | index | omero | eventlog | 135 MB |
public | session | table | omero | | 1171 MB |
public | session_external_id_key | index | omero | session | 209 MB |
public | session_pkey | index | omero | session | 216 MB |
public | session_uuid_key | index | omero | session | 490 MB |
Cheers,
Ingvar
Re: Very large postgres database
Posted:
Thu Nov 01, 2012 2:21 pm
by jmoore
ingvar wrote: I failed to run your query, partly because I am on Postgres 8.4 and you used some 9.x functions, but even after changing to functions available in 8.4 I got some error.
Ah, ok. Sorry about that.
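For reference, here is a variant that sticks to functions available in 8.4 (pg_relation_size and pg_total_relation_size, which predate 9.0); it lumps index and TOAST space together as "total minus table". A sketch, untested:
Code:
SELECT c.relname AS table_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
       pg_size_pretty(pg_total_relation_size(c.oid)
                      - pg_relation_size(c.oid)) AS index_and_toast_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'       -- ordinary tables only
ORDER BY pg_total_relation_size(c.oid) DESC;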
Most of the space that was unaccounted for is in the indices for event, eventlog, and session.
Ok, so nothing's missing. Good to know.
I tried vacuumdb earlier; that had no noticeable effect, but it is likely to be needed after I delete old rows from the tables.
Agreed.
I completely agree about the excessiveness of the checks, but all I got was a "company policy" reply when I queried the wisdom of this.
What check are you having them do?
Cheers,
~Josh
Re: Very large postgres database
Posted:
Thu Nov 01, 2012 3:04 pm
by ingvar
All they do is access a page in OMERO.web, but why that is needed every 3 seconds for each server, plus once more on the load balancer, is beyond me. Real user accesses are orders of magnitude less frequent at the moment.
After removing the excess rows, I assume that I will need to reindex. Do I reindex before vacuuming, or the other way around?
Cheers,
Ingvar
Re: Very large postgres database
Posted:
Thu Nov 01, 2012 3:44 pm
by jmoore
VACUUM ANALYZE and then optionally REINDEX seems to be the general suggestion, though I'm not sure the order is important.
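For the three tables in question that would be something along these lines; note that plain VACUUM only marks dead space for reuse within the table, while VACUUM FULL returns it to the OS but takes an exclusive lock:
Code:
VACUUM ANALYZE event;
VACUUM ANALYZE eventlog;
VACUUM ANALYZE session;
-- Optionally, to shrink the bloated indexes listed earlier:
REINDEX TABLE event;
REINDEX TABLE eventlog;
REINDEX TABLE session;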
Re: Very large postgres database
Posted:
Thu Nov 01, 2012 4:01 pm
by ingvar
Deleting the EventLogs went OK, but the event and session deletes failed with foreign key violations:
EMDBslice=# delete from event * where time < '2012-07-01';
ERROR: update or delete on table "event" violates foreign key constraint "fknamespace_creation_id_event" on table "namespace"
DETAIL: Key (id)=(1) is still referenced from table "namespace".
EMDBslice=# delete from session * where defaulteventtype='User' and started < '2012-07-01';
ERROR: update or delete on table "session" violates foreign key constraint "fkevent_session_session" on table "event"
DETAIL: Key (id)=(105) is still referenced from table "event".
Any idea of a way out?
Ingvar
Re: Very large postgres database
Posted:
Thu Nov 01, 2012 4:13 pm
by jmoore
That definitely won't work. You'll have to identify the sessions/events that have no other objects linking to them and delete those. If you were using a particular user or similar, you could search for that.
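As a sketch of the shape such a query takes, here is a version covering only the one constraint from the error message above (namespace.creation_id); a real version would need a similar NOT EXISTS clause for every table with a foreign key into event:
Code:
DELETE FROM event e
WHERE e.time < '2012-07-01'
  -- one NOT EXISTS per referencing table; only namespace is shown here
  AND NOT EXISTS (SELECT 1 FROM namespace n WHERE n.creation_id = e.id);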
Re: Very large postgres database
Posted:
Fri Nov 02, 2012 3:45 pm
by ingvar
Hi Josh,
Deleting from the event table is hard work. If I try to delete too much I get a timeout. I ended up using statements like:
DELETE FROM event * WHERE id > 2600000 and id < 2800000;
these take 10-15 min each, so I am looking at 18 h or so in total.
If any rows in the range are blocked by foreign key constraints, I bracket the delete statement further.
One of my colleagues prepared a statement that takes all foreign keys into the event table into account; it is quite large, about 150 lines. I will try it once I have removed the bulk of the rows.
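(For what it is worth, the list of foreign keys such a statement has to cover can be pulled from the system catalogs; this should work on 8.4:)
Code:
-- Every foreign key constraint that references the event table:
SELECT conrelid::regclass AS referencing_table, conname
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'event'::regclass;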
I guess that in theory it might be possible to set up the database to do cascading deletes, but there might be bad side effects.
Cheers,
Ingvar
Re: Very large postgres database
Posted:
Sat Nov 03, 2012 8:22 pm
by jmoore
ingvar wrote: Deleting from the event table is hard work. If I try to delete too much I get a timeout. I ended up using statements like:
DELETE FROM event * WHERE id > 2600000 and id < 2800000;
these take 10-15 min each, so I am looking at 18 h or so in total.
Adding indexes on the columns that reference event would likely speed this up; most of the time is probably spent checking the FK constraints.
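For example, for the constraint from the earlier error message (PostgreSQL does not index referencing FK columns automatically; the index name here is made up):
Code:
CREATE INDEX namespace_creation_id_idx ON namespace (creation_id);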
I guess that in theory it might be possible to set up the database to do cascading deletes, but there might be bad side effects.
Cascading deletes on Events would delete your entire DB.
~Josh