PostgreSQL Database¶
As I use PostgreSQL databases a lot, I collect some operational tasks around them here.
Warning
I try to keep this page up to date, but the SQL below won't work on all PostgreSQL versions. I will update it from time to time as things change, but mostly I stick to the current stable version from the Debian repositories.
Root access¶
The default configuration of PostgreSQL on Linux creates a user called postgres; only this user may connect without a password, and it has superuser rights to the entire PostgreSQL instance.
So to change the configuration or administer the instance you have to become the postgres user:
sudo -u postgres psql
For security reasons a general sudo -u may not be allowed, but a setup like the following helps:
# set up once as root
printf '#!/bin/sh\nexec su -s /bin/bash -l postgres\n' > /usr/local/sbin/go-user-postgres
chmod 755 /usr/local/sbin/go-user-postgres
echo "<user> ALL=NOPASSWD: /usr/local/sbin/go-user-postgres" >> /etc/sudoers.d/user
This allows a specific user to switch to the postgres user, but not to other users:
sudo /usr/local/sbin/go-user-postgres
And to make it easier to call, add the directory to your user's path:
echo 'export PATH=$PATH:/usr/local/sbin' >> /home/user/.bashrc
source /home/user/.bashrc
# now you only need to call
sudo go-user-postgres
Configuration¶
PGTune can be used to calculate a general configuration. But this is only a starting point; you should always check that the suggested values work for your specific scenario.
| Parameter | Default | Optimal | Description |
|---|---|---|---|
| max_connections | 100 | 100 < x < 600 | Keep this as low as possible, because each connection increases RAM usage. |
| ssl | on | off | Not really necessary within secure local networks. |
| shared_buffers | 128MB | 25-30% of RAM | This is for a single cluster; for multiple clusters reduce it. |
| work_mem | 4MB | 4 < x < 64 MB | The main performance setting for complex queries; it can be allocated per connection. |
| maintenance_work_mem | 64MB | 64 < x < 1024 MB | Set it to a maximum of 256MB; more won't give you much additional performance. |
| max_stack_depth | 2MB | 4MB | Has to be less than the `ulimit -s` setting. |
| random_page_cost | 4.0 | 2.0 | Experiment here; lower values suit fast storage like SSDs. |
| cpu_tuple_cost | 0.01 | 0.005 | |
| cpu_index_tuple_cost | 0.005 | 0.0025 | |
| cpu_operator_cost | 0.0025 | 0.0015 | |
| effective_cache_size | 4GB | 70-75% of RAM | A planner hint, not an allocation: the memory expected to be available for caching, leaving some space for the OS. |
| logging_collector | off | on | Needed to capture slow queries in the log for later analysis. |
| log_directory | log | log | Will use /var/log/postgresql/<major-version>/<cluster-name>/. |
| log_filename | 'postgresql-%Y-%m-%d_%H%M%S.log' | 'postgresql-%d.log' | Rotates daily and overwrites the same day of the previous month. |
| log_file_mode | 0600 | 0644 | Gives other users read access to the logs. |
| log_truncate_on_rotation | off | on | |
| log_rotation_age | 1d | 1d | |
| log_min_duration_statement | -1 | 1000 | Log all queries with a duration of at least 1000 ms. |
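The memory rules of thumb in the table above can be roughly derived from the machine's RAM. The following is only a sketch (the variable names are mine, and the percentages are the guideline values, not hard rules):

```shell
# Sketch: derive rough values for shared_buffers and effective_cache_size
# from total RAM, following the 25% / 75% rules of thumb above.
ram_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
shared_buffers_mb=$(( ram_mb / 4 ))        # ~25% of RAM
effective_cache_mb=$(( ram_mb * 3 / 4 ))   # ~75% of RAM
echo "shared_buffers = ${shared_buffers_mb}MB"
echo "effective_cache_size = ${effective_cache_mb}MB"
```

The printed lines can be pasted into postgresql.conf and then tuned from there.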
Styleguide¶
Using a style guide is generally a good idea and helps you and your colleagues read SQL statements easily.
Database¶
A database is a real separation of data. A user basically cannot join between databases; each database has its own schemas, which cannot be linked to another database without special extensions like dblink. Better keep it this way: if you need to join data between tables, put them in the same database, possibly in different schemas.
Schema¶
Schemas in PostgreSQL are used to organize objects like tables, views, and functions into logical groups. They make large data structures more manageable, so use a schema for each logical unit.
Each database should have a main schema with the name of the database to store the core tables...
User¶
Every service or process should have its own user so you always know who is doing what. So if you look at the processes, you will find the user and the IP it is coming from.
- `sys_*` for systems and services
- `proc_*` for processes
- `<human nickname>` like `aschilling`
- `operator` can be used as the general owner of the database
- `<schema>_admin` as owner for individual schemas if no operator is used
You may also use a prefix for each group if you have a lot of services and processes in an SOA.
Warning
Granting rights directly to these users tends toward an unmanageably big structure. So better use groups to set the rights.
Groups¶
To easily set the right grants for each user, make groups which are used like a role. To differentiate them from users, prefix them with group_* and make two or three groups for each area: one only for reading, another that can also write, and maybe one for the owner which can alter objects:
group_<name>_reader
group_<name>_writer
group_<name>_owner
As the name, use the area of tables, which is often also the schema name.
See the next chapter for an example of how to create such a database.
Now set the roles for such groups:
CREATE ROLE group_access_owner;
CREATE ROLE group_access_writer;
CREATE ROLE group_access_reader;
GRANT ALL ON TABLE access.user TO group_access_owner;
GRANT ALL ON TABLE access.user TO group_access_writer;
GRANT SELECT ON TABLE access.user TO group_access_reader;
Index¶
As each index needs a name, use i_<field> for a single-column index; for a multi-column index, append the other field names.
Views¶
To distinguish them from tables, better use v_* as a prefix.
Materialized Views¶
To distinguish them from tables, better use mv_* as a prefix.
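A minimal sketch of these naming conventions, assuming a hypothetical access.account table:

```sql
-- hypothetical examples of the v_* / mv_* prefix convention
CREATE VIEW access.v_active_account AS
    SELECT * FROM access.account WHERE active;

CREATE MATERIALIZED VIEW access.mv_account_stats AS
    SELECT count(*) AS account_count FROM access.account;
```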
Setup Database¶
In the following example a default setup is shown which can be adjusted and extended.
1. Create a folder for the tablespace:

   ```shell
   sudo -u postgres mkdir -p /var/lib/postgresql/tablespaces/13/main/tbs_mydb
   ```

   If this database cluster also has slaves, this needs to be done on all slaves, too.

2. Create the database owner:

   ```sql
   CREATE ROLE operator WITH SUPERUSER LOGIN INHERIT ENCRYPTED PASSWORD 'mypass';
   ```

3. Create the tablespace:

   ```sql
   CREATE TABLESPACE tbs_mydb OWNER operator LOCATION '/var/lib/postgresql/tablespaces/13/main/tbs_mydb';
   ```

4. Create the database:

   ```sql
   CREATE DATABASE mydb WITH OWNER operator TABLESPACE tbs_mydb;
   GRANT ALL PRIVILEGES ON DATABASE mydb TO operator;
   ```

   Attention: from now on, connect to the new database. If you are working on the command line using psql, write `\c mydb`.

5. Create groups and users:

   ```sql
   -- create group roles
   CREATE ROLE group_access_reader WITH INHERIT;
   CREATE ROLE group_access_writer WITH INHERIT;
   GRANT group_access_reader TO group_access_writer GRANTED BY operator;
   -- add service user
   CREATE ROLE sys_frontend WITH LOGIN INHERIT ENCRYPTED PASSWORD 'secret';
   GRANT CONNECT ON DATABASE mydb TO sys_frontend;
   GRANT group_access_writer TO sys_frontend GRANTED BY operator;
   ```

   The `INHERIT` attribute gives the user the privileges of the roles they are members of.

6. Create the schema:

   ```sql
   CREATE SCHEMA access AUTHORIZATION operator;
   GRANT ALL ON SCHEMA access TO operator;
   -- set defaults for all new tables in the schema
   ALTER DEFAULT PRIVILEGES IN SCHEMA access GRANT ALL ON TABLES TO operator;
   ALTER DEFAULT PRIVILEGES IN SCHEMA access GRANT ALL ON TABLES TO group_access_writer;
   ALTER DEFAULT PRIVILEGES IN SCHEMA access GRANT SELECT ON TABLES TO group_access_reader;
   ```
And yeah, in principle, that should be it!
Indexes¶
If you use an index, decide whether multiple single-column indexes are better than one multi-column index. Multi-column indexes often go unused because they don't completely match the query and are therefore unsuitable.
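As a sketch (table and column names are hypothetical), two single-column indexes that the planner can combine:

```sql
-- instead of one multi-column index like
-- CREATE INDEX i_status_created ON access.account (status, created):
CREATE INDEX i_status  ON access.account (status);
CREATE INDEX i_created ON access.account (created);
-- PostgreSQL can combine both via a bitmap AND scan for queries
-- filtering on status AND created, and each remains usable on its own
```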
Extensions¶
postgres_fdw (dblink)¶
The postgres_fdw is the more transparent and standards-conformant solution compared to dblink, so better use it.
Warning
Using foreign data wrappers to interlink databases is often a result of bad data structuring; migrations are one of the exceptions.
To keep things organized, don't mix local and foreign tables in the same schema; create an extra schema for the foreign tables, named after the foreign database.
First install the extension:
CREATE EXTENSION postgres_fdw;
Then create a foreign server entry with the connection settings for the remote database.
CREATE SERVER fdw_second_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '<ip or domain>', port '<port>', dbname 'second_db');
A user mapping is needed as well to identify the role that will be used on the remote server:
CREATE USER MAPPING FOR <local_user>
SERVER fdw_second_db
OPTIONS (user '<foreign_user>', password '<password>');
Make a schema for the foreign data tables:
CREATE SCHEMA second_db;
Now it is possible to create a foreign table:
CREATE FOREIGN TABLE second_db.table_1 (
id integer NOT NULL,
data text
)
SERVER fdw_second_db
OPTIONS (schema_name '<schema>', table_name 'table_1');
It's essential that the data types and other properties of the columns declared in CREATE FOREIGN TABLE match the actual remote table. Column names must match as well, unless you attach column_name options to the individual columns to specify their names in the remote table.
An easier solution would be:
IMPORT FOREIGN SCHEMA <remote_schema>
LIMIT TO ( table_1, table_2 )
FROM SERVER fdw_second_db
INTO second_db;
pipelinedb¶
pipelinedb is an extension for high performance aggregation of time series data.
Access¶
Listen address¶
Within /etc/postgresql/13/main/postgresql.conf check listen_addresses. The server only accepts direct connections on the addresses listed here.
Hint
If the database is only accessible on localhost you may call it through SSH tunneling.
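A sketch of such a tunnel (host and user names are placeholders, the local port 5433 is an arbitrary choice):

```shell
# forward local port 5433 to PostgreSQL on the remote host
ssh -N -L 5433:localhost:5432 <user>@<dbserver> &
# then connect through the tunnel
psql -h localhost -p 5433 -U myuser mydb
```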
General access (pg_hba.conf)¶
To allow somebody to access the server a matching entry in the file /etc/postgresql/11/main/pg_hba.conf is needed:
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all peer
# local connections with credentials
host all all 127.0.0.1/32 md5
host all all ::1/128 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all peer
host replication all 127.0.0.1/32 md5
host replication all ::1/128 md5
# Users and systems which are allowed to use the database:
host all all 192.168.1.0/24 md5
Your file should look something like the above, and you can change or add entries. After a change, you have to reload the configuration for it to take effect.
From the command line as postgres user:
/usr/bin/pg_ctl reload
Or using SQL:
SELECT pg_reload_conf();
Grants¶
To show which rights are set use:
SELECT table_catalog AS "database",
       table_schema AS "schema",
       table_name AS "table",
       grantee,
       string_agg(privilege_type, ', ' ORDER BY privilege_type) AS privileges
FROM information_schema.role_table_grants
WHERE grantee != 'postgres'
GROUP BY 1, 2, 3, 4;
Here are some common statements to grant access to a PostgreSQL user or group:
- Grant CONNECT to the database:
GRANT CONNECT ON DATABASE database_name TO username;
- Grant USAGE on schema:
GRANT USAGE ON SCHEMA schema_name TO username;
- Grant on all tables for DML statements: SELECT, INSERT, UPDATE, DELETE:
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA schema_name TO username;
- Grant all privileges on all tables in the schema:
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA schema_name TO username;
- Grant all privileges on all sequences in the schema:
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA schema_name TO username;
- Grant all privileges on the database:
GRANT ALL PRIVILEGES ON DATABASE database_name TO username;
- Grant permission to create database:
ALTER USER username CREATEDB;
- Make a user superuser:
ALTER USER myuser WITH SUPERUSER;
- Remove superuser status:
ALTER USER username WITH NOSUPERUSER;
The statements above only affect currently existing tables. To apply them to newly created tables, you need to use ALTER DEFAULT PRIVILEGES. For example:
ALTER DEFAULT PRIVILEGES
FOR USER username
IN SCHEMA schema_name
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO username;
Usage¶
Cleanups¶
To clean up old data you have different possibilities:
- truncate the table - the fastest way to free some space
- delete records - frees space within the table for new records, but not on disk
- move records to an archive file or database
If you want to do the last, it's best to:
- make an identical table as *_history, but without the indexes
- copy the records to be archived into it and delete them from the original table
- export the *_history table to a file
- truncate the *_history table
You can also run step 2 every day and do steps 3 and 4 once a month.
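The steps above can be sketched in SQL like this (table and column names are hypothetical, and the one-year interval is just an example):

```sql
-- 1. identical table without indexes
CREATE TABLE access.log_history (LIKE access.log);
-- 2. move old records in one statement
WITH moved AS (
    DELETE FROM access.log
    WHERE created < now() - interval '1 year'
    RETURNING *
)
INSERT INTO access.log_history SELECT * FROM moved;
-- 3. export to a file (psql meta command)
\copy access.log_history TO 'log_history.csv' CSV
-- 4. empty the archive table again
TRUNCATE access.log_history;
```

The CTE moves the records in a single transaction, so no row can be lost between the delete and the insert.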
Maintenance¶
A quick overview of what is going on can be seen using pg_top, which is like the system top, but for the database. Install it on Debian using: apt-get install pgtop.
Databases and Schemas¶
Get a list of all databases using \l in psql or the following SQL:
SELECT datname AS db, datconnlimit AS connlimit FROM pg_database WHERE datistemplate = false;
To also show the size of the database:
SELECT d.datname AS Name,
pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
ELSE 'No Access'
END AS SIZE,
(SELECT numbackends FROM pg_stat_database WHERE datname=d.datname) AS Backends
FROM pg_catalog.pg_database d
ORDER BY
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END DESC -- nulls first
;
The schemas can be retrieved for each database with size (connect to the database first):
SELECT schema_name,
pg_size_pretty(sum(table_size)::bigint) AS size,
round((sum(table_size) / pg_database_size(current_database())) * 100, 2) AS percent
FROM (
SELECT pg_catalog.pg_namespace.nspname as schema_name,
pg_relation_size(pg_catalog.pg_class.oid) as table_size
FROM pg_catalog.pg_class
JOIN pg_catalog.pg_namespace ON relnamespace = pg_catalog.pg_namespace.oid
) t
GROUP BY schema_name
ORDER BY percent DESC;
And the extensions installed in each database are shown using \dx or:
SELECT * FROM pg_extension;
Connections¶
To get the current connections for each database, use:
select datname, numbackends from pg_stat_database;
This will show a table like:
| datname | numbackends |
|---|---|
| template0 | 0 |
| system1 | 10 |
| template1 | 0 |
To get some more information about the connections used across the whole database cluster, use:
select max_conn,used,res_for_super,max_conn-used-res_for_super res_for_normal
from
(select count(*) used from pg_stat_activity) t1,
(select setting::int res_for_super from pg_settings where name=$$superuser_reserved_connections$$) t2,
(select setting::int max_conn from pg_settings where name=$$max_connections$$) t3;
This will give you something like:
| max_conn | used | res_for_super | res_for_normal |
|---|---|---|---|
| 100 | 2 | 3 | 95 |
The user limit can be displayed using:
SELECT rolname, (SELECT count(*) FROM pg_stat_activity WHERE usename=rolname) AS used, rolconnlimit FROM pg_roles WHERE rolconnlimit <> -1;
| rolname | used | rolconnlimit |
|---|---|---|
| my_user | 12 | 30 |
And it can be changed using ALTER USER my_user CONNECTION LIMIT 50;
Processes¶
A short overview of what is running can be seen in htop, as there is one process per connection, which also shows the database and the type of SQL. More information about what is running can be displayed using:
SELECT * FROM pg_stat_activity WHERE state = 'active';
The oldest processes can also be found:
SELECT pid, datname, usename, state, age(backend_xmin) * interval '1 sec' AS age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
To stop one of these statements, you can cancel it:
SELECT pg_cancel_backend(<pid of the process>);
If the process cannot be cancelled, try:
SELECT pg_terminate_backend(<pid of the process>);
Blocking¶
Using the following SQL you will get a concrete list of who is blocking whom:
-- run using postgres user
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks AS blocked_locks
JOIN pg_catalog.pg_stat_activity AS blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks AS blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity AS blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
Deadlock¶
The following query shows processes which are blocking others, with the statements truncated to 40 characters.
SELECT DISTINCT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
substring(blocked_activity.query, 0, 40) AS blocked_statement,
substring(blocking_activity.query, 0, 40) AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.GRANTED;
Optimize Table¶
First you can search for tables with big bloat in a database (as superuser):
SELECT schemaname AS "schema",
       relname AS "table",
       n_live_tup,
       n_dead_tup,
       CASE WHEN n_live_tup > 0 THEN 100 * n_dead_tup / n_live_tup ELSE 0 END AS bloat_percent,
       last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup
      / (n_live_tup
         * current_setting('autovacuum_vacuum_scale_factor')::float8
         + current_setting('autovacuum_vacuum_threshold')::float8)
      DESC;
If you find tables with high bloat you can do a full vacuum on them, but this blocks writes to the table and needs at least as much additional free space as the table with its indexes occupies.
VACUUM FULL VERBOSE ANALYZE <schema>.<table>;
Next you can show unused tables, which may be dropped:
SELECT schemaname AS "schema", relname AS "table"
FROM pg_stat_user_tables
WHERE (idx_tup_fetch + seq_tup_read) = 0
ORDER BY schemaname, relname;
Optimize Index¶
You can show all indexes which have not been used since the last statistics reset and may be removed:
SELECT s.schemaname,
s.relname AS tablename,
s.indexrelname AS indexname,
pg_relation_size(s.indexrelid) AS index_size,
pg_size_pretty(pg_relation_size(s.indexrelid)) as index_pretty
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0 -- has never been scanned
AND 0 <>ALL (i.indkey) -- no index column is an expression
AND NOT i.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint c
WHERE c.conindid = s.indexrelid)
ORDER BY pg_relation_size(s.indexrelid) DESC;
Missing indexes can be a performance problem, so check if there should be some added:
SELECT
relname,
seq_scan-idx_scan AS too_much_seq,
case when seq_scan-idx_scan>0 THEN 'Missing Index?' ELSE 'OK' END,
pg_relation_size(relname::regclass) AS rel_size,
seq_scan,
idx_scan
FROM pg_stat_all_tables
WHERE schemaname='public' AND pg_relation_size(relname::regclass)>80000
ORDER BY too_much_seq DESC;
Replication¶
If PostgreSQL is used in master/slave mode with one or multiple slaves, you can see the replication details on the master:
SELECT pg_is_in_recovery() AS is_slave, * FROM pg_stat_replication;
| is_slave | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| false | 29971 | 16385 | replication | walreceiver | 10.11.0.29 | (null) | 54168 | 2021-11-02 16:58:30 | (null) | streaming | 261F/FF517EB0 | 261F/FF517EB0 | 261F/FF517EB0 | 261E/7F8E7A48 | 00:00:00.000298 | 00:00:00.000802 | 07:04:50.136028 | 0 | async |
SELECT
  pg_is_in_recovery() AS is_slave,
  pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() AS synced,
  now() - pg_last_xact_replay_timestamp() AS lag,
  *
FROM pg_stat_wal_receiver;
| is_slave | synced | lag | pid | status | receive_start_lsn | receive_start_tli | received_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| true | false | 00:12:22.031294 | 25974 | streaming | 2532/67000000 | 1 | 2620/10964060 | 1 | 2021-11-25 22:10:35 | 2021-11-25 22:10:35 | 2620/10964060 | 2021-11-25 22:10:35 | manage | manage-db-1.host.user.dvb | 5432 | user=replication password=******** dbname=replication host=manage-db-1.host.user.dvb port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any |
The replication can also be manually stopped and started:
SELECT pg_wal_replay_pause();
SELECT pg_wal_replay_resume();
Backup/Restore¶
To back up and restore, or to move a database to another server (maybe with a newer PostgreSQL version):
sudo -u postgres pg_dump --create mydb > mydb.dump
sudo -u postgres pg_dumpall --globals-only > globals.dump
And before restoring it later, check the following:
- create the tablespace directories if they don't exist
- add missing locales to your system (`locale -a`, `vi /etc/locale.gen`, `locale-gen`) and restart PostgreSQL
Then you may import the data:
sudo -u postgres psql < globals.dump
sudo -u postgres psql < mydb.dump
Upgrade Postgres¶
As a first step you should install the new PostgreSQL version alongside the current one using the default installation:
sudo apt install postgresql-client-13 postgresql-13
After installing multiple versions of PostgreSQL on your server you can list them:
$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
11 main 5432 online postgres /var/lib/postgresql/11/main /var/log/postgresql/postgresql-11-main.log
13 main 5433 online postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
To switch from the used 11 to the currently unused 13 do the following:
# first remove the new cluster
pg_dropcluster 13 main --stop
# now upgrade in one of two ways:
pg_upgradecluster 11 main # upgrade with the use of dumps
pg_upgradecluster 11 main --link --method=upgrade # upgrade using hard links
This may take some time, during which a new version 13 database is created and updated. When done, it should look like:
operator@db13:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
11 main 5434 down postgres /var/lib/postgresql/11/main /var/log/postgresql/postgresql-11-main.log
13 main 5432 online postgres /var/lib/postgresql/13/main /var/log/postgresql/postgresql-13-main.log
Now, after you have tested it, the old database cluster can be dropped:
pg_dropcluster 11 main
apt-get --purge remove postgresql-client-11 postgresql-11