PostgreSQL vs MySQL — Which Relational Database Is Better?
PostgreSQL and MySQL are the two most popular relational database management systems (RDMS) in the world. They are used by all market demographics, including commercial enterprises and open-source foundations. What they have in common is robust network fault tolerance and support for data clustering. What sets them apart is the topic of this article.
RDMS origin and history
In 1973, UC Berkeley started a project, dubbed "Ingres", to set a new standard for relational database management systems. The project was written in C and released to the public under an open-source license. According to the Wikipedia article on the matter, the advantage of Ingres was that it used QUEL, a more formal but more precise query language compared to SQL.
In 1985, the project was revamped at Berkeley under the name "Postgres", and later changed to "PostgreSQL" to acknowledge the accessibility of SQL. In 1994, Michael Widenius created a competitor product called "MySQL". The product was bought by Oracle through a corporate acquisition in 2010.
This reveals the first major difference between the two — PostgreSQL is a free product maintained by volunteers that do community support, while MySQL is a corporate product with professional deployment and support.
PostgreSQL vs MySQL — missing features list
On September 5, 2017, Hacker News user "kaamos" published a list of features present in MySQL but missing from PostgreSQL 9.2.23, which was the latest version at the time:
- user variables
- optimizer hints
- virtual columns
- optimizer tracing
- clustered indexes
- declarative partitioning
- emi-synchronous replication
- built-in event scheduler and logical replication
- ability to choose data page size on database creation
- distributed in-memory grid with automatic sharding
- synchronous multi-master cluster (Galera, Group Replication)
- page-level compression, encryption, and incremental backups
- write-optimized storage engines similar to MyRocks and TokuDB
- NoSQL client APIs (HandlerSocket, memcached protocol, X protocol)
- ability to specify column order when adding a new column with ALTER TABLE
- efficient MVCC implementation that is unaffected by XID wraparound problems and VACUUM issues
Kaamos notes that those features are "huge for people who design and operate the busiest websites on the internet, (w)hich is probably why MySQL still shines in that field."
In response to kaamos’ feature wish list, user "fusiongyro" replied that advanced replication strategies in MySQL cause "hair loss and divorce" due to faulty distributed consistency. In his experience, end-users assume that MySQL replication features will operate as intended without planning or oversight. When that assumption fails them, they experience a series of outages and errors.
2019 PostgreSQL vs MySQL discussion
A 2019 Hacker News thread on PostgreSQL vs MySQL provides more insights into the developer consensus over which is better. User "matthewmacleod" recommends using PostgreSQL as the default RDMS unless there is a reason not to. User "jerrytsai" says PostgreSQL has better features for analyzing data in a scientific sense.
"Madhadron" recommends developers use what they are more familiar with. That user notes that ease of deployment does not correlate with long-term feasibility. User "nmpennypacker" adds that PostgreSQL enforces data types natively, which MySQL does not. According to that user, haphazard data type use in MySQL can lead to "polluting data".
User "gtsteve" says that PostgreSQL and MySQL have different licensing for client-side drivers. In his opinion, MySQL sneakily changed the driver licensing scheme to be more restrictive. "CameronNemo" adds that the goal is to lure developers into using what they think is liberal source code and then back charging them after an audit.
"Evanelias" discloses that he worked profusely on MySQL and may be biased toward it but admits it is less "batteries included". According to that user, either will do, though the team’s experience should be the deciding factor. There are more people versed in using MySQL, which may tilt the scales in its favor, even if it is inferior to PostgreSQL in terms of specifications.
PostgreSQL vs MySQL — the Uber switch
In 2016, Uber Engineering published an article written by Evan Klitzke, in which he explains why the company moved from PostgreSQL to MySQL. In short, PostgreSQL had major drawbacks that prevented Uber from scaling and pivoting to integrate new services and data platforms.
The article lists the drawbacks as follows:
- limited architecture
- table corruption issues
- difficulty upgrading to newer versions
The limited architecture drawback refers to the way PostgreSQL writes and updates data. PostgreSQL is designed to have immutable row data. If the row data need to be updated, PostgreSQL creates a new row for it and updates all indexes related to it. The more indexes related to the data, the more updates it takes.
Evan dubs this problem "write amplification", where writing a few bytes to a database incurs an inordinate bandwidth burden on the entire system. This is tolerable if all databases are in one data center; at the time, Uber had two data center clusters in the US, one on each coast, incurring enormous bandwidth costs.
Table corruption issues
In conjunction with that problem, the tables could become corrupted due to indexes not being updated properly across all instances of a database to indicate the latest one. In some cases, the old row was still marked as the latest. Only a minute subset of rows was affected by this bug but it was potentially catastrophic.
Evan notes that Uber fixed this problem by making a snapshot of the master and resyncing all databases from it, which he describes as a laborious and fiddly process. Evan acknowledges that the bug had been fixed at the time of writing, but he was concerned that it appeared at all and that it could rear its ugly head in the future.
Difficulty upgrading to newer versions
To update PostgreSQL from 9.2 to 9.3, Evan and his team had to shut down the master database and run pg_upgrade, a command that could take many hours. During that time, the master database could serve no traffic. After that, the team had to create a new snapshot of the master and propagate it across replicas, another time-consuming process. As Uber’s databases increased substantially, the update process past 9.3 would also increase, causing significant downtime.
Advantages of MySQL
Evan says that MySQL allowed Uber to leverage newer technologies. In addition, MySQL had several replication modes, providing Uber with flexibility and less chance of catastrophic corruption. If MySQL glitched, the affected row data would be unusable but the database would stay functional.
MySQL also provided greater compatibility between releases, allowing Uber to update databases when it was convenient. Evan describes the practical effect as "almost zero downtime". In conclusion, Evan states that scaling with PostgreSQL caused the company "significant problems", though Uber still ran a few legacy PostgreSQL instances.
Markus Winand’s response to Uber’s blog
Three days after the Uber article was made public, Markus Winand, an Austria-based SQL educator, published a rebuttal. In his opinion, the article did not make it clear enough that there is nothing wrong with PostgreSQL; it was ineffective for Uber, which needed constant data updates.
Markus notes that PostgreSQL documentation explains obscure functions that Evan and his team could have used. Markus adds that developers have "ridiculously low" knowledge of SQL regardless of the size of their company, making them fumble relational databases. He concludes that Uber didn’t switch to MySQL; instead, Uber made a custom solution called "Schemaless" and supported it with MySQL.
PostgreSQL vs MySQL architecture
PostgreSQL is built on the architecture designed for object-relational databases while MySQL uses a strictly relational architecture. The difference is that PostgreSQL allows for more complex data at the expense of deployment complexity and higher maintenance requirements.
Data type support comparison
According to W3Schools’ description of MySQL, it supports three data types:
- date and time
According to the official documentation, PostgreSQL supports 21 data types:
- bit string
- text search
- binary data
- object identifier
- network address
That same documentation page states PostgreSQL supports creating new data types using "create type", which is further split into four versions:
This command should be used with care because it can crash the server or impede with database management.
Index support comparison
PostgreSQL documentation explains the system supports six index types:
The default setting for "create index" makes a B-tree index. PostgreSQL uses a query planner that evaluates the column that is about to be indexed to assign it the most fitting index type.
MySQL supports five index types:
- Primary key
Defining the proper index type improves data retrieval performance in MySQL. Some index types can only be used with certain data types. For example, Full-text only works with text-based columns.
Too many indexes in relational databases can degrade the performance of MySQL database functions, especially with regard to deleting, inserting, and updating queries. Placing more than one index type is possible in MySQL but opinions differ on when to do it. MySQL also provides the query explainer utility, which can help diagnose index-related performance issues on large databases.
Reviews and community support threads claim that PostgreSQL is faster when dealing with complex queries, especially read/write processes and huge databases, and MySQL is better at read-only operations.
A 2021 article by Petr Jahoda compared the database performance of six RDMS, including PostgreSQL and MySQL. His findings were that, compared to the same database creation task done in MySQL, PostgreSQL database performance showed:
- 86% lower memory usage
- 95% less initial CPU usage
- 70% smaller database image size
The second test created one table with 6 columns and inserted 10,000 data rows into it. The insertion was repeated five times to get a realistic result. PostgreSQL took around 100 seconds to complete all five data row insertions while MySQL took about 200.
The third test was centered on reading operations. Each RDMS was tasked with performing two reads per loop, which was then repeated 2,000 times. The entire test was repeated five times for a total of 10,000 read loops. PostgreSQL again took half the time it took MySQL for the same process — 84 seconds compared to 168.
Petr notes that he did the same database system tests in 2019 with almost identical findings. PostgreSQL was the clear winner back then and MySQL had the second-to-worst performance of all entrants.
Evan’s article exhibited a certain dose of frustration as he related his experience with PostgreSQL. Markus addressed it by recommending that Evan and other developers read PostgreSQL’s documentation before using it. That was not a snide remark but genuinely constructive feedback.
PostgreSQL has by far the most accessible documentation owing to clear examples, concise descriptions, and high quality of writing. PostgreSQL has a dedicated wiki, on which the "Don’t Do This" page is the prime example of high-quality software documentation. That kind of PostgreSQL support compensates for sketchy community support. SQL laymen who want to implement complex queries will have a delightful learning experience on the PostgreSQL wiki.
In comparison, software engineer Rishabh Khanna left a review for MySQL on TrustRadius stating that "they should work on their documentation." In his opinion, the fonts used cause readability issues. He praises MySQL as cheap, compatible with most programming languages, and easy to work with, granting it a 10/10 grade.
Professional MySQL support comes courtesy of MySQL subscription and features five core elements:
- 24/7 support
- knowledge base
- maintenance releases
- unlimited support incidents
- MySQL consultative support
Support for individual MySQL features depends on the subscription plan purchased. A footnote on the above link indicates that support for non-MySQL products can be purchased separately.
Storage engine support
According to MySQL 8.0 Reference Manual, it supports 9 storage engines, with InnoDB the default:
A section of the manual states third-party engines may work too, but Oracle cannot provide support for them. MySQL users can create and load a storage engine thanks to the modular architecture that allows loading/unloading of storage engines while the SQL (Structured Query Language) server is running.
Meanwhile, PostgreSQL does not have a storage engine as such. Third-party developers are trying to develop and promote a storage engine to bring PostgreSQL up to par with MySQL. One such attempt is called ZHEAP, with the primary purpose to reduce database issues stemming from write amplification.
Security is inversely correlated with accessibility; the perfectly secure system is one nobody can access. Developers have to balance between the two to protect against the most common attacks while saving time and resources. PostgreSQL and MySQL have a set of recommended security practices to protect against cyberattacks in a cost-efficient manner.
PostgreSQL manual advises:
- not trusting any user
- minimizing the time data is open to users
- demanding strict inputs "for maximum security"
When privileges are granted to users, they should be restricted as soon as possible so no other user can intervene and hijack the process. For example, if privileges are granted in connection to an object, they can be revoked in the same transaction that creates that object.
The most common MySQL cyberattack is SQL injection. If users with knowledge of programming languages are allowed to query a MySQL database, such as to retrieve data from it, they may find a way to enter malformed queries that can corrupt the database or crash the server.
MySQL Reference Manual section on security guidelines recommends sanitizing user inputs to protect against SQL injections. Caution is warranted even when the database includes only public, non-sensitive data to protect against DDoS attacks. Again, users with nuanced knowledge of programming languages can sabotage the database or the server.
PostgreSQL exhibits clear signs of superiority in environments that call for scientific and pseudo-scientific data processing and analysis. If the project requires correlating unorthodox data sets, creating novel data types, or handling complex queries, PostgreSQL is the most suitable and perhaps the only feasible RDMS.
MySQL is more appropriate in environments that call for nimble experimentation and prototyping. For example, Rishabh mentioned he used MySQL on an IoT project with great success. However, restricted data types make MySQL limited in application scope. The comparatively higher resource requirements also make it a poor choice for scaling if the enterprise operates on a tight budget.
In terms of performance, PostgreSQL trounces MySQL and is the most advanced database management system. However, developers can incorrectly deploy PostgreSQL, causing performance issues and database corruption.