Best of Makers: PostgreSQL Takes a New Turn
BILBAO, SPAIN — When PostgreSQL started, its uses were relatively niche, said Jonathan Katz of Amazon Web Services in this episode of The New Stack Makers.
Today, its uses are widespread across applications and workloads, said Katz, a principal product manager and technical lead at Amazon RDS and a core team member at Amazon Web Services on the Postgres project.
“Back when it started, it was it was fairly niche,” Katz said about PostgreSQL, in this On the Road episode of The New Stack Makers, recorded at the Open Source Summit in Europe. “There were other relational database systems out there, both open source and commercial.”
Katz has spent more than half his life working on Postgres. His first involvement came by writing SQL and attending user groups and events. He worked on the website release management and helped organize events.
“When I first got involved in the project, I was an app developer, I just wrote SQL,” Katz said. “And I thought, like, ‘Oh, cool, people get involved in Postgres, because they write SQL.'”
Well, people get involved in Postgres, because they love database internals. “There were just so many talented developers working on it. I felt I could be most helpful was working on the non-development work,” Katz said.
“I had experience organizing events. In college, I organized startup events. And I tried to carry that over into the open source communities. I helped organize the local New York City user group, and ultimately we started organizing an annual event around that. And that’s where I started in the community. Because, while a lot of the Postgres community operates on mailing lists, a lot of the brainstorming, and the connection-building occurs at the events. There’s no substitute for that.”
Postgres emerged from academic research at the University of California at Berkeley in the mid-1980s, created initially by Michael Stonebraker, a database researcher at Berkeley. In 1994, the project officially became PostgreSQL and was released as an open source project.
By the mid-1990s, the age of proprietary databases emerged with players such as Oracle (Oracle DB), IBM DB2, and Microsoft SQL. Open source versions that appeared included MySQL, MariaDB, and SQLite.
PostgreSQL for Workloads that Scale
Today, developers use PostgreSQL for workloads that scale. Katz cited one example in which a customer has stored more than 40 terabytes of JSON documents in a Postgres database.
Recently launched, PostgreSQL 16 includes logical replication from standby servers. And that helps with scalability.
“If you’re able to offload that work to a standby, which is typically less busy, you’re able to move some of that traffic away from your primary, and effects will be able to scale your primary further,” Katz said.
Katz said the Postgres community’s meticulousness and the time it takes on designs leads to more stable and reliable features.
“If you look at all the different organizations and individuals contributing to Postgres, while it might not be their feature, they’re active in the design and the review and working to be able to get it committed into the project,” Katz said.
Direct I/O is a long-term feature in the works, Katz said. It will take years to implement. It’s a complex problem. PostgreSQL uses Linux to write data to disk using async. Now, fsync, according to Open Groups, “shall request that all data for the open file descriptor named by fildes is to be transferred to the storage device associated with the file described by fildes. The nature of the transfer is implementation-defined. The fsync() function shall not return until the system has completed that action or until an error is detected.”
Performance tradeoffs become the issue — there’s an extra layer to work through between PostgreSQL and the file system when using fsync as the extra layer. Direct IO may mean fewer hops, reduced latency and improved writing to disk.
But some tradeoffs need working through, and why it will take years to work through.
“Once Direct I/O is in, it’s going to allow for additional work around the performance of writing data to disk,” Katz said.
AWS built Amazon RDS on PostgreSQL to make it easier for developers to develop applications instead of operational tasks like deployment, backups and monitoring. With the support, Amazon RDS supports PostgreSQL releases. For example, Amazon RDS supports PostgreSQL 13, 14, 15 and 16.
Amazon RDS for PostgreSQL provides managed operational tasks like deployment, backups, monitoring, etc.
“There’s a lot of different features in the managed service that can help with common business operations,” Katz said. “So high availability, continuous automated backups, monitoring and things that make it simpler to manage on a day-to-day basis. And for me, with my app developer background, what’s nice about that is a lot of the operational work — that’s handled by the managed service.”
More Episodes from Open Source Summit EU 2023
Integrating a Data Warehouse and a Data Lake
WebAssembly’s Status in Computing
Powertools for AWS Lambda Grows with Help of Volunteers
How to Be a Better Ally in Open Source Communities
Open Source Development Threatened in Europe