obartunov: (Default)
Many people have already said that the conference was great. Many thanks to the organisers and sponsors! I want to thank the Russian company 1C for supporting my trip. Alexander Korotkov and I started our work on "Full-text search in PostgreSQL in milliseconds" after the PGCon conference in Ottawa. Then I spent a month in the Karakorum mountains, and by then it was too late to submit a talk and get support, so I went looking for sponsorship. I asked 1C, a company that Teodor Sigaev and I know very well, and fortunately I got support. That was the first time I was supported by a Russian company!

Several important things about the conference I have in mind:

1. Index-only count - GiST and GIN currently don't support index-only scans, but they could support an index-only count, which is very useful. I talked with Dimitri Fontaine, and it seems I convinced him to work on this with Cédric Villemain for 9.3.

2. Index-only scan for GiST and GIN. Currently, only btree indexes support index-only scans, but some opclasses of GiST and GIN (btree_gist, btree_gin, points, range types, ...) could also support them! So we need a more flexible infrastructure in pg_am. I talked with Jeff Davis and he is very interested in this; I hope he can do something for 9.3.

3. I asked EnterpriseDB to support our FTS work, but there is no answer yet. This is vital for getting the new, improved FTS ready for 9.3. EDB supported Teodor and me in 2006 and it was great; thanks to that we have FTS in the PostgreSQL core. I hope they will be able to do something for us this time.

4. Unfortunately, Peter caught a cold on the last day and we didn't discuss our work on json, which is a pity. The big problem with types in json is how to modify typmod to support them! Our initial plan was to improve hstore to support nested hstore and arrays, and then use this binary representation to implement json. Now we have a bigger problem: we need storage for the json schema, and PostgreSQL currently has no infrastructure for this. It's a separate big project and we'll think about it. Thanks to Heroku for the boat!

Just two pictures from Prague. I have many more on Flickr (see the PGconf set), and I'll publish more pictures there. I have hundreds of them!

PostgreSQL people in Hradčany, Prague

I have strong hands, so I was able to take this picture from the boat in low light! Notice the blue seagull!

Blue seagulls, Prague
obartunov: (Default)
I'm going to PGConf.EU in Prague! Thanks to the Russian company 1C (it's a really big company) for the sponsorship!

Alexander Korotkov and I will present a lightning talk, "Fulltext search in PostgreSQL in milliseconds". I understand that this topic needs more time, but we submitted the talk too late, sorry. I was in the Karakorum, Pakistan, for the whole of July. Anyway, we got really impressive results with the prototype: on a classifieds database of 6 million records we got a median search query time of 6-8 ms, with a total of 41 million search queries in 8 hours. We used real-life data from the biggest Russian classifieds service and real search queries extracted from web logs. We hope to discuss some implementation issues with developers and attract the attention of sponsors. The latter is important, since the amount of development itself is big. We also need to get through the review nightmare :) There is not much time left for 9.3, by the way.


Fulltext search in PostgreSQL is well known for its power and extensibility. However, there are two main reasons that prevent PostgreSQL fulltext search from being as fast as specialized solutions:

1) It's implemented inside an ACID DBMS, so it has to support atomicity, concurrency, WAL, etc. This issue is inevitable, since we implement fulltext search inside an object-relational DBMS, so it's both an advantage and a disadvantage.
2) Fulltext indexes are used only for filtering documents, while ranking requires fetching documents from the heap. This slows down the processing of high-selectivity queries. This disadvantage is not inevitable: it could be avoided by including additional information in GIN indexes.

This talk presents a prototype of a PostgreSQL patch that allows positional information to be stored in the fulltext index and used for ranking. In this case ranking is performed using only index information, without fetching documents from the heap. It speeds up the processing of high-selectivity queries by dozens of times. The prototype will be demonstrated on well-known large databases.
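
To illustrate the idea (not the patch itself), here is a toy sketch in Python of an inverted index that keeps word positions in its posting lists, so that both filtering and ranking are done from the index alone. Everything in it is an assumption made for illustration: the PositionalIndex class, its method names and the scoring formula (term frequency plus a proximity bonus) are hypothetical and have nothing to do with the actual GIN code or with ts_rank.

```python
from collections import defaultdict


class PositionalIndex:
    """Toy inverted index: term -> {doc_id: [word positions]}.

    Hypothetical illustration only; the real prototype works inside GIN.
    """

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Store the position of every word, not just the fact that it occurs.
        for pos, term in enumerate(text.lower().split()):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        # Filtering: documents that contain every query term.
        doc_sets = [set(self.postings.get(t, {})) for t in terms]
        matches = set.intersection(*doc_sets)
        # Ranking: uses only the positions kept in the index,
        # the document text is never read again.
        ranked = [(doc_id, self._score(doc_id, terms)) for doc_id in matches]
        ranked.sort(key=lambda pair: (-pair[1], pair[0]))
        return ranked

    def _score(self, doc_id, terms):
        # Assumed scoring: total term frequency plus a bonus that grows
        # as adjacent query terms appear closer together in the document.
        score = float(sum(len(self.postings[t][doc_id]) for t in terms))
        for a, b in zip(terms, terms[1:]):
            gap = min(
                abs(pa - pb)
                for pa in self.postings[a][doc_id]
                for pb in self.postings[b][doc_id]
            )
            score += 1.0 / max(gap, 1)
        return score


idx = PositionalIndex()
idx.add(1, "postgresql full text search")
idx.add(2, "full index of text for search in postgresql")
idx.add(3, "btree index")
print(idx.search("full text"))
```

Document 1 comes first here because "full" and "text" are adjacent in it, and the original texts are not needed to work that out. In a real DBMS the price is a bigger index, since the positions have to be stored next to every posting.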



November 2012
