Question: How to optimize a Python PostgreSQL query for 3 million records

Answers: 1
Added at 2017-11-07 19:11
I'm inserting or updating around 3 to 4 million records in PostgreSQL using a Python script; please see the code below. The requirement is: insert if the key is new, or update the key with the new value if the key already exists. But the code below makes a DB round trip for every record, and it takes around 35-45 minutes to insert the 3 million records, which is very slow. How can I avoid the round trips and insert or update faster?

Any help would be really appreciated.

Thanks for your help in advance.

InputFile.txt - This file has around 3 to 4 million line items

productKey1 printer1,printerModel1,printerPrice1,printerDesc1|
productKey2 sacnner2,scannerModel2,scannerPrice2,scannerDesc2|
productKey3 mobile3,mobileModel3,mobilePrice3,mobileDesc3|
productKey4 tv4,tvModel4,tvPrice4,tvDescription4|
productKey2 sacnner22,scannerModel22,scannerPrice22,scannerDesc22|
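For a file like the sample above, the intended end state is one row per key, with each later value appended to the earlier one (note `productKey2` appears twice). In plain Python terms, the requirement looks like this (`merge_lines` is a hypothetical helper, not part of the script below):

```python
def merge_lines(lines):
    """Replicate the insert-or-append requirement in memory."""
    merged = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.split(' ', 1)  # split on the first space only
        # new key -> insert; existing key -> append the new value
        merged[key] = merged.get(key, '') + value
    return merged

sample = [
    "productKey1 printer1,printerModel1,printerPrice1,printerDesc1|",
    "productKey2 sacnner2,scannerModel2,scannerPrice2,scannerDesc2|",
    "productKey2 sacnner22,scannerModel22,scannerPrice22,scannerDesc22|",
]
result = merge_lines(sample)
# result["productKey2"] holds both value strings concatenated
```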

insert.py

import psycopg2

def insertProduct(filename, conn):
    seen = set()
    cursor = conn.cursor()
    qi = "INSERT INTO productTable (key, value) VALUES (%s, %s);"
    qu = "UPDATE productTable SET value = CONCAT(value, %s) WHERE key = %s;"

    with open(filename) as f:
        for line in f:
            if line.strip():
                key, value = line.split(' ', 1)
                if key not in seen:
                    seen.add(key)
                    cursor.execute(qi, (key, value))
                else:
                    cursor.execute(qu, (value, key))

                # one execute + one commit per line: a DB round trip for
                # every record, which is what makes this so slow
                conn.commit()

conn = psycopg2.connect("dbname='productDB' user='myuser' host='localhost'")
insertProduct('InputFile.txt', conn)
Answer #1, added 2017-11-08 14:11

Execute batches of a prepared statement using psycopg2's fast execution helpers: http://initd.org/psycopg/docs/extras.html#fast-execution-helpers

import psycopg2
import psycopg2.extras

def insertProduct(filename, conn):
    data = []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line:
                key, value = line.split(' ', 1)
                data.append((key, value))

    cursor = conn.cursor()
    # Prepare the upsert once on the server, then execute it in batches.
    # The CTE inserts the row if the key is absent; otherwise the UPDATE
    # appends the new value to the existing one.
    cursor.execute("""
        prepare upsert (text, text) as
        with i as (
            insert into productTable (key, value)
            select $1, $2
            where not exists (select 1 from productTable where key = $1)
            returning *
        )
        update productTable p
        set value = concat(p.value, $2)
        where p.key = $1 and not exists (select 1 from i)
    """)
    psycopg2.extras.execute_batch(cursor, "execute upsert (%s, %s)", data, page_size=500)
    cursor.execute("deallocate upsert")
    conn.commit()

conn = psycopg2.connect(database='cpn')
insertProduct('InputFile.txt', conn)
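If the table has a unique constraint on `key` and the server is PostgreSQL 9.5 or newer, the same insert-or-append logic can also be written as a native `ON CONFLICT` upsert and batched with `psycopg2.extras.execute_values`. A sketch, assuming the same `productTable` schema (`upsertProduct` and `UPSERT_SQL` are names introduced here, not from the answer above):

```python
UPSERT_SQL = """
    INSERT INTO productTable (key, value) VALUES %s
    ON CONFLICT (key)
    DO UPDATE SET value = concat(productTable.value, excluded.value)
"""

def upsertProduct(filename, conn):
    import psycopg2.extras

    # Deduplicate within the batch first: ON CONFLICT DO UPDATE cannot
    # affect the same row twice in one statement, so duplicate keys in
    # the file must be concatenated in Python before the insert.
    merged = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line:
                key, value = line.split(' ', 1)
                merged[key] = merged.get(key, '') + value

    cursor = conn.cursor()
    psycopg2.extras.execute_values(
        cursor, UPSERT_SQL, list(merged.items()), page_size=500)
    conn.commit()
```

This sends one multi-row INSERT per page instead of one statement per record, and lets the database resolve insert-vs-update itself.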