Oracle NetSuite is a cloud-based ERP. Like most modern systems, it exposes an API so that external systems can retrieve and analyze its data.
Azure Databricks is an excellent platform for analyzing NetSuite data, but it has no built-in NetSuite connector. So what do we do?
Where there is a will, there is a way, but I should stress that the following approach is unsupported and has some limitations (see below). Use it at your own risk.
Step 1 - Get the NetSuite driver and the required connection parameters
Go to the NetSuite SuiteAnalytics Connect download page (you'll need a user with the appropriate permissions), and download the Linux driver (Databricks cluster nodes run Linux).
You’ll also need to get the following parameters:
AccountId
Port (the default is 1708)
User
Password
RoleID
Step 2 - Upload the driver file to Databricks
From the downloaded driver zip file, extract the NQjc.jar file.
Go to the Databricks workspace and open your cluster.
Go to the Libraries tab and click Install new.
Choose "JAR" as the library type.
Drag and drop NQjc.jar and click Install.
Step 3 - Use a notebook to get data
Now open a new notebook (with the language set to Python), attach it to the cluster, and start the cluster.
Let's set the connection parameters:
DataSource="NetSuite2.com"
AccountId=********
ServerHost=f"{AccountId}.connect.api.netsuite.com"
Port=1708
user = "***************"
password = "**************"
RoleId=****
url = f"jdbc:ns://{ServerHost}:{Port};ServerDataSource={DataSource};encrypted=1;NegotiateSSLClose=false;CustomProperties=(AccountID={AccountId};RoleID={RoleId})
driver = "com.netsuite.jdbc.openaccess.OpenAccessDriver"
And load a table from NetSuite into a dataframe:
ns_table = "account"
table_df = spark.read.format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("user", user) \
    .option("password", password) \
    .option("dbtable", ns_table) \
    .load()
This command succeeds (at this point Spark only fetches the table's schema, not its rows), but when we try to display the dataframe:
display(table_df)
We get an error message:
SQLSyntaxErrorException: [NetSuite][SuiteAnalytics Connect JDBC Driver][OpenAccess SDK SQL Engine]Syntax Error in the SQL statement.[10104]
Does that mean the dataframe holds no data? Let's check whether it has rows:
table_df.count()
This command returns the right number of rows.
So the data is returned, but cannot be displayed correctly in Databricks.
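My working theory is that count() compiles to a much simpler SQL statement than the full SELECT that display() triggers. If you want to experiment, Spark's JDBC reader also accepts a query option for pushing down an explicit statement. A sketch below; the column names are illustrative, and the subquery Spark wraps around the statement may run into the same driver limitation:

slim_df = spark.read.format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("user", user) \
    .option("password", password) \
    .option("query", "SELECT id, fullname FROM account") \
    .load()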
Let's try writing it to a file:
table_df.write.csv("/mnt/tables/account.csv",header=True)
This command completes successfully. Now let's read the file back:
account = spark.read.csv("/mnt/tables/account.csv",header=True)
display(account)
And it works! The data is displayed correctly.
So, as I said at the start, this is unsupported, but it works. You'll have to write the data to a file and then read it back before transforming it further, but it does let you work with NetSuite data in Databricks. To avoid repeating that dance for every table, you can wrap it in a small helper, as sketched below.
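Here is a minimal sketch of such a helper, reusing the url, driver, and credentials defined above (the function name and base path are my own; note that the CSV round trip turns every column into a string, so you may want to cast types afterwards):

def fetch_netsuite_table(ns_table, base_path="/mnt/tables"):
    # Read the table over JDBC, then round-trip it through CSV
    # so the result can be displayed and transformed in Databricks.
    path = f"{base_path}/{ns_table}.csv"
    jdbc_df = spark.read.format("jdbc") \
        .option("driver", driver) \
        .option("url", url) \
        .option("user", user) \
        .option("password", password) \
        .option("dbtable", ns_table) \
        .load()
    jdbc_df.write.mode("overwrite").csv(path, header=True)
    return spark.read.csv(path, header=True)

account = fetch_netsuite_table("account")
display(account)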
My guess is that the problem has to do with how the driver reports the schema of empty columns, but I haven't confirmed it.
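If you want to probe that guess, you could inspect what the driver reports for each column and then try displaying a narrowed selection (the column choice here is illustrative; if the narrowed display still fails, that itself helps isolate the offending columns):

table_df.printSchema()  # see the inferred type of every column
display(table_df.select("id"))  # try a subset of columns to isolate the problem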