
Week 38 – Exploring ChatGPT for Learning, Code, Data, NLP & Fun

Head of Product Innovation & Developer Strategy, Neo4j

December 6, 2022


Last week (Nov. 30), OpenAI launched ChatGPT, and I’ve spent the days since exploring different areas of its application, from world knowledge to software development. In today’s stream, we wanted to see if we can apply it to learning graph databases.

If you missed the stream, you can watch the recording here — we had a lot of fun!

Also check out our upcoming livestreams this week with Going Meta 11 (Graph Expectations), and Learn with Neo4j with Jason Lengstorf on Building and Deploying Graph apps with Netlify.

But back to ChatGPT. It was announced on November 30 and I tried it out the same day. I started with normal questions, then went to world knowledge, predictions of the future, and then to code.

ChatGPT: Optimizing Language Models for Dialogue

It was also able to explain “What if”-like questions, like “How many LEGO blocks would it take to get to the moon,” but I haven’t checked the actual numbers.

One word of caution!

Please don’t take what the language model tells you as truth. It very much tries to please the human and comes up with answers instead of saying “I don’t know.” There were several occasions when it was plainly wrong. So always check the output, and treat its responses as inspiration, quick-typing help, or a starting point. Never rely on its correctness for critical applications.

It even provides its own disclaimer after producing an incorrect Cypher statement with wrong comments / explanations.

Incorrect Cypher Statement and Explanations

Its disclaimer that it doesn’t know anything.

Some useful tips:

Try to be specific with your prompt: the clearer you are, the better the answers.
You can make it do things if you tell it to “imagine” a situation, and then play a role or act as the imagined character.
If it stops midway through an answer, you can just say “continue,” as its output is limited to 4096 tokens.
Sometimes, if it rejects your question and doesn’t want to do something, it helps to hit “try again” a few times. Having it translate to other languages took a few tries before it worked fluently. (It even translated the US anthem to Klingon.)
For fun, you can have it answer in different literary styles.
You can trick it into circumventing its default prompt and limitations (like rendering images or enabling browsing). Some people on Twitter have been successful doing that, but I couldn’t.

In general, it’s definitely a more cohesive answering machine than Google or Stack Overflow, as it produces a single answer with explanations (but no source attribution!). And again, be aware that its answer is not necessarily correct!

It is also impressive at applying general concepts like graph modeling to a specific domain (like here, to a supply chain):

Generating Data

What it is quite good at is generating information. You can tell the model to be a data generator and then have it generate random data based on that.

Also, retrieving existing information from its model and converting between formats works well, like here with the Royal Family as a Graphviz DOT file, or turning JSON into CSV (from the D3 Les Misérables dataset).
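For comparison, the JSON-to-CSV conversion is easy to do by hand, which also gives you something to check the model's output against. The following is a minimal sketch, assuming the node-link shape commonly used for the D3 Les Misérables example (`nodes` with `id` and `group`, `links` with `source`, `target`, and `value`). The sample values and the `rows_to_csv` helper are made up for illustration.

```python
import csv
import io
import json

# Assumed sample in a node-link shape; the values are illustrative only.
SAMPLE = """
{
  "nodes": [{"id": "Myriel", "group": 1}, {"id": "Napoleon", "group": 1}],
  "links": [{"source": "Napoleon", "target": "Myriel", "value": 1}]
}
"""


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


graph = json.loads(SAMPLE)
# One CSV for nodes and one for relationships, the usual layout for graph imports.
nodes_csv = rows_to_csv(graph["nodes"], ["id", "group"])
links_csv = rows_to_csv(graph["links"], ["source", "target", "value"])
print(nodes_csv)
print(links_csv)
```

Comparing a few rows of a hand-made conversion like this with what ChatGPT returns is a quick way to spot dropped or invented records.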

Documentation Lookups / Searches

If you’re looking for something in documentation that’s not well explained, ChatGPT can do a better job of explaining it. It also kept the context of the previous discussion about the movie recommendation graph.

NOTE: The training cutoff was some time in September 2021, so newer information, like here for the graph-data-science library v2, is not included.

Generating Cypher

Given a textual representation of a graph, it can generate Cypher queries for the dataset. While that worked well yesterday, today its queries didn’t make sense.
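Because generated queries can look plausible and still be wrong, a cheap first check is whether a query only uses labels and relationship types that exist in your graph. The sketch below is a crude, regex-based illustration, not a Cypher parser. The schema sets, the patterns, and the `unknown_names` helper are all assumptions made up for this example. It only sees the first label in each node pattern and says nothing about whether the query logic is right.

```python
import re

# Hypothetical schema for a small movie graph.
KNOWN_LABELS = {"Person", "Movie"}
KNOWN_REL_TYPES = {"ACTED_IN", "DIRECTED"}

# Crude patterns: "(var:Label" for nodes and "[var:TYPE" for relationships.
LABEL_PATTERN = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_TYPE_PATTERN = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def unknown_names(query):
    """Return labels and relationship types in the query that the schema lacks."""
    labels = set(LABEL_PATTERN.findall(query))
    rel_types = set(REL_TYPE_PATTERN.findall(query))
    return sorted((labels - KNOWN_LABELS) | (rel_types - KNOWN_REL_TYPES))


good = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title"
bad = "MATCH (p:Actor)-[r:STARRED_IN]->(m:Movie) RETURN p.name"

print(unknown_names(good))
print(unknown_names(bad))
```

An empty result only means the names are known. You still need to run the query and read the results yourself.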

Comparing Cypher and SQL or Relational with Graph

For comparing databases or query languages it does a reasonably good job, and even provides code examples.

Comparing SQL-Cypher and Relational-Graph

Translating between Query Languages

For translating between query languages, you really need to understand the answers and be able to verify the results. While translating a regular Common Table Expression (CTE) to Cypher worked, for the recursive CTE it came up with made-up stuff each time. Only the inverse direction, from a variable-length path query to a recursive CTE, worked.
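To make the comparison concrete, here is a small sketch of what a recursive CTE looks like next to a variable-length path pattern. It runs against an in-memory SQLite database through Python's standard `sqlite3` module. The `reports_to` table, its rows, and the Cypher pattern in the comment are hypothetical examples, not the queries used on the stream.

```python
import sqlite3

# Hypothetical reporting hierarchy, made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports_to (employee TEXT, manager TEXT);
    INSERT INTO reports_to VALUES
        ('Bob', 'Alice'), ('Carol', 'Bob'), ('Dave', 'Carol'), ('Eve', 'Alice');
""")

# Roughly equivalent Cypher variable-length pattern (not executed here):
#   MATCH p = (e:Employee)-[:REPORTS_TO*1..]->(:Employee {name: 'Alice'})
#   RETURN e.name, length(p)
RECURSIVE_CTE = """
WITH RECURSIVE chain(employee, depth) AS (
    -- Anchor: direct reports of the given manager.
    SELECT employee, 1 FROM reports_to WHERE manager = ?
    UNION ALL
    -- Recursive step: reports of everyone already in the chain.
    SELECT r.employee, c.depth + 1
    FROM reports_to AS r JOIN chain AS c ON r.manager = c.employee
)
SELECT employee, depth FROM chain ORDER BY depth, employee
"""

subordinates = conn.execute(RECURSIVE_CTE, ("Alice",)).fetchall()
print(subordinates)
```

Running both versions against the same small dataset is a practical way to verify a translation, whichever direction it went.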

Answering Stack Overflow Questions

It can also answer Stack Overflow questions (or Advent of Code and other code challenges), as Tomaz Bratanic demonstrated here:

Get multiple level of one to many relationship results

But in doing so, it still produces a large percentage of seemingly correct, well-explained (longish) answers that are nonetheless incorrect! And who except subject matter experts could check them? And where would they find the time to do that, when it takes only seconds to produce them?
That’s why ChatGPT outputs have been temporarily banned from Stack Overflow.

Temporary policy: ChatGPT is banned

NLP: Natural Language to Cypher

An interesting application is to translate phrases from natural language to Cypher — you can even give it an explanation of the graph model to use.

Tomaz Bratanic on Twitter: “Want to convert english to Cypher queries? #chatgpt has you covered! I might also use it to generate content for me 😄 pic.twitter.com/YrjZKNuLyl / Twitter”


This is also what Sixing Huang used in doctor.ai as described in this article and explained on our livestream.

Relationship Extraction with GPT-3
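As a rough illustration of giving the model an explanation of the graph model, the sketch below assembles a prompt from a schema description and a question. The prompt wording, the `build_cypher_prompt` helper, and the movie schema are assumptions for this example. They are not the prompts used in the tweet or in doctor.ai, and no API call is made.

```python
def build_cypher_prompt(graph_model, question):
    """Combine a plain-text graph model description and a question into one prompt."""
    model_text = "\n".join(f"- {line}" for line in graph_model)
    return (
        "You translate questions into Cypher queries.\n"
        "Use only this graph model:\n"
        f"{model_text}\n"
        f"Question: {question}\n"
        "Cypher:"
    )


# Hypothetical schema description, written as Cypher-like patterns.
GRAPH_MODEL = [
    "(:Person {name})-[:ACTED_IN]->(:Movie {title, released})",
    "(:Person {name})-[:DIRECTED]->(:Movie {title, released})",
]

prompt = build_cypher_prompt(GRAPH_MODEL, "Which movies did Tom Hanks act in?")
print(prompt)
```

Stating the labels, relationship types, and properties explicitly leaves the model less room to guess names that don't exist in your graph, though the result still needs checking.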

Making Stuff Up

If you try to use it for NLP on text, you have to be careful though, especially if you want cross-references to other sources like DBpedia.

So while this works:

Asking it to produce Wikidata IDs makes it invent random references!

Tomaz Bratanic on Twitter: “I jumped on the #chatgpt3 train today and convinced it was an information extraction tool. You have to use “imagine” to overcome its limitations, otherwise I doesn’t want to do NLP tasks. pic.twitter.com/UJawxywwEj / Twitter”


What is quite useful is that it keeps the context of the conversation, so we can keep asking it different questions about the same topic without specifying it time and again.

New Knowledge vs. Drawing Insights

We got a question in the chat about ML models not being able to generate new knowledge. That’s true, but what is interesting is that, similar to knowledge graphs, they are able to connect disparate pieces of information that a human might not have connected yet. I think that’s where most ideas come from: not truly “novel” thoughts but derivations and combinations of existing knowledge (see this TED talk by Matt Ridley).

https://medium.com/media/af51d024ea02d9ccf5818214ceb4337e/href

Here is an example where I asked it to create a card game for a TV series (The Expanse), including showing some cards.

Poem About Graphs Are Everywhere

In closing, we had it write a poem:

generate a poem about graph are everywhere in the style of Neil Gaiman:

Graphs are everywhere,
In every shape and size,
From the simple line graph
To the complex network.

They show us patterns and trends,
Reveal connections and insights,
Help us make predictions and decisions,
And bring order to the chaos of data.

In the world of finance and science,
In the realms of art and politics,
Graphs are a powerful tool
For understanding and explaining.

They are a window into the world,
A way to see beyond the surface,
To discover the hidden truths
And to tell the stories of our time.

So let us embrace the power of graphs,
And let them guide us on our journey,
As we explore the wonders of the world
And seek to understand the universe.

Conclusion

ChatGPT is impressive and helpful, but also scary and unhelpful. I guess, like everything, if you already know what you’re doing it’s a powerful tool, but if not, you can easily be led astray if you trust it too much. Things that definitely need to be added are validation of its own answers and source attribution. Speaking of sources: as it draws its information from publicly available material, if it’s commercialized, then OpenAI should give back to creators, Wikipedia, researchers, developers, and many more, but I doubt they will.

Happy Holidays

Week 38 – Exploring ChatGPT for Learning, Code, Data, NLP and Fun was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.