9 Examples to Get You Started with LangChain
AI魔法学院
2023-09-05

Heads-up, useful content ahead: this may be the clearest, most hands-on LangChain tutorial you've been looking for. Through nine representative application examples, it takes you from zero to a working knowledge of LangChain.

Notebook source code for this article:

https://github.com/lyhue1991/eat_chatgpt/blob/main/3_langchain_9_usecases.ipynb

The nine examples cover the following features:

1. Summarization: condense the key points of a text or chat transcript.

2. Question Answering Over Documents: use documents as context and answer questions based on their content.

3. Extraction: pull structured content out of free text.

4. Evaluation: analyze and grade the quality of LLM outputs.

5. Querying Tabular Data: retrieve information from databases and database-like sources.

6. Code Understanding: analyze code, extract its logic, and support QA over it.

7. Interacting with APIs: read and understand API documentation, then call real-world APIs to fetch live data.

8. Chatbots: a chatbot framework with memory (and UI interaction).

9. Agents: use LLMs to analyze tasks, make decisions, and invoke tools to carry out those decisions.

# Before we start, install the required dependencies
!pip install langchain
!pip install openai
!pip install tiktoken
!pip install faiss-cpu

openai_api_key = 'YOUR_API_KEY'
# use your own OpenAI API key

I. Summarization

Handing an LLM a piece of text and asking for a summary is one of the most common use cases.

The hottest application of this kind at the moment is probably ChatPDF, which offers exactly this feature.

1£¬¶ÌÎı¾×ܽá

# Summaries Of Short Text
from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)  # initialize the LLM

# create the template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# create a LangChain prompt template that we can insert values into later
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

confusing_text = """
For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
"The problem is that when you look up close at the anatomy, it's evocative of a lot of different things, but it's diagnostic of nothing," says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
"And it's so damn big that when whenever someone says it's something, everyone else's hackles get up: 'How could you have a lichen 20 feet tall?'"
"""

print("------- Prompt Begin -------")
# print the rendered template
final_prompt = prompt.format(text=confusing_text)
print(final_prompt)
print("------- Prompt End -------")
------- Prompt Begin -------

%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
For the next 130 years, debate raged. Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree. "The problem is that when you look up close at the anatomy, it's evocative of a lot of different things, but it's diagnostic of nothing," says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology. "And it's so damn big that when whenever someone says it's something, everyone else's hackles get up: 'How could you have a lichen 20 feet tall?'"

------- Prompt End -------
output = llm(final_prompt)
print(output)
People argued for a long time about what Prototaxites was. Some thought it was a lichen, some thought it was a fungus, and some thought it was a tree. But it was hard to tell for sure because it looked like different things up close and it was really, really big.

2£¬³¤Îı¾×ܽá

¶ÔÓÚÎı¾³¤¶È½Ï¶ÌµÄÎı¾ÎÒÃÇ¿ÉÒÔÖ±½ÓÕâÑùÖ´ÐÐsummary²Ù×÷

µ«ÊǶÔÓÚÎı¾³¤¶È³¬¹ýlLMÖ§³ÖµÄmax token size ʱ½«»áÓöµ½À§ÄÑ

Lang Chain ÌṩÁË¿ªÏä¼´ÓõŤ¾ß½â¾ö³¤Îı¾µÄÎÊÌ⣺load_summarize_chain

# Summaries Of Longer Text
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

with open('wonderland.txt', 'r') as file:
    text = file.read()  # the text is Alice's Adventures in Wonderland

# print the first 285 characters of the novel
print(text[:285])
The Project Gutenberg eBook of Alice's Adventures in Wonderland, by Lewis Carroll

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it unde
num_tokens = llm.get_num_tokens(text)
print(f"There are {num_tokens} tokens in your file")  # the full text is about 48k tokens
# clearly far too much to feed into the LLM in one shot

There are 48613 tokens in your file

½â¾ö³¤Îı¾µÄ·½Ê½ÎÞ·ÇÊÇ'chunking','splitting' Ô­Îı¾ÎªÐ¡µÄ¶ÎÂä/·Ö¸î²¿·Ö

text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=350)
# I'm using RecursiveCharacterTextSplitter here, but other splitters work too
docs = text_splitter.create_documents([text])
print(f"You now have {len(docs)} docs instead of 1 piece of text")

You now have 36 docs instead of 1 piece of text

Now we need a LangChain tool to feed the chunked text into the LLM for summarization.

# ÉèÖà lang chain# ʹÓà map_reduceµÄchain_type£¬ÕâÑù¿ÉÒÔ½«¶à¸öÎĵµºÏ²¢³ÉÒ»¸öchain = load_summarize_chain(llm=llm, chain_type='map_reduce') # verbose=True չʾÔËÐÐÈÕÖ¾
# Use it. This will run through the 36 documents, summarize the chunks, then get a summary of the summary.# µäÐ͵Ämap reduceµÄ˼·ȥ½â¾öÎÊÌ⣬½«ÎÄÕ²ð·Ö³É¶à¸ö²¿·Ö£¬ÔÙ½«¶à¸ö²¿·Ö·Ö±ð½øÐÐ summarize£¬×îºóÔÙ½øÐÐ ºÏ²¢£¬¶Ô summarys ½øÐÐ summaryoutput = chain.run(docs)print (output)# Try yourself
Alice follows a white rabbit down a rabbit hole and finds herself in a strange world full of peculiar characters. She experiences many strange adventures and is asked to settle disputes between the characters. In the end, she is in a court of justice with the King and Queen of Hearts and is questioned by the King. Alice reads a set of verses and has a dream in which she remembers a secret. Project Gutenberg is a library of electronic works founded by Professor Michael S. Hart and run by volunteers.
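The map_reduce flow above can be pictured in a few lines of plain Python. In this sketch, `summarize` is a hypothetical stand-in for an LLM call (here it just keeps the first sentence); the real chain makes one LLM call per chunk, then one more over the joined partial summaries:

```python
def summarize(text):
    # stand-in for an LLM call; here we simply keep the first sentence
    return text.split(". ")[0] + "."

def map_reduce_summarize(docs):
    # map: summarize every chunk independently (these calls can run in parallel)
    partial_summaries = [summarize(d) for d in docs]
    # reduce: concatenate the partial summaries and summarize once more
    return summarize(" ".join(partial_summaries))

docs = ["Alice fell down a hole. It was deep.",
        "She met a rabbit. He was late."]
print(map_reduce_summarize(docs))  # Alice fell down a hole.
```

Because the map step only ever sees one chunk at a time, no single call exceeds the model's token limit.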

¶þ£¬ÎĵµÎÊ´ð(QA based Documents)

ΪÁËÈ·±£LLMÄܹ»Ö´ÐÐQAÈÎÎñ

.          ÐèÒªÏòLLM´«µÝÄܹ»ÈÃËû²Î¿¼µÄÉÏÏÂÎÄÐÅÏ¢

.          ÐèÒªÏòLLM׼ȷµØ´«´ïÎÒÃǵÄÎÊÌâ

1£¬¶ÌÎı¾ÎÊ´ð

# In a nutshell, building a QA system with documents as context boils down to
# llm(your context + your question) = your answer
# Simple Q&A Example
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

context = """
Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
"""
question = "Who is under 40 years old?"

output = llm(context + question)
print(output.strip())

Rachel is under 40 years old.

2£¬³¤Îı¾ÎÊ´ð

For longer texts, we can split the text into chunks, embed each chunk, store the embeddings in a database, and then query against it.

The goal is to select the relevant chunks of text, but which chunks should we pick? The most popular approach today is to compare vector embeddings and select the most similar text.
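"Comparing vector embeddings" boils down to cosine similarity: embed the query, embed each chunk, and keep the chunks whose vectors point in the most similar direction. A toy sketch with hand-made 2-D vectors (a real system would get these from an embedding model, and the chunk names are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# pretend embeddings for three chunks and a query
chunk_vectors = {
    "chunk_rabbit": [0.9, 0.1],
    "chunk_tea":    [0.1, 0.9],
    "chunk_queen":  [0.5, 0.5],
}
query_vector = [0.8, 0.2]

# rank chunks by similarity to the query and keep the best one
best = max(chunk_vectors, key=lambda name: cosine_similarity(chunk_vectors[name], query_vector))
print(best)  # chunk_rabbit
```

FAISS below does exactly this kind of nearest-neighbor search, only over thousands of high-dimensional vectors and much faster.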

from langchain import OpenAI
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

loader = TextLoader('wonderland.txt')  # load a long text; again, Alice's Adventures in Wonderland
doc = loader.load()
print(f"You have {len(doc)} document")
print(f"You have {len(doc[0].page_content)} characters in that document")

You have 1 document

You have 164014 characters in that document

# split the novel into multiple parts
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# get the total character count so we can compute the average
num_total_characters = sum([len(x.page_content) for x in docs])
print(f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")
Now you have 62 documents that have an average of 2,846 characters (smaller pieces)
# set up the embedding engine
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# embed the documents and store them together with the raw text in a vector store
# this step makes API calls to OpenAI
docsearch = FAISS.from_documents(docs, embeddings)

# create the QA-retrieval chain
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

query = "What does the author describe the Alice following with?"
qa.run(query)
# the retriever fetches the most similar document chunks and hands them,
# together with your question, to the LLM to reason over
# there is a lot to tune here: the best chunk size, the best embedding engine,
# the best retriever, and so on
# you can also switch to a cloud-hosted vector store
' The author describes Alice following a White Rabbit with pink eyes.'

III. Extraction

Extraction is the process of parsing structured data out of a piece of text.

It is usually used together with an extraction parser to structure the data. Some typical use cases:

.          ´Ó¾ä×ÓÖÐÌáÈ¡½á¹¹»¯ÐÐÒÔ²åÈëÊý¾Ý¿â

.          ´Ó³¤ÎĵµÖÐÌáÈ¡¶àÐÐÒÔ²åÈëÊý¾Ý¿â

.          ´ÓÓû§²éѯÖÐÌáÈ¡²ÎÊýÒÔ½øÐÐ API µ÷ÓÃ

.          ×î½ü×î»ðµÄ Extraction ¿âÊÇ KOR

1. Manual format conversion

from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.chat_models import ChatOpenAI

chat_model = ChatOpenAI(temperature=0, model='gpt-3.5-turbo', openai_api_key=openai_api_key)

# Vanilla Extraction
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""
fruit_names = """
Apple, Pear, this is an kiwi
"""

# make the prompt by combining the instructions with the fruit names
prompt = (instructions + fruit_names)

# call the LLM
output = chat_model([HumanMessage(content=prompt)])
print(output.content)
print(type(output.content))
{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}
<class 'str'>
output_dict = eval(output.content)  # use Python's eval to convert the format manually
print(output_dict)
print(type(output_dict))
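A caution on the step above: eval executes arbitrary code, which is risky on model output you don't control. Python's ast.literal_eval parses only literal structures (dicts, lists, strings, numbers), so malformed or malicious output raises an error instead of executing. A minimal sketch with a hard-coded stand-in for the model output:

```python
import ast

# stand-in for output.content from the model
output_content = "{'Apple': 'a', 'Pear': 'b', 'kiwi': 'c'}"

# safe: only literals are accepted, nothing is executed
output_dict = ast.literal_eval(output_content)
print(type(output_dict))  # <class 'dict'>

# anything that is not a pure literal is rejected with ValueError
try:
    ast.literal_eval("__import__('os').system('echo pwned')")
except ValueError:
    print("rejected non-literal input")
```

Swapping eval for ast.literal_eval is a drop-in change whenever the model is asked to return a Python literal.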

2. Automatic format conversion

Using langchain.output_parsers.StructuredOutputParser, we can automatically generate a prompt that includes format instructions.

That way we no longer need to worry about prompt-engineering the output format ourselves; we hand that part entirely to LangChain, which converts the LLM's output into a Python object.

# parse the output and extract structured data
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

# the parser will parse the LLM's output against the schema I defined
# and hand back the expected structured data
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

format_instructions = output_parser.get_format_instructions()
print(format_instructions)
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
         "artist": string  // The name of the musical artist
         "song": string  // The name of the song that the artist plays
}
```
# this prompt differs from the one we built for the chat model earlier:
# it is a ChatPromptTemplate that carries the format instructions along with the user input
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("Given a command from the user, extract the artist and song names \n \
                                                    {format_instructions}\n{user_prompt}")
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)
artist_query = prompt.format_prompt(user_prompt="I really like So Young by Portugal. The Man")
print(artist_query.messages[0].content)
Given a command from the user, extract the artist and song names 
                                                     The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
         "artist": string  // The name of the musical artist
         "song": string  // The name of the song that the artist plays
}
```
I really like So Young by Portugal. The Man
artist_output = chat_model(artist_query.to_messages())
output = output_parser.parse(artist_output.content)

print(output)
print(type(output))
# note that because we are using the turbo model, the result is not guaranteed
# to be identical on every run; swapping in gpt-4 may be a better choice
{'artist': 'Portugal. The Man', 'song': 'So Young'}
<class 'dict'>

IV. Evaluation

Because natural language is unpredictable and variable, evaluating whether an LLM's output is correct can be difficult. LangChain provides a way to help us tackle this problem.

# embeddings, store, and retrieval
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# model and doc loader
from langchain import OpenAI
from langchain.document_loaders import TextLoader

# eval
from langchain.evaluation.qa import QAEvalChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

# once again, Alice's Adventures in Wonderland as the input text
loader = TextLoader('wonderland.txt')
doc = loader.load()

print(f"You have {len(doc)} document")
print(f"You have {len(doc[0].page_content)} characters in that document")
You have 1 document
You have 164014 characters in that document
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])
print(f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")
Now you have 62 documents that have an average of 2,846 characters (smaller pieces)
# embeddings and docstore
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(), input_key="question")
# note the input_key parameter: it tells the chain which key in my dict holds the question,
# so the chain can find the question automatically and pass it to the LLM
question_answers = [
    {'question': "Which animal give alice a instruction?", 'answer': 'rabbit'},
    {'question': "What is the author of the book", 'answer': 'Elon Mask'}
]
predictions = chain.apply(question_answers)
predictions
# run the LLM's predictions and compare them against the answers I provided;
# here we trust that my hand-written answers are the ground truth
[{'question': 'Which animal give alice a instruction?',
  'answer': 'rabbit',
  'result': ' The Caterpillar gave Alice instructions.'},
 {'question': 'What is the author of the book',
  'answer': 'Elon Mask',
  'result': ' The author of the book is Lewis Carroll.'}]
# start your eval chain
eval_chain = QAEvalChain.from_llm(llm)
graded_outputs = eval_chain.evaluate(question_answers,
                                     predictions,
                                     question_key="question",
                                     prediction_key="result",
                                     answer_key='answer')
graded_outputs
[{'text': ' INCORRECT'}, {'text': ' INCORRECT'}]

V. Querying Tabular Data

# query a SQLite database with natural language; we'll use the San Francisco trees dataset
# don't run the following code unless you have sqlite and the corresponding db file
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

sqlite_db_path = 'data/San_Francisco_Trees.db'
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
db_chain.run("How many Species of trees are there in San Francisco?")

Under the hood, the chain will:

- find which table to use
- find which column to use
- construct the correct SQL query
- execute that query
- get the result
- return a natural language response

We can confirm the LLM's result via pandas:

import sqlite3
import pandas as pd

# connect to the SQLite database
connection = sqlite3.connect(sqlite_db_path)

# define your SQL query
query = "SELECT count(distinct qSpecies) FROM SFTrees"

# read the SQL query into a pandas DataFrame
df = pd.read_sql_query(query, connection)

# close the connection
connection.close()

# display the result in the first cell of the first column
print(df.iloc[0, 0])

VI. Code Understanding

Code understanding uses much the same tooling as document QA, except that the input is the source code of a project.

# helper to read local files
import os

# vector support
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

# model and chain
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# text splitters
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

llm = ChatOpenAI(model='gpt-3.5-turbo', openai_api_key=openai_api_key)
embeddings = OpenAIEmbeddings(disallowed_special=(), openai_api_key=openai_api_key)

root_dir = '/content/drive/MyDrive/thefuzz-master'
docs = []

# go through each folder
for dirpath, dirnames, filenames in os.walk(root_dir):
    # go through each file
    for file in filenames:
        try:
            # load up the file as a doc and split it
            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
            docs.extend(loader.load_and_split())
        except Exception as e:
            pass

print(f"You have {len(docs)} documents\n")
print("------ Start Document ------")
print(docs[0].page_content[:300])
You have 175 documents
------ Start Document ------
from timeit import timeit
import math
import csv

iterations = 100000

reader = csv.DictReader(open('data/titledata.csv'), delimiter='|')
titles = [i['custom_title'] for i in reader]
title_blob = '\n'.join(titles)

cirque_strings = [
    "cirque du soleil - zarkana - las vegas",
    "cirque du sol
docsearch = FAISS.from_documents(docs, embeddings)

# get our retriever ready
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

query = "What function do I use if I want to find the most similar item in a list of items?"
output = qa.run(query)
print(output)
You can use the `process.extractOne()` function from `thefuzz` package to find the most similar item in a list of items. For example:
```
from thefuzz import process

choices = ["New York Yankees", "Boston Red Sox", "Chicago Cubs", "Los Angeles Dodgers"]
query = "new york mets vs atlanta braves"

best_match = process.extractOne(query, choices)
print(best_match)
```

This will output:

```
('New York Yankees', 50)
```
Where `('New York Yankees', 50)` means that the closest match found was "New York Yankees" with a score of 50 (out of 100).
query = "Can you write the code to use the process.extractOne() function? Only respond with code. No other text or explanation"
output = qa.run(query)
print(output)
process.extractOne(query, choices)

VII. Interacting with APIs

If the data or action you need lives behind an API, your LLM needs to be able to interact with that API.

This is where Agents and Plugins come into the picture.

The demo may be simple, but the capability can grow very complex.

from langchain.chains import APIChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

api_docs = """
BASE URL: https://restcountries.com/

API Documentation:

The API endpoint /v3.1/name/{name} Used to find information about a country. All URL parameters are listed below:
    - name: Name of country - Ex: italy, france

The API endpoint /v3.1/currency/{currency} Used to find information about a region. All URL parameters are listed below:
    - currency: 3 letter currency. Example: USD, COP

Woo! This is my documentation
"""

chain_new = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)
chain_new.run('Can you tell me information about france?')

' France is an officially-assigned, independent country located in Western Europe. Its capital is Paris and its official language is French. Its currency is the Euro (€). It has a population of 67,391,582 and its borders are with Andorra, Belgium, Germany, Italy, Luxembourg, Monaco, Spain, and Switzerland.'
chain_new.run('Can you tell me about the currency COP?')

' The currency of Colombia is the Colombian peso (COP), symbolized by the "$" sign.'
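Conceptually, APIChain makes two LLM calls: one reads the API docs plus your question and emits a request URL, another turns the raw HTTP response into an answer. The non-LLM plumbing is just URL construction and a GET request; a rough sketch of the URL-building half, with formats taken from the api_docs string above (the HTTP call itself is omitted):

```python
def build_country_url(name):
    # mirrors the /v3.1/name/{name} endpoint described in api_docs
    return f"https://restcountries.com/v3.1/name/{name.lower()}"

def build_currency_url(currency):
    # mirrors the /v3.1/currency/{currency} endpoint; currency is a 3-letter code
    return f"https://restcountries.com/v3.1/currency/{currency.upper()}"

print(build_country_url("France"))   # https://restcountries.com/v3.1/name/france
print(build_currency_url("cop"))     # https://restcountries.com/v3.1/currency/COP
```

In the real chain, it is the LLM (not hand-written functions like these) that decides which endpoint to use and fills in the parameter, which is what lets it generalize to questions the docs don't anticipate.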

VIII. Chatbots

Chatbots use many of the tools we have already covered, plus one crucial addition: memory.

They interact with users in real time and give users an approachable, natural-language UI for their questions.

from langchain.llms import OpenAI
from langchain import LLMChain
from langchain.prompts.prompt import PromptTemplate

# chat specific components
from langchain.memory import ConversationBufferMemory

template = """
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

{chat_history}
Human: {human_input}
Chatbot:
"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"],
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

llm_chain = LLMChain(
    llm=OpenAI(openai_api_key=openai_api_key),
    prompt=prompt,
    verbose=True,
    memory=memory
)
llm_chain.predict(human_input="Is an pear a fruit or vegetable?")

' An pear is a fruit, but a vegetable-pear is a pun-ishable offense!'
llm_chain.predict(human_input="What was one of the fruits I first asked you about?")
# the answer to this second question comes from the first answer itself,
# which is exactly where memory kicks in

" An pear - but don't let it get to your core!"
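ConversationBufferMemory is conceptually just a growing transcript that gets spliced into the {chat_history} slot of the prompt on every turn. A minimal sketch of that idea (not LangChain's actual class):

```python
class BufferMemory:
    """Keep every turn verbatim and render it as a chat_history string."""

    def __init__(self):
        self.turns = []

    def save(self, human, ai):
        # record one full exchange
        self.turns.append((human, ai))

    def chat_history(self):
        # render the transcript in the same shape the prompt template expects
        return "\n".join(f"Human: {h}\nChatbot: {a}" for h, a in self.turns)

memory = BufferMemory()
memory.save("Is a pear a fruit or vegetable?", "A fruit.")
# the next prompt now carries the earlier exchange, which is how
# the model can answer a question that refers back to the first turn
print(memory.chat_history())
```

Because the buffer grows with every turn, long conversations eventually hit the token limit; that is why LangChain also offers windowed and summarizing memory variants.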

IX. Agents

Agents are one of the hottest topics around LLMs.

An agent can look at data, infer what action to take next, and execute that action for you via tools; it is an AI-powered decision maker.

Friendly reminder: be careful with Auto-GPT; it can burn through a large number of tokens very quickly.

# helpers
import os
import json

from langchain.llms import OpenAI

# agent imports
from langchain.agents import load_tools
from langchain.agents import initialize_agent

# tool imports
from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.utilities import TextRequestsWrapper

os.environ["GOOGLE_CSE_ID"] = "YOUR_GOOGLE_CSE_ID"
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

search = GoogleSearchAPIWrapper()
requests = TextRequestsWrapper()
toolkit = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to search google to answer questions about current events"
    ),
    Tool(
        name="Requests",
        func=requests.get,
        description="useful for when you need to make a request to a URL"
    ),
]
agent = initialize_agent(toolkit, llm, agent="zero-shot-react-description", verbose=True, return_intermediate_steps=True)
response = agent({"input": "What is the capital of canada?"})
response['output']

'Ottawa is the capital of Canada.'
response = agent({"input": "Tell me what the comments are about on this webpage https://news.ycombinator.com/item?id=34425779"})
response['output']

'The comments on the webpage are about the history of Y Combinator.'
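The zero-shot-react-description agent runs a loop: the LLM proposes a tool and an input, the tool's result is appended to a scratchpad as an observation, and the loop repeats until the LLM emits a final answer. A toy sketch of that loop, with a scripted decision function standing in for the LLM and a canned Search tool:

```python
def toy_agent(question, tools, llm_decide):
    """Minimal ReAct-style loop: ask the model for an action until it answers."""
    scratchpad = []
    for _ in range(5):  # cap the number of tool calls, like max_iterations
        action, arg = llm_decide(question, scratchpad)
        if action == "Final Answer":
            return arg
        observation = tools[action](arg)  # run the chosen tool
        scratchpad.append((action, arg, observation))
    return "Agent stopped after too many steps."

# scripted stand-in for the LLM: search first, then answer from the observation
def llm_decide(question, scratchpad):
    if not scratchpad:
        return "Search", "capital of Canada"
    return "Final Answer", scratchpad[-1][2]

tools = {"Search": lambda q: "Ottawa is the capital of Canada."}
print(toy_agent("What is the capital of canada?", tools, llm_decide))
```

The real agent differs only in that llm_decide is an actual LLM call whose prompt contains the tool descriptions from the toolkit plus the scratchpad so far, which is why well-written tool descriptions matter so much.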

That's all. If you found this useful, consider giving it a like.

Thanks to 明训 for contributing this article!

Notebook source code:

https://github.com/sawyerbutton/NLP-Funda-2023-Spring/blob/main/Related/langchain_usecases.ipynb

Source: https://zhuanlan.zhihu.com/p/654052645

© THE END

תÔØÇëÁªÏµ±¾ÍøÕ¾»ñµÃÊÚȨ

For submissions or copyright issues, please add WeChat: skillupvip