Heads-up: this may be the most approachable, hands-on LangChain tutorial you have been looking for. Through 9 representative application examples, it takes you from zero to a working knowledge of LangChain.

Notebook source for this article:
https://github.com/lyhue1991/eat_chatgpt/blob/main/3_langchain_9_usecases.ipynb

The 9 example use cases are:

1. Text summarization (Summarization): summarize the key points of a text or chat.
2. Document Q&A (Question and Answering Over Documents): use documents as context and answer questions based on their content.
3. Information extraction (Extraction): extract structured content from text.
4. Output evaluation (Evaluation): analyze and grade the quality of LLM outputs.
5. Database Q&A (Querying Tabular Data): extract information from databases and other tabular data.
6. Code understanding (Code Understanding): analyze code, recover its logic, and answer questions about it.
7. API interaction (Interacting with APIs): read and understand API documentation, then call real-world APIs to fetch real data.
8. Chatbots: a chatbot framework with memory (and UI interaction capability).
9. Agents: use LLMs to analyze tasks, make decisions, and call tools to carry out those decisions.
Before we start, install the required dependencies:

```python
!pip install langchain
!pip install openai
!pip install tiktoken
!pip install faiss-cpu
```

```python
# Use your own OpenAI API key
openai_api_key = 'YOUR_API_KEY'
```
Part 1: Text Summarization (Summarization)

Handing the LLM a piece of text and asking it to generate a summary is one of the most common scenarios. The hottest application of this kind right now is probably ChatPDF, which does exactly this.

1. Short-text summarization
```python
# Summaries Of Short Text
from langchain.llms import OpenAI
from langchain import PromptTemplate

# Initialize the LLM
llm = OpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

# Create a template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a LangChain prompt template that we can insert values into later
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

confusing_text = """
For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
"The problem is that when you look up close at the anatomy, it's evocative of a lot of different things, but it's diagnostic of nothing," says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
"And it's so damn big that whenever someone says it's something, everyone else's hackles get up: 'How could you have a lichen 20 feet tall?'"
"""

# Print the template contents
print("------- Prompt Begin -------")
final_prompt = prompt.format(text=confusing_text)
print(final_prompt)
print("------- Prompt End -------")
```
```
------- Prompt Begin -------

%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
"The problem is that when you look up close at the anatomy, it's evocative of a lot of different things, but it's diagnostic of nothing," says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
"And it's so damn big that whenever someone says it's something, everyone else's hackles get up: 'How could you have a lichen 20 feet tall?'"

------- Prompt End -------
```
```python
output = llm(final_prompt)
print(output)
```
```
People argued for a long time about what Prototaxites was. Some thought it was a lichen, some thought it was a fungus, and some thought it was a tree. But it was hard to tell for sure because it looked like different things up close and it was really, really big.
```
2. Long-text summarization

For short texts we can run the summary directly as above, but once the text exceeds the LLM's max token size we run into trouble.

LangChain provides an out-of-the-box tool for long texts: load_summarize_chain.
```python
# Summaries Of Longer Text
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

# The text is Alice's Adventures in Wonderland
with open('wonderland.txt', 'r') as file:
    text = file.read()

# Print the first 285 characters of the novel
print(text[:285])
```
```
The Project Gutenberg eBook of Alice's Adventures in Wonderland, by Lewis Carroll

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it unde
```
```python
num_tokens = llm.get_num_tokens(text)

print(f"There are {num_tokens} tokens in your file")
# The full text is roughly 48k tokens
# Clearly far too much to send into the LLM for processing and generation in one go
```

```
There are 48613 tokens in your file
```
The way to handle long text is 'chunking' or 'splitting': breaking the original text into smaller paragraphs/segments.
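Before reaching for LangChain's splitter, the idea can be sketched in a few lines of plain Python. This is a simplified, fixed-size illustration only; the real RecursiveCharacterTextSplitter used below also respects separators such as paragraph breaks:

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive fixed-size chunking: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so neighbors share some context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A stand-in text so the example is self-contained
sample = "".join(str(i % 10) for i in range(12000))
chunks = chunk_text(sample, chunk_size=5000, chunk_overlap=350)

print(len(chunks))      # 3 overlapping chunks instead of one long text
print(len(chunks[0]))   # 5000
```

The overlap means the tail of one chunk is repeated at the head of the next, which reduces the chance of cutting a sentence's context in half.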
```python
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=350)
# I'm using RecursiveCharacterTextSplitter here, but other splitters work too
docs = text_splitter.create_documents([text])

print(f"You now have {len(docs)} docs instead of 1 piece of text")
```

```
You now have 36 docs instead of 1 piece of text
```
Now we need a LangChain tool to feed the chunked text into the LLM for summarization.
# ÉèÖà lang chain# ʹÓà map_reduceµÄchain_type£¬ÕâÑù¿ÉÒÔ½«¶à¸öÎĵµºÏ²¢³ÉÒ»¸öchain
=
load_summarize_chain(llm=llm,
chain_type='map_reduce')
# verbose=True չʾÔËÐÐÈÕÖ¾
# Use it. This will run through the 36 documents, summarize the chunks, then get a summary of the summary.# µäÐ͵Ämap reduceµÄ˼·ȥ½â¾öÎÊÌ⣬½«ÎÄÕ²ð·Ö³É¶à¸ö²¿·Ö£¬ÔÙ½«¶à¸ö²¿·Ö·Ö±ð½øÐÐ summarize£¬×îºóÔÙ½øÐÐ ºÏ²¢£¬¶Ô summarys ½øÐÐ summaryoutput
=
chain.run(docs)print
(output)# Try yourself
```
Alice follows a white rabbit down a rabbit hole and finds herself in a strange world full of peculiar characters. She experiences many strange adventures and is asked to settle disputes between the characters. In the end, she is in a court of justice with the King and Queen of Hearts and is questioned by the King. Alice reads a set of verses and has a dream in which she remembers a secret. Project Gutenberg is a library of electronic works founded by Professor Michael S. Hart and run by volunteers.
```
Part 2: Document Q&A (Question and Answering Over Documents)

For the LLM to perform a QA task:

- we need to pass the LLM context it can refer to;
- we need to convey our question to the LLM accurately.
1. Short-text Q&A

```python
# In a nutshell, building a QA system with documents as context boils down to:
# llm(your context + your question) = your answer
# Simple Q&A Example
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

context = """
Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
"""

question = "Who is under 40 years old?"

output = llm(context + question)
print(output.strip())
```

```
Rachel is under 40 years old.
```
2. Long-text Q&A

For longer texts, we can split the text into chunks, embed each chunk, store the embeddings in a database, and then query against it.

The goal is to pick only the relevant chunks, but which ones should we pick? The most popular approach today is to select similar text by comparing vector embeddings.
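"Selecting similar text by comparing vector embeddings" boils down to a nearest-neighbor search over a similarity measure such as cosine similarity. Here is a toy sketch with hand-made three-dimensional "embeddings" (in the real pipeline below, OpenAIEmbeddings produces the vectors and FAISS performs this search efficiently at scale):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend each chunk was embedded into a tiny vector (illustrative values only)
chunk_embeddings = {
    "Alice follows the White Rabbit": [0.9, 0.1, 0.0],
    "The Queen of Hearts holds court": [0.1, 0.9, 0.2],
    "Project Gutenberg license text":  [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # imagined embedding of "What does Alice follow?"

# Pick the chunk whose embedding is most similar to the query's
best_chunk = max(chunk_embeddings,
                 key=lambda c: cosine_similarity(query_embedding, chunk_embeddings[c]))
print(best_chunk)  # Alice follows the White Rabbit
```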
```python
from langchain import OpenAI
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

# Load a long text; again we use Alice's Adventures in Wonderland as input
loader = TextLoader('wonderland.txt')
doc = loader.load()
print(f"You have {len(doc)} document")
print(f"You have {len(doc[0].page_content)} characters in that document")
```
```
You have 1 document
You have 164014 characters in that document
```
```python
# Split the novel into multiple parts
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# Get the total number of characters so we can compute the average
num_total_characters = sum([len(x.page_content) for x in docs])

print(f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")
```
```
Now you have 62 documents that have an average of 2,846 characters (smaller pieces)
```
```python
# Set up the embedding engine
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed the documents and combine them with the raw text in a vector store
# Note: this step makes API calls to OpenAI
docsearch = FAISS.from_documents(docs, embeddings)

# Create the retrieval QA chain
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

query = "What does the author describe the Alice following with?"
qa.run(query)
# The retriever fetches similar document chunks, combines them with your question,
# and has the LLM reason over them to produce the answer
# There is a lot that can be tuned here: the best chunk size, the best embedding engine,
# the best retriever, and so on
# You can also use a hosted (cloud) vector store instead
```

```
' The author describes Alice following a White Rabbit with pink eyes.'
```
Part 3: Information Extraction (Extraction)

Extraction is the process of parsing structured data out of a piece of text. It is usually used together with an extraction parser to structure the data. Some typical uses:

- extract a structured row from a sentence to insert into a database;
- extract multiple rows from a long document to insert into a database;
- extract parameters from a user query to make an API call.

The hottest extraction library lately is KOR.
1£¬ÊÖ¶¯¸ñʽת»»
from
langchain.schema
import
HumanMessagefrom
langchain.prompts
import
PromptTemplate,
ChatPromptTemplate,
HumanMessagePromptTemplate
from
langchain.chat_models
import
ChatOpenAI
chat_model
=
ChatOpenAI(temperature=0,
model='gpt-3.5-turbo',
openai_api_key=openai_api_key)
# Vanilla Extractioninstructions
=
"""You will be given a sentence with fruit names, extract those fruit names and assign an emoji to themReturn the fruit name and emojis in a python dictionary"""
fruit_names
=
"""Apple, Pear, this is an kiwi"""
# Make your prompt which combines the instructions w/ the fruit namesprompt
=
(instructions
+
fruit_names)
# Call the LLMoutput
=
chat_model([HumanMessage(content=prompt)])
print
(output.content)print
(type(output.content))
```
{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}
<class 'str'>
```
```python
# Use Python's eval function to convert the format manually
output_dict = eval(output.content)

print(output_dict)
print(type(output_dict))
```
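A safer alternative to `eval` here is `ast.literal_eval` from the standard library: it only parses Python literals (dicts, lists, strings, numbers) and refuses arbitrary code, which matters when the string comes from a model. The string below is a stand-in for `output.content`:

```python
import ast

# Stand-in for output.content returned by the model
model_output = "{'Apple': 'a', 'Pear': 'b', 'kiwi': 'c'}"

# Parses literal structures only; raises ValueError on anything executable
output_dict = ast.literal_eval(model_output)
print(type(output_dict))  # <class 'dict'>
```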
2. Automatic format conversion

With langchain.output_parsers.StructuredOutputParser you can automatically generate a prompt that carries format instructions.

You no longer need to worry about prompt-engineering the output format; hand that part entirely to LangChain, which converts the LLM's output into a Python object.
```python
# Parse the output and get structured data
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

# The parser will parse the LLM's output with the schema I defined
# and return the expected structured data to me
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

format_instructions = output_parser.get_format_instructions()
print(format_instructions)
```
````
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```
````
```python
# This prompt differs from the one we built for the Chat Model earlier:
# it is a ChatPromptTemplate with the format instructions injected as a partial variable
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template(
            "Given a command from the user, extract the artist and song names\n{format_instructions}\n{user_prompt}")
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)

artist_query = prompt.format_prompt(user_prompt="I really like So Young by Portugal. The Man")
print(artist_query.messages[0].content)
```
````
Given a command from the user, extract the artist and song names
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```
I really like So Young by Portugal. The Man
````
```python
artist_output = chat_model(artist_query.to_messages())
output = output_parser.parse(artist_output.content)

print(output)
print(type(output))
# Note: with the turbo model the result is not guaranteed to be identical on every run;
# switching to a GPT-4 model may be a better choice
```

```
{'artist': 'Portugal. The Man', 'song': 'So Young'}
<class 'dict'>
```
Part 4: Output Evaluation (Evaluation)

Because natural language is unpredictable and variable, judging whether an LLM's output is correct can be difficult. LangChain provides a way to help us tackle this problem.
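To see why grading is hard without an LLM, consider naive string matching: it marks a correct but differently-phrased answer wrong, which is exactly the gap the QAEvalChain used below fills. A toy illustration (not LangChain code):

```python
reference = "rabbit"
prediction = "Alice was following the White Rabbit."

# Exact-match grading fails even though the prediction contains the answer
exact_match = (reference == prediction)
print(exact_match)  # False

# A substring check happens to work here, but breaks on synonyms or rephrasings,
# which is why an LLM-based grader is used instead
contains_match = reference.lower() in prediction.lower()
print(contains_match)  # True
```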
```python
# Embeddings, store, and retrieval
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Model and doc loader
from langchain import OpenAI
from langchain.document_loaders import TextLoader

# Eval
from langchain.evaluation.qa import QAEvalChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

# Again use Alice's Adventures in Wonderland as the text input
loader = TextLoader('wonderland.txt')
doc = loader.load()

print(f"You have {len(doc)} document")
print(f"You have {len(doc[0].page_content)} characters in that document")
```
```
You have 1 document
You have 164014 characters in that document
```
```python
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print(f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")
```
```
Now you have 62 documents that have an average of 2,846 characters (smaller pieces)
```
```python
# Embeddings and docstore
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(), input_key="question")
# Note the input_key parameter: it tells the chain which key in my dict holds the question,
# so the chain can automatically find the question and pass it to the LLM
```
```python
question_answers = [
    {'question': "Which animal give alice a instruction?", 'answer': 'rabbit'},
    {'question': "What is the author of the book", 'answer': 'Elon Mask'}
]

predictions = chain.apply(question_answers)
predictions
# The LLM makes its predictions, which we then compare against the answers I provided,
# trusting (for the sake of the demo) that my hand-written answers are the correct ones
```
```
[{'question': 'Which animal give alice a instruction?',
  'answer': 'rabbit',
  'result': ' The Caterpillar gave Alice instructions.'},
 {'question': 'What is the author of the book',
  'answer': 'Elon Mask',
  'result': ' The author of the book is Lewis Carroll.'}]
```
```python
# Start your eval chain
eval_chain = QAEvalChain.from_llm(llm)

graded_outputs = eval_chain.evaluate(question_answers,
                                     predictions,
                                     question_key="question",
                                     prediction_key="result",
                                     answer_key='answer')
graded_outputs
```

```
[{'text': ' INCORRECT'}, {'text': ' INCORRECT'}]
```
Part 5: Database Q&A (Querying Tabular Data)
```python
# Query a SQLite database with natural language; we use the San Francisco trees dataset
# Don't run the following code unless you have SQLite and the database set up
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

sqlite_db_path = 'data/San_Francisco_Trees.db'
db = SQLDatabase.from_uri(f"sqlite:///{sqlite_db_path}")

db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

db_chain.run("How many Species of trees are there in San Francisco?")
```
Under the hood the chain will:

- find which table to use;
- find which column to use;
- construct the correct SQL query;
- execute that query;
- get the result;
- return a natural language response.
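A few of those steps can be sketched against an in-memory SQLite database; this is roughly what SQLDatabaseChain automates, with a hypothetical table and data used purely for illustration:

```python
import sqlite3

# A tiny stand-in database (hypothetical rows, for illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SFTrees (qSpecies TEXT)")
conn.executemany("INSERT INTO SFTrees VALUES (?)",
                 [("Maple",), ("Oak",), ("Maple",)])

# Step 1: discover which tables exist (the chain reads this schema into the prompt)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['SFTrees']

# Steps 3-5: construct the query, execute it, and read back the result
count = conn.execute("SELECT count(distinct qSpecies) FROM SFTrees").fetchone()[0]
print(count)  # 2
conn.close()
```

The chain's final step then has the LLM phrase `count` as a natural-language sentence.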
Confirm the LLM's result via pandas:
```python
import sqlite3
import pandas as pd

# Connect to the SQLite database
connection = sqlite3.connect(sqlite_db_path)

# Define your SQL query
query = "SELECT count(distinct qSpecies) FROM SFTrees"

# Read the SQL query into a Pandas DataFrame
df = pd.read_sql_query(query, connection)

# Close the connection
connection.close()

# Display the result in the first column first cell
print(df.iloc[0, 0])
```
Part 6: Code Understanding

Code understanding uses roughly the same tools as document Q&A, except that our input is the source code of a project.
```python
# Helper to read local files
import os

# Vector Support
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

# Model and chain
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Text splitters
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

llm = ChatOpenAI(model='gpt-3.5-turbo', openai_api_key=openai_api_key)

embeddings = OpenAIEmbeddings(disallowed_special=(), openai_api_key=openai_api_key)

root_dir = '/content/drive/MyDrive/thefuzz-master'
docs = []

# Go through each folder
for dirpath, dirnames, filenames in os.walk(root_dir):
    # Go through each file
    for file in filenames:
        try:
            # Load up the file as a doc and split it
            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
            docs.extend(loader.load_and_split())
        except Exception as e:
            pass

print(f"You have {len(docs)} documents\n")
print("------ Start Document ------")
print(docs[0].page_content[:300])
```
```
You have 175 documents

------ Start Document ------
from timeit import timeit
import math
import csv

iterations = 100000

reader = csv.DictReader(open('data/titledata.csv'), delimiter='|')
titles = [i['custom_title'] for i in reader]
title_blob = '\n'.join(titles)

cirque_strings = [
    "cirque du soleil - zarkana - las vegas",
    "cirque du sol
```
```python
docsearch = FAISS.from_documents(docs, embeddings)

# Get our retriever ready
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

query = "What function do I use if I want to find the most similar item in a list of items?"
output = qa.run(query)

print(output)
```
````
You can use the `process.extractOne()` function from the `thefuzz` package to find the most similar item in a list of items. For example:

```
from thefuzz import process

choices = ["New York Yankees", "Boston Red Sox", "Chicago Cubs", "Los Angeles Dodgers"]
query = "new york mets vs atlanta braves"

best_match = process.extractOne(query, choices)
print(best_match)
```

This will output:

```
('New York Yankees', 50)
```

Where `('New York Yankees', 50)` means that the closest match found was "New York Yankees" with a score of 50 (out of 100).
````
```python
query = "Can you write the code to use the process.extractOne() function? Only respond with code. No other text or explanation"
output = qa.run(query)
print(output)
```

```
process.extractOne(query, choices)
```
Part 7: Interacting with APIs

If the data or action you need sits behind an API, your LLM needs to be able to interact with that API. This is the point where Agents and plugins become closely relevant.

The demo may be simple, but the capability it unlocks can be very sophisticated.
```python
from langchain.chains import APIChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

api_docs = """
BASE URL: https://restcountries.com/

API Documentation:

The API endpoint /v3.1/name/{name} Used to find information about a country. All URL parameters are listed below:
    - name: Name of country - Ex: italy, france

The API endpoint /v3.1/currency/{currency} Used to find information about a region. All URL parameters are listed below:
    - currency: 3 letter currency. Example: USD, COP

Woo! This is my documentation
"""

chain_new = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)

chain_new.run('Can you tell me information about france?')
```
```
' France is an officially-assigned, independent country located in Western Europe. Its capital is Paris and its official language is French. Its currency is the Euro (€). It has a population of 67,391,582 and its borders are with Andorra, Belgium, Germany, Italy, Luxembourg, Monaco, Spain, and Switzerland.'
```

```python
chain_new.run('Can you tell me about the currency COP?')
```

```
' The currency of Colombia is the Colombian peso (COP), symbolized by the "$" sign.'
```
Part 8: Chatbots

Chatbots use many of the tools we have already covered, and most importantly add one crucial capability: memory. A chatbot interacts with users in real time and gives them an approachable UI for asking questions in natural language.
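The "memory" here is conceptually simple: keep a transcript and splice it into every new prompt. A minimal sketch of what the ConversationBufferMemory used below does under the hood (simplified; the real class also manages message objects and key names):

```python
class SimpleBufferMemory:
    """Accumulate the chat transcript and render it into each new prompt."""

    def __init__(self):
        self.history = []

    def save_turn(self, human, ai):
        # Record one exchange as plain transcript lines
        self.history.append(f"Human: {human}")
        self.history.append(f"AI: {ai}")

    def render(self):
        return "\n".join(self.history)

memory = SimpleBufferMemory()
memory.save_turn("Is a pear a fruit or vegetable?", "A fruit!")

# The earlier exchange becomes part of the next prompt, so the model can refer back to it
prompt = f"{memory.render()}\nHuman: What fruit did I ask about?\nChatbot:"
print(prompt)
```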
```python
from langchain.llms import OpenAI
from langchain import LLMChain
from langchain.prompts.prompt import PromptTemplate

# Chat specific components
from langchain.memory import ConversationBufferMemory

template = """
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"],
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

llm_chain = LLMChain(
    llm=OpenAI(openai_api_key=openai_api_key),
    prompt=prompt,
    verbose=True,
    memory=memory
)

llm_chain.predict(human_input="Is an pear a fruit or vegetable?")
```
```
' An pear is a fruit, but a vegetable-pear is a pun-ishable offense!'
```

```python
llm_chain.predict(human_input="What was one of the fruits I first asked you about?")
# The answer to this second question draws on the first answer itself, so the memory is being used
```

```
" An pear - but don't let it get to your core!"
```
Part 9: Agents

Agents are one of the hottest topics around LLMs. An agent can look at data, infer what action to take next, and execute that action for you via tools; it is a decision maker with AI smarts.

Friendly warning: be careful with AutoGPT-style agents; they can burn through a large amount of your tokens very quickly.
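The agent loop itself is easy to picture: the model picks a tool, the framework runs it, and the observation is fed back until the model produces a final answer. A toy dispatch loop with a scripted stand-in for the LLM (illustration only; the zero-shot-react agent below drives this loop through the ReAct prompt format):

```python
def fake_llm(scratchpad):
    """Scripted stand-in for the model's decision making."""
    if "Observation:" not in scratchpad:
        return ("Search", "capital of Canada")        # decide to use a tool first
    return ("FINAL", "Ottawa is the capital of Canada.")

# Tools are just named callables; here Search is a canned lookup
tools = {"Search": lambda q: "Ottawa is the capital of Canada"}

scratchpad = "Question: What is the capital of canada?"
while True:
    action, arg = fake_llm(scratchpad)
    if action == "FINAL":
        answer = arg
        break
    observation = tools[action](arg)                  # run the chosen tool
    scratchpad += f"\nAction: {action}\nObservation: {observation}"

print(answer)  # Ottawa is the capital of Canada.
```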
```python
# Helpers
import os
import json

from langchain.llms import OpenAI

# Agent imports
from langchain.agents import load_tools
from langchain.agents import initialize_agent

# Tool imports
from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.utilities import TextRequestsWrapper

os.environ["GOOGLE_CSE_ID"] = "YOUR_GOOGLE_CSE_ID"
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

search = GoogleSearchAPIWrapper()
requests = TextRequestsWrapper()

toolkit = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to search google to answer questions about current events"
    ),
    Tool(
        name="Requests",
        func=requests.get,
        description="useful for when you need to make a request to a URL"
    ),
]

agent = initialize_agent(toolkit,
                         llm,
                         agent="zero-shot-react-description",
                         verbose=True,
                         return_intermediate_steps=True)

response = agent({"input": "What is the capital of canada?"})
response['output']
```
```
'Ottawa is the capital of Canada.'
```

```python
response = agent({"input": "Tell me what the comments are about on this webpage https://news.ycombinator.com/item?id=34425779"})
response['output']
```

```
'The comments on the webpage are about the history of Y Combinator.'
```
That's all. If you found this useful, how about giving it a like?

Thanks to Mingxun for contributing this article!

Notebook source:
https://github.com/sawyerbutton/NLP-Funda-2023-Spring/blob/main/Related/langchain_usecases.ipynb

From: https://zhuanlan.zhihu.com/p/654052645