Comments (2)

3coins commented on August 12, 2024

@quissuiven
Is this still a problem? Can you share some sample code to reproduce?

from langchain-aws.

quissuiven commented on August 12, 2024

Hi @3coins, yes, it's still a problem. Here's the sample code; I'm running this in SageMaker Studio:

!pip install -q langchain kaleido pypdf pydantic langchain-community langchain-core
!pip install -q langchain_aws 

!pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57" \
    "requests" \
    "defusedxml"
    
import boto3
import json
import time
from io import BytesIO
from datetime import datetime
import dateutil.parser
import os
import pypdf
import re
from langchain import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.llms import HuggingFacePipeline, Bedrock
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser
from langchain.schema import OutputParserException, BaseOutputParser, StrOutputParser
from typing import List, Dict, Tuple
from langchain.schema.runnable import RunnablePassthrough, RunnableParallel, RunnableLambda
from langchain.pydantic_v1 import BaseModel, Field, validator
from langchain_aws import BedrockLLM, ChatBedrock

from langchain_core.prompts import MessagesPlaceholder
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)  

model = ChatBedrock(
    model_id = "anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0}
)

def extract_pii_entities_with_reflection(resume_text):
    #EXTRACTION
    system_prompt_pii_masking = """
        You are a specialist focused on extracting personal identifying information from resumes. 
        Your job is to extract all personally identifying information from a resume. You respond only in valid JSON format.

        Here is your task:
        1. Read the candidate's resume text.
        2. Extract all personally identifying information matching the following template definition:
            person_name (list all people names in the text)
            physical_address (Use your advanced geopolitical knowledge to list all physical addresses in the text. This refers to only full addresses and excludes cities, states and countries.)
            phone_number (list all phone numbers in the text)
            email_address (list all email addresses in the text)
            url (list all URLs in the text)
            date_of_birth (list all dates of birth in the text)
            personal_identification_id (list all personal identification id in the text)

        Only extract information from the text, do not make up any information.
        Put the output in <response></response> XML tags.
    """
    human_prompt_pii_masking = "Here is the resume text: {TEXT}"

    def clean_response(response_message):
        response_str = response_message.content
        final_str = response_str.replace('<response>','')
        final_str = final_str.replace('</response>','')
        return final_str

    extractor_messages = ChatPromptTemplate.from_messages([("system", system_prompt_pii_masking),
                                                    MessagesPlaceholder(variable_name="messages")])

    runnable_extraction = extractor_messages | model | RunnableLambda(clean_response)
    query = human_prompt_pii_masking.format(TEXT=resume_text)
    request = HumanMessage(content = query)
    result_dict_extraction = runnable_extraction.invoke({"messages":[request]})

    #REFLECTION
    reflection_prompt = """
    You are tasked with evaluating personally identifying information extracted from a text. Here are your responsibilities:
    - Check all relevant personally identifying information have been extracted
    - All extracted information are present in the original text
    
    Your Feedback Protocol:
    - If suggesting modifications, include the specific segment and your recommendations.
    - If no modifications are necessary, respond with "Output looks correct. Please return the original output in the same format."
    """
    reflector_messages = ChatPromptTemplate.from_messages(
        [("system",reflection_prompt),
        MessagesPlaceholder(variable_name="messages")]
    )

    runnable_reflection = reflector_messages | model
    human_prompt_reflection = human_prompt_pii_masking.format(TEXT=resume_text)
    # Parse the extraction once with json.loads rather than eval: the prompt asks for valid JSON,
    # and eval would execute whatever text the model returns.
    extraction_str = str(json.loads(result_dict_extraction))
    result_reflection = runnable_reflection.invoke({"messages": [HumanMessage(content = human_prompt_reflection), AIMessage(content = extraction_str)]})
    
    #REFINED EXTRACTION
    message_1 = HumanMessage(content = human_prompt_reflection)
    message_2 = AIMessage(content = extraction_str)
    message_3 = HumanMessage(content = result_reflection.content)

    # Invoke the extraction chain once with the reflection feedback and return the refined output.
    return runnable_extraction.invoke({"messages":[message_1, message_2, message_3]})

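# resume_extracted_list is not defined in this snippet; it is assumed to be a list of resume text strings.
results_list_with_reflection = []
for index, resume_text in enumerate(resume_extracted_list):
    print(f"Performing extraction for Resume {index+1}")
    results_dict_reflection = extract_pii_entities_with_reflection(resume_text)
    results_list_with_reflection.append(results_dict_reflection)  # assumed intent: collect each result
    print(results_dict_reflection)
    print("\n")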
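
For anyone trying to reproduce this: the snippet above strips the <response></response> tags and then has to turn the remaining string into a dict. A minimal sketch of doing that step with only the standard library might look like the following. The helper name parse_tagged_json and the sample string are made up for illustration and are not part of the original code; the sketch also assumes the model really does return valid JSON inside the tags, as the system prompt requests.

```python
import json
import re

# Hypothetical helper (not in the original snippet): extract the JSON payload
# from <response>...</response> tags and parse it without using eval.
_RESPONSE_RE = re.compile(r"<response>(.*?)</response>", re.DOTALL)


def parse_tagged_json(text):
    """Parse the JSON inside <response> tags, or the whole text if no tags are present."""
    match = _RESPONSE_RE.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# Made-up sample output, only to exercise the helper.
sample = '<response>\n{"person_name": ["Jane Doe"], "date_of_birth": null}\n</response>'
parsed = parse_tagged_json(sample)
print(parsed["person_name"])
```

One reason to prefer json.loads here is that JSON literals such as null, true and false are not valid Python names, so eval raises a NameError on them, while json.loads maps them to None, True and False.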
