felix committed
Commit · c96e9d6
Parent(s): db77f64
Start dataset creation.
Files changed:
- data.py +2 -0
- data_set_training.csv +119 -0
- system_promts.txt +9 -0
- train.py +4 -0
data.py
CHANGED
@@ -2,6 +2,8 @@ from transformers import AlbertTokenizer, AlbertModel
 from sklearn.metrics.pairwise import cosine_similarity
 import torch
 
+#This is a quick evaluation to see if B
+
 # base
 # large
 tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
data_set_training.csv
ADDED
@@ -0,0 +1,119 @@
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Rd, Ste 2511-W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Road, Ste 2511 W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Rd, Suite 2511-W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Road, Suite #2511-W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Rd, Suite #2511W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Road, Suite 2511W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Rd, Suite #2511-W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Rd, Suite 2511 W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Road, Suite #2511 W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Rd, Suite #2511W, Holmdel NJ 07734|1
+101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|200 Crawfords Corner Rd, Holmdel NJ 07734|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Boulevard Ext, Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Extension, Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Ext., Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Boulevard Extension, Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Extention, Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd. Ext, Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Boulevard Ext., Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd. Extension, Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd. Extention, Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mtn Blvd Ext, Warren NJ 07059|1
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Extension, Suite A|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Boulevard Ext, Apt 2B|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Ext, Building C|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Ext., Suite 3|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Ext. W|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Ext, Unit 5|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Extension, Room 12|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Ext, Block D|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Extension, Lot 7|0
+65 Mountain Blvd Ext, Warren NJ 07059|65 Mountain Blvd Ext, Floor 2|0
+65 Mountain Blvd Ext, Warren NJ 07059|12 Mountain Blvd Ext, Warren NJ 07059|0
+65 Mountain Blvd Ext, Warren NJ 07059|23 Mountain Blvd Ext, Warren NJ 07059|0
+65 Mountain Blvd Ext, Warren NJ 07059|54 Mountain Blvd Ext, Warren NJ 07059|0
+65 Mountain Blvd Ext, Warren NJ 07059|102 Mountain Blvd Ext, Warren NJ 07059|0
+65 Mountain Blvd Ext, Warren NJ 07059|5 Mountain Blvd Ext, Warren NJ 07059|0
+802 New Holland Ave Suite 200, Lancaster, PA 17602|802 New Holland Avenue Suite 200, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|Suite 200, 802 New Holland Avenue, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|New Holland Avenue Suite 200, 802, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|Suite 200, 802 New Holland Ave, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|802 New Holland Ave #200, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|802 New Holland Avenue, 2nd Floor, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|200 Suite, 802 New Holland Ave, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|802 New Holland Ave, Suite #200, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|Suite 200, 802 New Holland Avenue, Lancaster, PA 17602|1
+802 New Holland Ave Suite 200, Lancaster, PA 17602|802 New Holland Ave, Ste 200, Lancaster, PA 17602|1
+2205 S Bridge St Ste 300, Brady, TX 76825|2205 South Bridge Street Suite 300, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|Suite 300, 2205 S Bridge St, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|2205 S Bridge St #300, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|2205 S Bridge St, Suite 300, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|Suite 300, 2205 South Bridge St, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|2205 South Bridge St, Ste 300, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|Suite 300, 2205 S Bridge Street, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|2205 S Bridge Street, Suite #300, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|Ste 300, 2205 S Bridge St, Brady, TX 76825|1
+2205 S Bridge St Ste 300, Brady, TX 76825|2205 South Bridge Street, 3rd Floor, Brady, TX 76825|1
+7671 W Middle Fork St, Boise, ID 83709|7671 West Middle Fork Street, Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|7671 W Middle Fork St., Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|Middle Fork St W, 7671, Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|7671 Middle Fork St West, Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|W Middle Fork St, 7671, Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|7671 Middle Fork Street, Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|Middle Fork Street West, 7671, Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|7671 W Middle Fork Street, Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|Middle Fork St, 7671 West, Boise, ID 83709|1
+7671 W Middle Fork St, Boise, ID 83709|7671 W Middle Fork St, Unit A, Boise, ID 83709|0
+3059 W 26th St, Chicago, IL 60623|3059 West 26th Street, Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|3059 W 26th St., Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|West 26th St, 3059, Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|3059 26th St West, Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|W 26th St, 3059, Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|3059 26th Street, Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|West 26th Street, 3059, Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|3059 W 26th Street, Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|26th St, 3059 West, Chicago, IL 60623|1
+3059 W 26th St, Chicago, IL 60623|3059 W 26th St, Apt B, Chicago, IL 60623|0
+3059 W 26th St, Chicago, IL 60623|3059 W 26th St, SAH Community Care Clinic, Chicago, IL 60623|1
+125 S Gadsden St #300, Tallahassee, FL 32301|125 S Gadsden St #300A, Tallahassee, FL 32301|0
+125 S Gadsden St #300, Tallahassee, FL 32301|125 S Gadsden St, Suite 3A, Tallahassee, FL 32301|0
+5432 Oak Street, Suite 101, Greenfield, 45678|6789 Pine Street, Suite 100, Mountain View, 90123|0
+987 Main Road, Suite B, Roseville, 78901|4567 Birch Court, Suite D, River City, 67890|0
+13579 Maple Boulevard, Suite 3, Springfield, 12345|5678 Park Street, Suite 150, Meadow Glen, 34567|0
+2468 Cedar Lane, Suite 200, Lexington, 56789|9012 Maple Avenue, Suite 3, Springfield, 12345|0
+777 Pine Street, Suite 100, Portland, 90123|2345 Birch Avenue, Suite 50, Manchester, 23456|0
+1234 Birch Court, Suite 50, Manchester, 23456|1234 Cedar Lane, Suite 50, Oceanview, 23456|0
+888 Chestnut Avenue, Suite D, Charleston, 67890|7890 Chestnut Avenue, Suite 300, Lakeside, 89012|0
+999 Willow Drive, Suite 300, Greenville, 89012|2345 Willow Lane, Suite 75, Forest Hills, 23456|0
+111 Park Avenue, Suite 75, Richmond, 23456|9012 Elm Road, Suite 3, Big City, 12345|0
+2222 Grove Street, Suite 150, Savannah, 34567|1234 Elm Street, Suite 101, Greenfield, 45678|0
+2178 Oak Road, Suite 101, Greenfield, 45678|3456 Maple Boulevard, Suite 200, Sunnyville, 56789|0
+43 Main Street, Suite B, Roseville, 78901|5678 Main Avenue, Suite B, Smallville, 78901|0
+900 Maple Avenue, Suite 3, Springfield, 12345|2345 Oak Street, Suite 101, Anytown, 45678|0
+675 Cedar Lane, Suite 200, Lexington, 56789|5678 Oak Road, Suite B, Roseville, 78901|0
+80 Pine Street, Suite 100, Portland, 90123|2345 Park Lane, Suite 75, Richmond, 23456|0
+2345 Birch Avenue, Suite 50, Manchester, 23456|4567 Chestnut Street, Suite D, Charleston, 67890|0
+867 Chestnut Street, Suite D, Charleston, 67890|5678 Grove Boulevard, Suite 150, Savannah, 34567|0
+29 Willow Road, Suite 300, Greenville, 89012|3456 Cedar Lane, Suite 200, Lexington, 56789|0
+345 Park Lane, Suite 75, Richmond, 23456|7890 Willow Road, Suite 300, Greenville, 89012|0
+5678 Grove Boulevard, Suite 150, Savannah, 34567|6789 Pine Street, Suite 100, Portland, 90123|0
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
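The added CSV has no header row and uses `|` as the field separator: first address, second address, then a label (1 when both strings refer to the same address, 0 otherwise). A minimal sketch of reading it with the standard library follows; the function name `read_pairs` and the in-memory sample are illustrative assumptions, not part of the commit.

```python
import csv
import io

# Two rows copied from data_set_training.csv; the file has no header row.
SAMPLE = """\
101 Crawfords Corner Road, Suite 2511-W, Holmdel NJ 07734|101 Crawfords Corner Rd, Ste 2511-W, Holmdel NJ 07734|1
65 Mountain Blvd Ext, Warren NJ 07059|12 Mountain Blvd Ext, Warren NJ 07059|0

"""


def read_pairs(handle):
    """Yield (address_a, address_b, label) tuples, skipping blank lines.

    Hypothetical helper: the commit itself contains no loader.
    """
    # QUOTE_NONE keeps characters such as '#' and '"' literal.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # the committed file ends with empty lines
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}: {row!r}")
        address_a, address_b, label = row
        yield address_a.strip(), address_b.strip(), int(label)


pairs = list(read_pairs(io.StringIO(SAMPLE)))
for address_a, address_b, label in pairs:
    print(label, address_a, "<->", address_b)
```

For the real file, passing `open("data_set_training.csv", newline="")` instead of the `StringIO` object should work the same way.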
system_promts.txt
ADDED
@@ -0,0 +1,9 @@
+Create similar addresses:
+----------------------------
+You are tasked with helping to generate test data for a machine learning dataset. Do not prefix numbers before each line of output. The user is expected to prompt with a sample US address but without the City, State, Zipcode part. As a model, your task is to generate 10 variations of the provided address as a regular person may enter it into some system. All outputs should be a list of 10 samples.
+
+Create similar but different addresses:
+You are tasked with helping to generate test data for a machine learning dataset. Do not prefix numbers before each line of output. The user is expected to prompt with a sample US address but without the City, State, Zipcode part. As a model, your task is to generate 10 variations of the provided address that are actually different addresses. Address formats should be as if a regular person may enter it into some system.
+
+Generate completely different addresses.
+You are tasked with helping to generate test data for a machine learning dataset. Do not prefix numbers before each line of output. The user is expected to prompt with a sample US address but without the City, State, Zipcode part. As a model, your task is to generate 10 random addresses that are inspired by the structure of the address the user provided. Address formats should be as if a regular person may enter it into some system.
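The prompts ask for street lines without the City, State, Zipcode part, while most rows in data_set_training.csv carry that part on both sides. One way the generated lines could be turned into labelled rows is sketched below. The helper `build_rows` and its parameters are hypothetical; the commit does not show how the rows were actually assembled.

```python
def build_rows(anchor_street, locality, variants, label):
    """Format generated street lines as 'anchor|variant|label' rows.

    Hypothetical helper. anchor_street is the street part sent to the model,
    locality is the 'City ST ZIP' suffix re-attached afterwards, variants is
    the list of lines the model returned, and label is 1 (same address) or
    0 (different address).
    """
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    anchor = f"{anchor_street}, {locality}"
    rows = []
    for line in variants:
        street = line.strip()
        if not street:
            continue  # ignore blank lines in the model output
        if "|" in street:
            raise ValueError(f"variant contains the field separator: {street!r}")
        rows.append(f"{anchor}|{street}, {locality}|{label}")
    return rows


rows = build_rows(
    "65 Mountain Blvd Ext",
    "Warren NJ 07059",
    ["65 Mountain Boulevard Ext", "", "65 Mountain Blvd Extension"],
    label=1,
)
for row in rows:
    print(row)
```

With the inputs shown, the two rows produced have the same text as two of the positive Warren NJ rows in the committed CSV.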
train.py
ADDED
@@ -0,0 +1,4 @@
+# Will be based on
+# ContrastiveLoss function.
+#
+# https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/quora_duplicate_questions/training_OnlineContrastiveLoss.py
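train.py is only a placeholder at this commit. As a rough illustration of what a contrastive loss computes for one labelled address pair, here is a pure-Python sketch of the standard formulation over cosine distance. It is not the linked training script: the online variant referenced there also selects hard pairs within a batch, which this sketch does not attempt. The function names and the margin of 0.5 are assumptions.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_loss(u, v, label, margin=0.5):
    """Contrastive loss for a single pair of embeddings.

    label 1 (same address): penalise any distance between the embeddings.
    label 0 (different address): penalise only pairs closer than the margin.
    The margin value is an assumption chosen for illustration.
    """
    distance = cosine_distance(u, v)
    if label == 1:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


# Toy 2-d "embeddings" standing in for encoder outputs.
same_direction = contrastive_loss([1.0, 0.0], [1.0, 0.0], label=1)
orthogonal_negative = contrastive_loss([1.0, 0.0], [0.0, 1.0], label=0)
collapsed_negative = contrastive_loss([1.0, 0.0], [1.0, 0.0], label=0)
print(same_direction, orthogonal_negative, collapsed_negative)
```

A positive pair with identical embeddings and a negative pair that is already far apart both contribute zero loss; a negative pair whose embeddings coincide is penalised by `0.5 * margin ** 2`.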