Email Verification And Email Cleaning Services Discussion

Discussion in 'Domains, DNS, Email & SSL Certificates' started by eva2000, May 8, 2024.

  1. eva2000

    eva2000 Administrator Staff Member

    58,893
    12,490
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +19,122
    Local Time:
    2:19 PM
    Nginx 1.31.x
    MariaDB 10.x/11.4+/12.3+
    A XenForo forum discussion brought up the topic of using email verification or email validation services to clean up a 530,000 member email list which has a very high 50% email bounce rate! Cleaning bad and bounced invalid email addresses out of such lists is important so that forum email sending doesn't push up your bounce and/or complaint rates and damage your email sending domain's reputation :)

    I've consulted with paying clients for this and do this myself on this very forum's member email list :D But it's been partly done manually.

    So I thought I'd code a self-hosted script, validate_emails.py, that can do the email syntax, DNS and SMTP verification checks locally on your own servers, and I also added API support for 5 paid commercial email verification providers.
    Links to services may be affiliate links ;) The validate_emails.py email validation script was written by me for my paid consulting clients usage. Info at GitHub - centminmod/validate-emails is public documentation for the script only.

    I added XenForo support to my script too. It generates SQL queries that update the user_state field in XenForo based on the email validation results, allowing you to clean up your XenForo user database's email addresses by disabling email sending to those specific bad email addresses. I just cleaned up this forum's member email list as well, so if you have a bad email address, you'll be moved to the bounced email user state and won't receive forum mailings until you update to a valid email address :)
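
    To make the XenForo step concrete, here is a minimal sketch of how validation results could be turned into SQL updates. It is not the validate_emails.py implementation. The function names, the choice of which statuses count as bad, and the assumption of the stock xf_user table with an 'email_bounce' user_state value are all illustrative.

```python
# Hypothetical sketch: turn verification results into XenForo SQL updates.
# Assumes the stock xf_user table (email, user_state columns) and that
# 'email_bounce' is the state used for bounced addresses.

BAD_STATUSES = {"invalid", "disposable"}  # assumed policy, adjust to taste


def sql_quote(value: str) -> str:
    """Quote a string literal for MySQL/MariaDB (escapes backslashes and quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def build_updates(results):
    """Return one UPDATE statement per address whose status is considered bad."""
    statements = []
    for row in results:
        if row.get("status") in BAD_STATUSES:
            statements.append(
                "UPDATE xf_user SET user_state = 'email_bounce' "
                f"WHERE email = {sql_quote(row['email'])} AND user_state = 'valid';"
            )
    return statements


if __name__ == "__main__":
    sample = [
        {"email": "user@gmail.com", "status": "ok"},
        {"email": "xyz@domain1.com", "status": "invalid"},
    ]
    for stmt in build_updates(sample):
        print(stmt)
```

    Review generated statements and back up the database before running anything like this against a live forum.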

    I thought I'd share my experiences here as folks might find them useful, and folks can chime in with their own experiences with email cleaning or email verification services.

    My personal experience is with WordPress and vBulletin/XenForo forum communities for handling mass emails. So cleaning forum member email lists is an important task. Here's the cost comparison table and demo email verification API comparison results I recently did for the above 5 email verification providers.

    I also added to my script API Merge support via -apimerge argument to merge EmailListVerify + MillionVerifier API results together for more accurate email verification results. So querying 2 API services at once :cool:

    Example of merging EmailListVerify + MillionVerifier API results into one JSON result output for per email verification checks, for more accuracy :D

    Code (Text):
    time python validate_emails.py -f user@domain1.com -l emaillist.txt -tm all -api emaillistverify -apikey $elvkey -api millionverifier -apikey_mv $mvkey -apimerge
    [
        {
            "email": "user@mailsac.com",
            "elv_status": "disposable",
            "elv_status_code": null,
            "elv_free_email": "yes",
            "elv_disposable_email": "yes",
            "mv_status": "disposable",
            "mv_status_code": null,
            "mv_free_email": "yes",
            "mv_disposable_email": "yes",
            "mv_free_email_api": false,
            "mv_role_api": true
        },
        {
            "email": "xyz@centmil1.com",
            "elv_status": "invalid",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "no",
            "mv_status": "invalid",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "no",
            "mv_free_email_api": false,
            "mv_role_api": false
        },
        {
            "email": "user+to@domain1.com",
            "elv_status": "ok",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "no",
            "mv_status": "ok",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "no",
            "mv_free_email_api": false,
            "mv_role_api": false
        },
        {
            "email": "user@tempr.email",
            "elv_status": "disposable",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "yes",
            "mv_status": "disposable",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "yes",
            "mv_free_email_api": false,
            "mv_role_api": true
        },
        {
            "email": "info@domain2.com",
            "elv_status": "ok",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "no",
            "mv_status": "ok",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "no",
            "mv_free_email_api": false,
            "mv_role_api": true
        },
        {
            "email": "xyz@domain1.com",
            "elv_status": "invalid",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "no",
            "mv_status": "invalid",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "no",
            "mv_free_email_api": false,
            "mv_role_api": false
        },
        {
            "email": "abc@domain1.com",
            "elv_status": "invalid",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "no",
            "mv_status": "invalid",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "no",
            "mv_free_email_api": false,
            "mv_role_api": true
        },
        {
            "email": "123@domain1.com",
            "elv_status": "invalid",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "no",
            "mv_status": "invalid",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "no",
            "mv_free_email_api": false,
            "mv_role_api": false
        },
        {
            "email": "pop@domain1.com",
            "elv_status": "invalid",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "no",
            "mv_status": "invalid",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "no",
            "mv_free_email_api": false,
            "mv_role_api": false
        },
        {
            "email": "pip@domain1.com",
            "elv_status": "invalid",
            "elv_status_code": null,
            "elv_free_email": "no",
            "elv_disposable_email": "no",
            "mv_status": "invalid",
            "mv_status_code": null,
            "mv_free_email": "no",
            "mv_disposable_email": "no",
            "mv_free_email_api": false,
            "mv_role_api": false
        },
        {
            "email": "user@gmail.com",
            "elv_status": "ok",
            "elv_status_code": null,
            "elv_free_email": "yes",
            "elv_disposable_email": "no",
            "mv_status": "ok",
            "mv_status_code": null,
            "mv_free_email": "yes",
            "mv_disposable_email": "no",
            "mv_free_email_api": true,
            "mv_role_api": false
        },
        {
            "email": "op999@gmail.com",
            "elv_status": "invalid",
            "elv_status_code": null,
            "elv_free_email": "yes",
            "elv_disposable_email": "no",
            "mv_status": "invalid",
            "mv_status_code": null,
            "mv_free_email": "yes",
            "mv_disposable_email": "no",
            "mv_free_email_api": true,
            "mv_role_api": false
        },
        {
            "email": "user@yahoo.com",
            "elv_status": "ok",
            "elv_status_code": null,
            "elv_free_email": "yes",
            "elv_disposable_email": "no",
            "mv_status": "ok",
            "mv_status_code": null,
            "mv_free_email": "yes",
            "mv_disposable_email": "no",
            "mv_free_email_api": true,
            "mv_role_api": false
        },
        {
            "email": "user1@outlook.com",
            "elv_status": "ok",
            "elv_status_code": null,
            "elv_free_email": "yes",
            "elv_disposable_email": "no",
            "mv_status": "ok",
            "mv_status_code": null,
            "mv_free_email": "yes",
            "mv_disposable_email": "no",
            "mv_free_email_api": true,
            "mv_role_api": false
        },
        {
            "email": "user2@hotmail.com",
            "elv_status": "ok",
            "elv_status_code": null,
            "elv_free_email": "yes",
            "elv_disposable_email": "no",
            "mv_status": "ok",
            "mv_status_code": null,
            "mv_free_email": "yes",
            "mv_disposable_email": "no",
            "mv_free_email_api": true,
            "mv_role_api": false
        }
    ]
    
    real    0m15.946s
    user    0m1.017s
    sys     0m0.037s
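
    The merged output above pairs each provider's fields under an elv_ or mv_ prefix. A rough sketch of that kind of merge follows. The function names and the 'review' label are hypothetical and this is not how validate_emails.py necessarily does it.

```python
def merge_results(elv_rows, mv_rows):
    """Merge two per-provider result lists into one record per email.

    Fields are prefixed with elv_ / mv_ so neither provider overwrites
    the other, similar in shape to the merged JSON output shown above.
    """
    mv_by_email = {row["email"]: row for row in mv_rows}
    merged = []
    for elv in elv_rows:
        email = elv["email"]
        record = {"email": email}
        for key, value in elv.items():
            if key != "email":
                record[f"elv_{key}"] = value
        # A missing second-provider row simply contributes no mv_ fields.
        for key, value in mv_by_email.get(email, {}).items():
            if key != "email":
                record[f"mv_{key}"] = value
        merged.append(record)
    return merged


def agreed_status(record):
    """Return the status only when both providers agree, else 'review'."""
    elv, mv = record.get("elv_status"), record.get("mv_status")
    return elv if elv is not None and elv == mv else "review"
```

    Treating disagreements as 'review' rather than picking a winner is one possible design choice; it keeps uncertain addresses out of any automated cleanup.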


     
  2. eva2000

    eva2000 Administrator Staff Member


    Cloudflare HTTP Forward Proxy Worker Cache



    Updated my validate_emails.py script with Cloudflare HTTP Forward Proxy Cache With KV Storage support for EmailListVerify per email check API routines.

    This is a Cloudflare HTTP forward proxy Worker cache configuration that takes the script's API request and forwards it to EmailListVerify's API endpoint. The Cloudflare Worker script then saves the API result, along with a date timestamp, into Cloudflare KV storage on their edge servers. This can potentially reduce your overall EmailListVerify per email verification costs if you need to run validate_emails.py a few times back to back, as cached results bypass the need to call the EmailListVerify API itself.
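
    The caching idea can be sketched in a few lines of Python. This is only an in-memory stand-in for the Worker plus KV storage, not the actual Worker code; TTLCache, verify_with_cache and call_api are made-up names, and only the emaillistverify:<email> cache key format is taken from the Worker logs in this post.

```python
import time


class TTLCache:
    """In-memory stand-in for a key-value store holding timestamped results."""

    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key, ttl):
        """Return the cached result, or None if missing or older than ttl seconds."""
        entry = self._store.get(key)
        if entry is None:
            return None
        result, stored_at = entry
        if self._clock() - stored_at > ttl:
            del self._store[key]
            return None
        return result

    def put(self, key, result):
        """Store the result together with the current timestamp."""
        self._store[key] = (result, self._clock())


def verify_with_cache(email, ttl, cache, call_api):
    """Return (result, served_from_cache); only call the paid API on a miss."""
    key = f"emaillistverify:{email}"
    cached = cache.get(key, ttl)
    if cached is not None:
        return cached, True
    result = call_api(email)
    cache.put(key, result)
    return result, False
```

    Repeat runs inside the TTL window never reach call_api, which is where the cost saving would come from.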

    Uncached usual run via the script, where the usual result response would be unknown:

    Code (Text):
    time python validate_emails.py -f user@domain1.com -e hnyfmw5@canadlan-drugs.com -tm all -api emaillistverify -apikey $elvkey
    [
       {
           "email": "hnyfmw5@canadlan-drugs.com",
           "status": "invalid",
           "status_code": null,
           "free_email": "unknown",
           "disposable_email": "no"
       }
    ]
    
    real    0m2.600s
    user    0m0.279s
    sys     0m0.020s
    


    Via the Cloudflare HTTP forward proxy caching KV worker with the -apicachettl 120 argument set, the script returns email address status = unknown, reducing the time to return the result from 2.6s to 0.397s

    Code (Text):
    time python validate_emails.py -f user@domain1.com -e hnyfmw5@canadlan-drugs.com -tm all -api emaillistverify -apikey $elvkey -apicachettl 120
    [
       {
           "email": "hnyfmw5@canadlan-drugs.com",
           "status": "invalid",
           "status_code": null,
           "free_email": "unknown",
           "disposable_email": "no"
       }
    ]
    
    real    0m0.397s
    user    0m0.294s
    sys     0m0.025s
    


    Log inspection
    Code (Text):
    cat email_verification_log_2024-05-08_15-08-05.log | tail -3
    2024-05-08 15:08:06,816 - INFO - Checking cache for email: hnyfmw5@canadlan-drugs.com
    2024-05-08 15:08:07,047 - INFO - Cache check response status code: 200
    2024-05-08 15:08:07,047 - INFO - Cache result: unknown
    


    Cloudflare HTTP forward proxy caching KV worker console logged

    Code (Text):
    [DEBUG] Incoming request: https://cfcachedomain.com/?email=hnyfmw5@canadlan-drugs.com&cachettl=120
    [DEBUG] Email: hnyfmw5@canadlan-drugs.com
    [DEBUG] Cache Key: emaillistverify:hnyfmw5@canadlan-drugs.com
    [DEBUG] Cache TTL: 120
    [DEBUG] Cache Check: null
    [DEBUG] API URL: https://apps.emaillistverify.com/api/verifyEmail?secret=APIKEY&email=hnyfmw5@canadlan-drugs.com&timeout=15
    [DEBUG] Response from Cloudflare CDN cache: Hit
    [DEBUG] Skipping KV cache update as response is served from Cloudflare CDN cache
    [DEBUG] Returning final response with headers: {"cache-control":"max-age=120","content-type":"text/plain"}
    


    Query the KV storage cache entries count via -apicachecheck count

    Code (Text):
    time python validate_emails.py -f user@domain1.com -e hnyfmw5@canadlan-drugs.com -tm all -api emaillistverify -apikey $elvkey -apicachettl 120 -apicachecheck count
    API cache count: 1
    


    Query the KV storage cache entries listings via -apicachecheck list

    Code (Text):
    time python validate_emails.py -f user@domain1.com -e hnyfmw5@canadlan-drugs.com -tm all -api emaillistverify -apikey $elvkey -apicachettl 120 -apicachecheck list
    API cache list:
    {'email': 'hnyfmw5@canadlan-drugs.com', 'result': 'unknown', 'timestamp': 1715175271549, 'age': 16, 'ttl': 120}
    


    S3 Storage Support



    FYI, commercial email verification providers usually only store your file-based uploads or bulk file API uploads for a defined duration, e.g. 30 days, before they are deleted. And per email check API results are usually not stored at all. So if you need to store your per email check or bulk file API email verification results for longer, my validate_emails.py script now supports saving your results to S3 object storage providers - Cloudflare R2 or Amazon AWS S3 :D

    Example

    Send validate_emails.py script results to Cloudflare R2 S3 object storage via -store r2 argument. Using EmailListVerify per email check API -api emaillistverify -apikey $elvkey + Cloudflare cached for 120 seconds -apicache emaillistverify -apicachettl 120

    Code (Text):
    time python validate_emails.py -f user@domain1.com -e hnyfmw@canadlan-drugs.com,hnyfmw2@canadlan-drugs.com,hnyfmw3@canadlan-drugs.com -api emaillistverify -apikey $elvkey -apicache emaillistverify -apicachettl 120 -tm all -store r2
    
    Output stored successfully in R2: emailapi-emaillistverify-cached/output_20240511051940.json
    [
        {
            "email": "hnyfmw@canadlan-drugs.com",
            "status": "unknown",
            "status_code": null,
            "free_email": "no",
            "disposable_email": "no"
        },
        {
            "email": "hnyfmw2@canadlan-drugs.com",
            "status": "unknown",
            "status_code": null,
            "free_email": "no",
            "disposable_email": "no"
        },
        {
            "email": "hnyfmw3@canadlan-drugs.com",
            "status": "unknown",
            "status_code": null,
            "free_email": "no",
            "disposable_email": "no"
        }
    ]
    
    real    0m1.663s
    user    0m0.391s
    sys     0m0.039s
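
    For anyone wanting to do something similar in their own scripts, storing results in S3 compatible storage might look roughly like the sketch below. The function names are hypothetical, the UTC timestamp is an assumption, and the key layout merely imitates the output_YYYYMMDDHHMMSS.json name shown above. The upload part needs the third-party boto3 package and valid credentials; for Cloudflare R2 you would pass your account's R2 endpoint as endpoint_url.

```python
import json
from datetime import datetime, timezone


def build_object(results, prefix, now=None):
    """Build the object key and JSON body for a batch of verification results."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
    key = f"{prefix}/output_{now.strftime('%Y%m%d%H%M%S')}.json"
    body = json.dumps(results, indent=4).encode("utf-8")
    return key, body


def upload(bucket, key, body, endpoint_url=None):
    """Upload to Amazon S3, or to an S3 compatible service via endpoint_url."""
    import boto3  # third-party; imported lazily so build_object works without it

    client = boto3.client("s3", endpoint_url=endpoint_url)
    client.put_object(
        Bucket=bucket, Key=key, Body=body, ContentType="application/json"
    )
```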
     
  3. duderuud

    duderuud Active Member

    318
    97
    28
    Dec 5, 2020
    The Netherlands
    Ratings:
    +215
    Local Time:
    6:19 AM
    1.29 x
    11.4
    Sounds interesting but also complicated. I have 300k+ mail accounts to check so the paid options are too expensive imo.
    So it is also possible to use this locally, can you point me in the right direction for that?

    FYI: I use MXRoute for all my mails but also have an Amazon SES account. Never used my local SMTP/Mailserver.
     
  4. eva2000

    eva2000 Administrator Staff Member

    Yes, it can be expensive, but in light of Google and Yahoo's new restrictive email/spam policies, you don't want to risk having too high an email bounce/complaint rate and damaging your reputation.

    Yes, you can run the script to do local syntax, DNS and SMTP checks on Centmin Mod servers via the native Postfix MTA mail server as is, provided you have properly set up SPF/DKIM/DMARC for the main hostname and the sending from email domain as outlined at https://community.centminmod.com/th...ver-email-doesnt-end-up-in-spam-inboxes.6999/. I set up Amazon SES SMTP as a Postfix level SMTP relay so I can use a from email domain that is already Amazon SES verified. I suppose you could use MXRoute SMTP for the Postfix relay, but MXRoute has a 300 emails/hr rate limit, so for 300K email testing you'd be processing for a while LOL. But you risk damaging the sending domain's reputation this way. Hence why I use 3rd party paid email verification services where possible and why I developed my above validate_emails.py script with commercial email verification provider API support :)
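
    As a rough illustration of what local checks involve (this is not the validate_emails.py code), here is a sketch with a simple syntax check, an SMTP RCPT probe using Python's standard smtplib, and the rate limit arithmetic. All function names are made up, the regex is deliberately simple, and the MX host lookup is left out because it needs a DNS library.

```python
import re
import smtplib

# Deliberately simple pattern; it is not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def has_valid_syntax(email):
    """Cheap first filter before any DNS or SMTP work."""
    return EMAIL_RE.match(email) is not None


def smtp_rcpt_check(email, mx_host, from_addr, timeout=10):
    """Return the SMTP reply code for RCPT TO without sending a message.

    mx_host is assumed to be the recipient domain's MX host, looked up
    elsewhere. Many servers accept every RCPT or rate limit probes, so a
    250 reply does not prove that the mailbox exists.
    """
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as server:
        server.ehlo_or_helo_if_needed()
        server.mail(from_addr)
        code, _message = server.rcpt(email)
    return code


def hours_at_rate(total_emails, emails_per_hour):
    """Hours needed to work through a list under an hourly rate limit."""
    return total_emails / emails_per_hour
```

    At 300 emails/hr, hours_at_rate(300_000, 300) gives 1000 hours, or roughly 42 days of continuous checking.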
     
  5. duderuud

    duderuud Active Member

    Okay, bit the bullet and bought a package from Proofy.io

    They have a promotion with 35% off, code: R18HE27T35P1
    Now let's find out how to integrate this into Xenforo :)

    Edit: Where can I download the script itself?
     
    Last edited: May 9, 2024
  6. eva2000

    eva2000 Administrator Staff Member

    oh thought folks would read GitHub - centminmod/validate-emails - will update above post :)

     
  7. duderuud

    duderuud Active Member

    I read that already but I guess I misunderstood :)
     
  8. eva2000

    eva2000 Administrator Staff Member

  9. eva2000

    eva2000 Administrator Staff Member

    Updated my PHP Wrapper with single and multiple email support via validate_emails.py per email verification routines and added validate_emails.py supported Cloudflare Cache (enabled for EmailListVerify and Zerobounce) and also support for S3 storage to store email verification results to either Amazon AWS S3 or Cloudflare R2 object storage buckets.

    Note: Timings reported include time for S3 storage - in this case saving to Cloudflare R2 bucket

    Attached screenshots: validate_email_php_wrapper_multi-style2-cloudflare-cache-s3-02.png, -02a.png, -02b.png, -02c.png, -02d.png, -02e.png
     
  10. elargento

    elargento Member

    353
    18
    18
    Jan 4, 2016
    Ratings:
    +45
    Local Time:
    1:19 AM
    10
    @eva2000 I'd like to know if you are willing to sell me the script. I have a bunch of emails (almost 120k) which I need to validate.

    Thanks :)
     
  11. eva2000

    eva2000 Administrator Staff Member

    Must be that time of year, you're not the only forum member that asked this week :). Right now it's only a tool I use myself on behalf of my paid clients and not something available for other users to use.
     
  12. elargento

    elargento Member

    No problem, I was able to create a Python script which worked flawlessly using just ChatGPT.
    I'm curious, how do you check catch-all domains like Gmail or Outlook to avoid them being marked as valid when they actually are not?
     
  13. eva2000

    eva2000 Administrator Staff Member

    :cool:

    Leave it to verification providers, some treat catch-all as not valid.
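
    For context, one common heuristic for spotting accept-all domains is to probe a random address that almost certainly doesn't exist and see whether the server accepts it. A minimal sketch is below; probe is a hypothetical callable you would supply (for example an SMTP RCPT check), and this is not necessarily what any of the providers do.

```python
import secrets


def is_catch_all(domain, probe):
    """Guess whether a domain accepts mail for any local part.

    probe(address) must return True when the mail server accepts the
    address. If a random, almost certainly nonexistent mailbox is
    accepted, acceptance of real addresses proves nothing either.
    """
    random_local = "probe-" + secrets.token_hex(8)
    return bool(probe(f"{random_local}@{domain}"))
```

    When this returns True, the honest status for addresses on that domain is closer to unknown than valid.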
     
  14. wmtech

    wmtech Active Member

    187
    44
    28
    Jul 22, 2017
    Ratings:
    +139
    Local Time:
    6:19 AM
    I would like to share my personal experience with cleaning a large email list with over 100,000 addresses.

    The first choice was Proofy because, thanks to a 40% discount code, it was the cheapest option of all the services recommended by @eva2000 .

    The result was problematic: it had a very large number of "UNKNOWN" statuses, and all (over 6000) Yahoo addresses were not verified correctly. Almost all of them resulted in an "INVALID" status, but many of them were in fact legit and working addresses.

    So I had to run all Proofy verified addresses with "UNKNOWN" status through a second verification service. I decided to use Bouncify, because a first (free) test with 50 addresses showed a usable result. After running through Bouncify, only 2% of all "UNKNOWN" addresses stayed "UNKNOWN"; the rest were classified "VALID" or "INVALID". So I used that result.

    I tested 50 Yahoo addresses with all the services recommended by @eva2000
    • EmailListVerify
    • MillionVerifier
    • MyEmailVerifier
    • CaptainVerify
    • Reoon
    and found out only Reoon was able to check Yahoo addresses reliably.

    So I ran all Yahoo addresses Proofy classified as "INVALID" through Reoon and got a usable result, with almost all of them classified "VALID" or "INVALID" and only very few getting other statuses.

    So, if you need to verify a large address database, be prepared to use several verification services to get a result you can use to clean your email list.
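
    The staged workflow described above could be scripted along these lines. This is only a sketch: the function name is made up, the row format (email plus status) is assumed, and matching Yahoo by a "yahoo." domain prefix is an assumption.

```python
def plan_second_pass(first_pass):
    """Split first-pass results into final rows and addresses to re-check.

    Returns (final, recheck_unknown, recheck_yahoo):
      final           rows accepted as-is from the first provider
      recheck_unknown addresses with UNKNOWN status, for a second provider
      recheck_yahoo   Yahoo addresses marked INVALID, for a provider that
                      handles Yahoo more reliably
    """
    final, recheck_unknown, recheck_yahoo = [], [], []
    for row in first_pass:
        email = row["email"]
        status = row["status"].upper()
        domain = email.rsplit("@", 1)[-1].lower()
        if status == "UNKNOWN":
            recheck_unknown.append(email)
        elif status == "INVALID" and domain.startswith("yahoo."):
            recheck_yahoo.append(email)
        else:
            final.append(row)
    return final, recheck_unknown, recheck_yahoo
```

    Only the two re-check lists go to the additional services, which keeps the extra credits spent to a minimum.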
     
  15. eva2000

    eva2000 Administrator Staff Member

    Thanks for sharing your experience. Proofy isn't that recommended in my later updated write up and benchmarks at https://github.com/centminmod/validate-emails IIRC

    Results at https://github.com/centminmod/valid...#email-verification-provider-comparison-costs and https://github.com/centminmod/valid...file#email-verification-results-table-compare

    I have since added Zerobounce as a provider as well.

    Yahoo is a special case. IIRC some providers have additional options/toggles for Yahoo due to greymail policies Yahoo might use. So double check the respective provider's documentation ;)

    FYI, Zerobounce has a field for greylisted as well. Zerobounce referral link https://centminmod.com/zerobounce and they currently have 20% discount coupon off PAYG credits = AUTUMN20 until this Friday (24hrs left if maths right for US timezone).